How do I stop a spider from coming back?

This is NOT your average robots.txt problem!

         

TWhalen

4:34 pm on Jan 24, 2002 (gmt 0)

10+ Year Member



I have a spider problem - Scooter is hitting my server thousands of times a day trying to spider my website. But it's looking for URLs that don't exist and never have! It seems to be "making up" URLs and querying my site to find them.
Is there any kind of robots.txt file I can create that will tell (just) Scooter "Hey big fella, bug off!"?

(I wouldn't mind if Scooter were coming for a legit reason, but my logfiles are filled with bogus info right now.)
Help!

gethan

4:37 pm on Jan 24, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hey TWhalen, welcome to WebmasterWorld,

Check out

[webmasterworld.com...]

if you have access to mod_rewrite.

Good luck

TWhalen

5:06 pm on Jan 24, 2002 (gmt 0)

10+ Year Member



Thanks for the advice, Gethan.
I have another question though - if I use this robots.txt file, will Scooter stop coming altogether, or will it just tell him to go away "for now"?

I still want Scooter to come back at a later date and spider my actual site; I'm just trying to stop him from looking for those nonexistent URLs over and over. He's been doing this for almost 2 weeks now, and even though Scooter is getting a non-resolving destination (I'm not even using a 404 page), he still keeps coming back to look for the URLs.

TWhalen

5:13 pm on Jan 24, 2002 (gmt 0)

10+ Year Member



Er, I meant "if I use this .htaccess file", not robots.txt :)

gethan

5:22 pm on Jan 24, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you added Scooter to the .htaccess file as in the post linked above, Scooter would be blocked for good.
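
For reference, a permanent block of that kind usually looks something like the sketch below. The linked post isn't quoted in this thread, so this is an assumption about its general shape, not a copy of it: it matches Scooter's user agent and answers every request with 403 Forbidden via the F flag.

[perl]
# Sketch only: assumed shape of a "block for good" rule, not copied from the linked post
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Scooter
# "-" means no substitution; [F] returns 403 Forbidden
RewriteRule .* - [F]
[/perl]
Because it refuses every request from that user agent, it would also keep Scooter away from your real pages until you take it out again.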

I would suggest this as an alternative:

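[perl]
# Turn on mod_rewrite (needed unless it is already enabled for this directory)
RewriteEngine On
RewriteBase /
# Match only requests whose User-Agent starts with "Scooter"
RewriteCond %{HTTP_USER_AGENT} ^Scooter
RewriteRule ^(.*) [altavista.com...] [L,R=302]
[/perl]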
The R=302 tells Scooter that your page has moved temporarily to the same page under AltaVista - maybe the techs there will get the hint ;)

You'll still see all of those requests in your logs, though.

Ideally I would block Scooter, when it's in abuse mode, at the firewall (or with ipchains on Linux), but that won't be an option on a hosting package.
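
A narrower variant of the redirect rule above, if the goal is only to turn Scooter away from URLs that don't exist while still letting it fetch real pages, might look like the sketch below. It assumes the site is served as plain files and directories from disk (so "doesn't exist" can be tested on the filesystem) and that the rules live in .htaccess; the G flag returns 410 Gone.

[perl]
# Sketch only: assumes real pages are actual files or directories on disk
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Scooter
# Apply only when the request maps to neither a file nor a directory
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# "-" means no substitution; [G] returns 410 Gone
RewriteRule .* - [G]
[/perl]
These requests will still show up in the logs, and whether Scooter stops asking after a 410 is up to Scooter.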

msgraph

5:35 pm on Jan 24, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't think Scooter likes to obey robots.txt. I've talked to their techs numerous times and they refused to believe it. They always replied by saying they would "look into it."

When they get too busy, we just ban 'em at the router.

TWhalen

8:22 pm on Jan 24, 2002 (gmt 0)

10+ Year Member



Seem to have found another solution...
I emailed Crawler Support at AV, and they looked into it today.
Magically my spider troubles have now stopped.
Thanks all for the helpful advice!