Paid Inclusion Engines and Topics Forum

AltaVista's spider Scooter = SPAM!?
Spider accessing site generates 100s of 404 reports daily!..

Msg#: 318 posted 12:53 am on Jan 19, 2001 (gmt 0)

I have a website with a CGI script that catches 404 errors and notifies me about them. I am now receiving HUNDREDS of messages a day, ALL triggered by AltaVista's SCOOTER.
I don't know what business AltaVista has indexing my site, because I NEVER submitted it to AltaVista, nor do I want it indexed by AltaVista or any other search engine. I don't know how to solve this problem, or whether it can be solved on AltaVista's side, but I am certainly not thrilled with this experience, which amounts to something worse, FAR WORSE!, than spam!...
How can I stop this!
Any suggestions? Help would be much appreciated.
Thank you.



10+ Year Member

Msg#: 318 posted 1:12 am on Jan 19, 2001 (gmt 0)

Welcome to WmW

Are you referring to the Apache Guardian script and lots of robots.txt 404's?


10+ Year Member

Msg#: 318 posted 1:49 am on Jan 19, 2001 (gmt 0)

Hi Joao,

Check out:

The page explains how to prevent their robot from visiting your site by using robots.txt and/or robots meta tags. Hope this helps.
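For reference, the robots meta tag goes in the <head> of each page, for example <meta name="robots" content="noindex,nofollow"> to ask robots neither to index the page nor to follow its links. Because a robot can only read the tag after it has fetched the page, the tag will not stop requests for pages that no longer exist; robots.txt is the tool for that. The Python sketch below only illustrates how a compliant robot might read the tag; `RobotsMetaParser` and `robots_meta_directives` are hypothetical names, not any search engine's actual code.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags (illustrative sketch)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            # content is a comma-separated list such as "noindex,nofollow"
            for token in attr_map.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_meta_directives(html):
    """Return the set of robots meta directives found in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = """<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Private page</title>
</head><body>...</body></html>"""

directives = robots_meta_directives(page)
print("may index:", "noindex" not in directives)
print("may follow links:", "nofollow" not in directives)
```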


Msg#: 318 posted 2:16 am on Jan 19, 2001 (gmt 0)

THANKS a million! Much appreciated. I have a robots.txt file there, but I'm probably not using it right. I will also check out the meta tag solution. One extra question: are there spiders that unethically ignore these restrictions and go crawling and following links anyway?!...

And there is another problem:

The original page is not there anymore, so Scooter keeps looking for it, maybe to update its links/index (?), and since the page is not found, my CGI script sends me a 404-someone-was-looking-for-a-page-that-doesn't-exist-any-longer report.

Should the robots.txt exclusion file prevent Scooter (and other spiders/robots) from even starting to look for the page that was once there?!

Again, thanks!



WebmasterWorld Senior Member mivox (us), WebmasterWorld Top Contributor of All Time, 10+ Year Member

Msg#: 318 posted 2:39 am on Jan 19, 2001 (gmt 0)

If you don't want any robots indexing ANYTHING on your site at all, just use the robots.txt file to disallow all robots from your root directory completely.

Save a plain-text file named "robots.txt" in your root directory, with this content:

User-agent: *
Disallow: /

All robots that follow the robots.txt protocol should then leave your site entirely alone.
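To check that the two-line file really shuts out every compliant robot, Python's standard `urllib.robotparser` module can parse it offline. This is a minimal sketch; the robot names and the removed-page path are made-up examples. Note that robots only look for the file at the top level of the site (/robots.txt), so a copy placed in a subdirectory is ignored.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content suggested above: block every robot from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
# parse() accepts an iterable of lines, so no network fetch is needed here.
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler asks this question before requesting each URL,
# including URLs of pages that have since been removed.
checks = [
    ("Scooter", "/"),
    ("Scooter", "/old-page.html"),          # hypothetical removed page
    ("SomeOtherBot", "/cgi-bin/anything"),  # hypothetical robot name
]
for agent, path in checks:
    verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
    print(agent, path, verdict)
```

Every check prints "blocked": the `User-agent: *` record applies to any robot without a record of its own, and `Disallow: /` matches every path, including pages removed long ago. Since a compliant crawler reads /robots.txt before requesting pages, it should stop asking for the old page, though it may take some time to pick up a newly added file.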

But yes, there are robots that ignore proper manners and never even look at the robots.txt file. In those cases, finding out where the robot is coming from (this can take a bit of detective work, tracking IP numbers and whatnot) and mailing its administrators (or their upstream provider's administrators) to complain about the behavior has worked for me in the past.
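The detective work usually starts with the server's access log. Here is a minimal Python sketch, assuming Apache's "combined" log format (other servers or formats need a different pattern), that tallies 404 responses per client IP address and user-agent so the noisiest robot stands out. The function name and the sample log lines are hypothetical.

```python
import re
from collections import Counter

# Assumed Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def count_404s(lines):
    """Count 404 responses per (client IP, user-agent) pair; unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("status") == "404":
            counts[(match.group("ip"), match.group("agent"))] += 1
    return counts


# Made-up sample lines (documentation IP addresses, hypothetical robot name).
sample = [
    '192.0.2.10 - - [19/Jan/2001:00:12:01 +0000] "GET /old-page.html HTTP/1.0" 404 210 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [19/Jan/2001:00:12:05 +0000] "GET /robots.txt HTTP/1.0" 404 208 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [19/Jan/2001:00:13:00 +0000] "GET /index.html HTTP/1.0" 200 5120 "-" "Mozilla/4.0"',
]

for (ip, agent), hits in count_404s(sample).most_common():
    print(f"{hits:5d}  {ip:15s}  {agent}")
```

With an offending address in hand, `socket.gethostbyaddr()` from Python's standard library performs a reverse DNS lookup, which often shows whose network the robot runs on; a whois lookup on the address block then points to the administrators to contact.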


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved