
Paid Inclusion Engines and Topics Forum

AltaVista's spider Scooter = SPAM!?
Spider accessing site generates 100s of 404 reports daily!..
Joao
msg:16235
12:53 am on Jan 19, 2001 (gmt 0)
I have a website with a CGI script in place that catches 404 errors and notifies me about them. I am receiving HUNDREDS of messages a day at this point, ALL triggered by AltaVista's SCOOTER.
I don't know what business AltaVista has indexing my site, because I NEVER submitted it to AltaVista, nor do I want it indexed by AltaVista or any other search engine. I don't know how to solve this problem, or whether it is even solvable on AltaVista's side, but I am certainly not thrilled with this experience, which amounts to something worse, FAR WORSE!, than spamming!...
How can I stop this!
Any suggestions? Help would be much appreciated.
Thank you.
Joao
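(Not Joao's actual script, but for context: a 404-notifier of this kind is usually just a CGI that Apache's ErrorDocument directive points at. A minimal sketch in Python, with a made-up log path, looks something like this:)

#!/usr/bin/env python3
# Minimal 404-notifier CGI (a sketch, not Joao's actual script).
# Assumes Apache is configured with:  ErrorDocument 404 /cgi-bin/notify404.py
import os

requested = os.environ.get("REDIRECT_URL", "unknown")  # URL the visitor asked for
agent = os.environ.get("HTTP_USER_AGENT", "unknown")   # e.g. "Scooter/1.0"

# Record the miss; a notifier like Joao's would send mail here instead (log path is made up)
with open("/tmp/404.log", "a") as log:
    log.write("404: %s  agent: %s\n" % (requested, agent))

# Return a proper 404 to the client
print("Status: 404 Not Found")
print("Content-Type: text/html")
print()
print("<html><body><h1>404 Not Found</h1></body></html>")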

 

BoneHeadicus
msg:16236
1:12 am on Jan 19, 2001 (gmt 0)

Welcome to WmW

Are you referring to the Apache Guardian script and lots of robots.txt 404's?

tedres
msg:16237
1:49 am on Jan 19, 2001 (gmt 0)

Hi Joao,

Check out:
[doc.altavista.com...]

The page explains how to prevent their robot from visiting your site by using robots.txt and/or robots meta tags. Hope this helps.
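For reference, the meta tag version goes in each page's <head>; compliant robots that see it will neither index the page nor follow its links:

<meta name="robots" content="noindex,nofollow">

robots.txt is usually the easier route, though, since one file covers the whole site.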

Joao
msg:16238
2:16 am on Jan 19, 2001 (gmt 0)

THANKS a million! Much appreciated. I have a robots.txt file there, but I'm probably not using it right. I will also check the meta tags solution. One extra question: are there spiders that unethically ignore these restrictions and go crawling and following links anyway?!...

And there is another problem:

The original page is not there anymore, so Scooter keeps looking for it, maybe to update its links/index (?), and since the page is not found, my CGI script reports a 404 for a page that no longer exists.

Should the robots.txt exclusion file prevent Scooter (and other spiders/robots) from even trying to fetch a page that was once there?!

Again, thanks!

Joao
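(One way to check what a given robots.txt actually blocks is Python's standard-library robotparser. This sketch, with a made-up site URL and page, asks whether Scooter may fetch a removed page; a compliant crawler that gets a "no" here would never request the page, so no 404 would be generated:)

import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")  # hypothetical site
rp.read()

# Scooter announces itself as "Scooter"; False means the rules forbid the fetch
print(rp.can_fetch("Scooter", "http://www.example.com/old-page.html"))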

mivox
msg:16239
2:39 am on Jan 19, 2001 (gmt 0)

If you don't want any robots indexing ANYTHING on your site at all, just use the robots.txt file to disallow all robots from your root directory completely.

Save a plain text file named "robots.txt" in your web root with this content:

User-agent: *
Disallow: /

All robots that follow the robots.txt protocol should then leave your site entirely alone.
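Or, if you only want to keep AltaVista out while letting other engines in, you can target Scooter by name (assuming it still announces itself as "Scooter"):

User-agent: Scooter
Disallow: /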

But yes, there are robots that ignore proper manners and never even look at the robots.txt file. In those cases, looking up where the robot is sent from (which can take a bit of detective work, tracking IP numbers and whatnot) and mailing their administrators (or their upstream providers' administrators) to complain about the behavior has worked for me in the past.
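A short script can do part of that detective work. This sketch uses Python's standard library to reverse-resolve a crawler's IP (the address shown is made up) so you know whose administrator to write to:

import socket

ip = "204.152.190.12"  # hypothetical IP pulled from your access log
try:
    hostname, aliases, addresses = socket.gethostbyaddr(ip)
    print("%s resolves to %s" % (ip, hostname))
except socket.herror:
    print("%s has no reverse DNS entry; try a whois lookup instead" % ip)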
