Check out:
[doc.altavista.com...]
The page explains how to prevent their robot from visiting your site by using robots.txt and/or robots meta tags. Hope this helps.
And there is another problem:
The original page is no longer there. Scooter still requests it, maybe to update its links/indexes (?), and since the page is not found, my CGI script returns a 404-someone-was-looking-for-a-page-that-doesn't-exist-any-longer.
Should the robots.txt exclusion file prevent Scooter (and other spiders/robots) from even starting to search for the page that was once there?!
Again, thanks!
Joao
Save a plain text file named "robots.txt" in the top-level directory of your site, with this content:
User-agent: *
Disallow: /
All robots that follow the robots.txt protocol should then leave your site entirely alone.
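If you want to check how a well-behaved robot would read that file before you upload it, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed, the www.example.com URLs, and the second, Scooter-only rule set are my own assumptions for illustration. It only shows what a compliant robot would decide, not what any particular crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# The two-line file from the post: every compliant robot is shut out of the whole site.
BLOCK_EVERYTHING = """\
User-agent: *
Disallow: /
"""

# Hypothetical alternative: only keep Scooter away from one removed page.
BLOCK_ONE_PAGE_FOR_SCOOTER = """\
User-agent: Scooter
Disallow: /old-page.html
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a robot that obeys robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/old-page.html"  # placeholder URL
    print(is_allowed(BLOCK_EVERYTHING, "Scooter", page))
    print(is_allowed(BLOCK_ONE_PAGE_FOR_SCOOTER, "Scooter", page))
    print(is_allowed(BLOCK_ONE_PAGE_FOR_SCOOTER, "SomeOtherBot", page))
```

With the second rule set, a compliant Scooter would skip the removed page but could still fetch the rest of the site, and other robots would be unaffected.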
But yes, there are robots that ignore proper manners and never even look at the robots.txt file. In those cases, what has worked for me in the past is looking up where the robot is coming from (this can take a bit of detective work, tracking IP numbers & whatnot) and mailing its administrators (or their upstream provider's administrators) to complain about the behavior.
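As a starting point for that detective work, here is a rough sketch that lists the client IPs in an access log that never asked for /robots.txt. It assumes your server writes Common Log Format lines. The function name and the sample lines (which use documentation-only IP addresses) are made up for illustration. Keep in mind that ordinary browsers never request robots.txt either, so the result is only a list of candidates to look at.

```python
import re
from collections import defaultdict

# Start of a Common Log Format line: client IP, two fields, [timestamp], "METHOD path ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)')


def clients_skipping_robots_txt(log_lines):
    """Return the sorted IPs that made requests but never fetched /robots.txt."""
    paths_by_ip = defaultdict(set)
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match:  # silently skip lines that are not in the assumed format
            ip, path = match.groups()
            paths_by_ip[ip].add(path)
    return sorted(ip for ip, paths in paths_by_ip.items()
                  if "/robots.txt" not in paths)


# Made-up sample lines, not taken from a real log.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /robots.txt HTTP/1.0" 200 25',
    '192.0.2.10 - - [10/Oct/2000:13:55:40 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '198.51.100.7 - - [10/Oct/2000:14:02:11 -0700] "GET /old-page.html HTTP/1.0" 404 512',
]

if __name__ == "__main__":
    print(clients_skipping_robots_txt(SAMPLE_LOG))
```

Once you have a suspicious IP, a reverse DNS lookup (for example with socket.gethostbyaddr from the standard library) can sometimes tell you whose network it belongs to.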