
Sitemaps, Meta Data, and robots.txt Forum

Spiders that ignore or skip robots.txt
Can bad spiders be identified?

 9:21 pm on Feb 2, 2004 (gmt 0)

I realize this question borders on the Spider ID forum that was closed, although my question is a general one.
Is there a way in the robots.txt file to ID spiders that ignore or even skip it?
It's one thing to be able to ID them when they go to robots.txt first. But what about the ones that skip it?

Any suggestions or direction on where to look for info would be helpful.




 9:26 pm on Feb 2, 2004 (gmt 0)

Most people set up a spider trap to catch bad bots:



 9:38 pm on Feb 2, 2004 (gmt 0)

Hi David,

one basic idea is to set up a new directory /bottrap/,
put a hidden link to it on your main page (for example a 1x1 transparent GIF, or some other link invisible to the casual user),
and write the following into your robots.txt:
User-agent: *
Disallow: /bottrap/
Then wait and watch who accesses /bottrap/, either by looking through your logs manually or by setting up a script /bottrap/index.php that sends you an automatic alert. Well-behaved spiders obey the Disallow rule and human visitors never see the link, so a request for /bottrap/ most likely comes from a bot that ignored robots.txt or never read it.
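The "looking through your log" step can be automated. The following is only a rough sketch in Python, not a tested tool: it assumes the access log is in the Common/Combined Log Format, it treats the client address as the identity of the bot, and the names LOG_LINE and classify_trap_visitors are made up for this example. It also separates the two cases from the original question: bots that fetched robots.txt and went into the trap anyway, and bots that never asked for robots.txt at all.

```python
import re

# Assumed log layout (Common/Combined Log Format):
#   host ident user [time] "METHOD path PROTOCOL" status size ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:\S+) (\S+)[^"]*"')


def classify_trap_visitors(lines, trap_prefix="/bottrap/"):
    """Return {client: verdict} for every client that requested the trap.

    The verdict is "ignored robots.txt" if the same client requested
    /robots.txt anywhere in the given lines, otherwise
    "skipped robots.txt".
    """
    trapped = set()
    read_robots = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # not a request line in the assumed format
        host, path = match.groups()
        if path.split("?", 1)[0] == "/robots.txt":
            read_robots.add(host)
        elif path.startswith(trap_prefix):
            trapped.add(host)
    return {
        host: "ignored robots.txt" if host in read_robots else "skipped robots.txt"
        for host in trapped
    }
```

To use it, pass any iterable of log lines, for example an open file object: classify_trap_visitors(open("access.log")), where access.log stands for wherever your server writes its log. A bot that changes IP address between requests will show up as "skipped" even if it did read robots.txt, so treat the result as a hint rather than proof.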



 2:05 pm on Feb 3, 2004 (gmt 0)

Thank you both for the information.

