
Forum Moderators: goodroi


Spiders that ignore or skip robots.txt

Can bad spiders be identified?



9:21 pm on Feb 2, 2004 (gmt 0)

10+ Year Member

I realize this question borders on the Spider ID forum that was closed, although my question is a general one.
Is there a way in the robots.txt file to ID spiders that ignore or even skip it?
It is one thing to be able to ID them when they go to robots.txt first. But what about the ones that skip it?

Any suggestions or direction on where to look for info would be helpful.



9:26 pm on Feb 2, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Most people set up a spider trap to catch bad bots:



9:38 pm on Feb 2, 2004 (gmt 0)

10+ Year Member

Hi David,

One basic idea is to set up a new directory, /bottrap/,
put a hidden link to it on your main page (for example a 1x1 transparent GIF, or some other link that is invisible to the casual user),
and write the following into your robots.txt:
User-agent: *
Disallow: /bottrap/
Then wait and watch who accesses /bottrap/, either by looking through your logs manually or by setting up a script at /bottrap/index.php that sends you an automatic alert. Well-behaved spiders obey the Disallow rule and never follow the link, so anything that turns up there has either ignored robots.txt or never read it.
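
If you go the log-checking route, something like the following Python could do the looking for you. It is only a rough sketch: it assumes your server writes Apache-style "common" or "combined" access log lines, and the function name find_trap_hits and the regular expression are made up for illustration, not taken from any particular tool. It lists every client that requested something under /bottrap/ and also notes whether that same IP fetched /robots.txt anywhere in the log, which helps separate spiders that read the file and ignored it from spiders that skipped it entirely.

```python
import re
from collections import defaultdict

# Assumed format: Apache/NCSA "common" or "combined" access log lines.
# The referer and user-agent fields are optional so both formats match.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?:GET|HEAD|POST) (?P<path>[^\s"]+)[^"]*" \d{3} \S+'
    r'(?: "[^"]*" "(?P<agent>[^"]*)")?'
)


def find_trap_hits(lines, trap_prefix="/bottrap/"):
    """Return {ip: {"agents": set_of_user_agents, "read_robots": bool}}
    for every client that requested a path under trap_prefix."""
    read_robots = set()          # IPs that fetched /robots.txt at any point
    hits = defaultdict(set)      # IP -> user agents seen on trap requests
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue             # skip lines that do not fit the assumed format
        ip, path = m.group("ip"), m.group("path")
        if path.split("?", 1)[0] == "/robots.txt":
            read_robots.add(ip)
        if path.startswith(trap_prefix):
            hits[ip].add(m.group("agent") or "-")
    return {
        ip: {"agents": agents, "read_robots": ip in read_robots}
        for ip, agents in hits.items()
    }
```

You would call it with an open log file, e.g. find_trap_hits(open("access.log")), where the file name is whatever your server uses. Keep in mind that matching by IP is crude: several clients can share one address, and a spider may fetch robots.txt from a different address than the one it crawls from.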



2:05 pm on Feb 3, 2004 (gmt 0)

10+ Year Member

Thank you both for the information.


