| Spiders that ignore or skip robots.txt: can bad spiders be identified? |
DavidAtWork

msg:1528696 | 9:21 pm on Feb 2, 2004 (gmt 0) | I realize this question borders on the topic of the Spider ID forum that was closed, although my question is a general one. Is there a way, using the robots.txt file, to identify spiders that ignore or even skip it? It is one thing to identify them when they request robots.txt first, but what about the ones that skip it? Any suggestions or pointers on where to look for information would be helpful. Thanks....
|
bcolflesh

msg:1528697 | 9:26 pm on Feb 2, 2004 (gmt 0) | Most people set up a spider trap to catch bad bots: [webmasterworld.com...]
|
Romeo

msg:1528698 | 9:38 pm on Feb 2, 2004 (gmt 0) | Hi David, one basic idea is to set up a new directory /bottrap/, place a hidden link to it on your main page (for example a 1x1 transparent GIF, or some other link that a casual visitor will not see), and add the following to your robots.txt:

User-agent: *
Disallow: /bottrap/

Then watch who accesses /bottrap/, either by checking your logs manually or by setting up a script at /bottrap/index.php that sends you an automatic alert. Because well-behaved spiders read robots.txt and stay out of /bottrap/, anything that shows up there has either ignored robots.txt or skipped it entirely. Regards, R.
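If you go the "look through your log" route, the check can be automated. The following is a minimal sketch, not anyone's actual script: it assumes your server writes Apache/NCSA Combined Log Format, and the names `ROBOTS_TXT`, `LOG_LINE` and `find_trap_hits` are hypothetical. It uses Python's standard `urllib.robotparser` to apply the same robots.txt rules as above and reports every client that requested a disallowed path such as /bottrap/.

```python
import re
import urllib.robotparser

# Hypothetical copy of the robots.txt rules from the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /bottrap/
"""

# Assumed Combined Log Format:
# host ident user [time] "METHOD /path PROTO" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"\S+ (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def find_trap_hits(log_lines):
    """Return the set of (host, user_agent) pairs that fetched a disallowed path."""
    rules = urllib.robotparser.RobotFileParser()
    rules.parse(ROBOTS_TXT.splitlines())
    hits = set()
    for line in log_lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that are not in the assumed format
        # A spider that honors robots.txt never requests a disallowed path,
        # so any such request came from one that ignored or skipped the file.
        if not rules.can_fetch(m.group("agent"), m.group("path")):
            hits.add((m.group("host"), m.group("agent")))
    return hits


if __name__ == "__main__":
    sample = [
        '10.0.0.1 - - [02/Feb/2004:21:38:00 +0000] "GET /index.html HTTP/1.0" 200 1024 "-" "Mozilla/4.0"',
        '10.0.0.2 - - [02/Feb/2004:21:39:00 +0000] "GET /bottrap/ HTTP/1.0" 200 512 "-" "BadBot/1.0"',
    ]
    for host, agent in sorted(find_trap_hits(sample)):
        print(host, agent)
```

Checking against the parsed robots.txt rather than hard-coding the /bottrap/ prefix means the report stays in sync if you later add more Disallow lines; the trade-off is that a human who types a disallowed URL by hand would also show up, which the invisible link is meant to make unlikely.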
|
DavidAtWork

msg:1528699 | 2:05 pm on Feb 3, 2004 (gmt 0) | Thank you both for the information. Regards....