403 Forbidden - Spider Trap
You don't have permission to access /spider_trap.html on this server. Furthermore, you have fallen into our spider trap and all access from your IP address is now blocked. If you believe this is in error, please email the webmaster at XXXXX.com describing how you might have fallen into a trap designed to capture web spiders and robots.
After visiting the above page I can still access the site apart from the robots.txt file. When I try to fetch that file, this message is generated:
Your attempt to access /robots.txt on this server has been forbidden by our spider trap. This means that at some time your IP address xxx.xx.xx.xxx has broken the rules of the Robot Exclusion Protocol, or accessed the spider trap page.
Accessing the robots.txt file another way shows it to be as follows:
User-agent: *
Disallow: /spider_trap.html
From this I've concluded that the spider_trap page is to nail spiders that ignore the robots.txt instruction to avoid the spider_trap.html page - i.e. bad bots that don't obey the robots.txt protocol get excluded.
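To illustrate that conclusion, here is a minimal sketch of how a well-behaved spider would read those two robots.txt lines, using Python's standard urllib.robotparser. The function name is_allowed, the user-agent string "ExampleBot" and the path /index.html are made up for the example; only the two robots.txt rules come from the site.

```python
from urllib.robotparser import RobotFileParser

# The two rules quoted above, exactly as they appear in the site's robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /spider_trap.html
"""


def is_allowed(path, user_agent="ExampleBot"):
    """Return True if a compliant bot may fetch `path`.

    `user_agent` is a hypothetical bot name; the `*` rule applies to any agent.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/index.html", "/spider_trap.html"):
        print(path, is_allowed(path))
```

A spider that runs a check like this never requests the trap page, so only bots that skip the check (or ignore its answer) end up blocked.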
Cloaking does not appear to be in use when I view the page with Brett's spider sim. I'm not sure I'm correct in this, but I understand spider traps are a key part of cloaking, and I'm concerned that a page called spider_trap.html may be a red flag to SEs and make them think cloaking is being used.
Any guidance on this would be appreciated. Thanks
>>hidden link
Might be enough to cause you trouble with many major search engines.
>>After visiting the above page I can still access the site apart from the robots.txt file
So bad bots can't access robots.txt. I didn't think they bothered anyway?
Yes, it sounds like the trap is - to use one of my favorite made-up words - "mis-implemented".
The spider trap should block access to all pages on the site except for robots.txt.
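As a rough sketch of that rule (not the script this site actually runs), the decision logic could look like the following. The class name SpiderTrap, the method status_for and the in-memory set of blocked IPs are all assumptions made for illustration; a real trap would usually persist the blocklist, for example in .htaccess or a database.

```python
class SpiderTrap:
    """Illustrative in-memory spider trap; all names here are hypothetical."""

    TRAP_PATH = "/spider_trap.html"
    ROBOTS_PATH = "/robots.txt"

    def __init__(self):
        # IP addresses that have requested the trap page.
        self.blocked_ips = set()

    def status_for(self, ip, path):
        """Return the HTTP status code to send for a request from `ip`."""
        # robots.txt must stay readable for everyone, blocked or not,
        # so that any bot can still see which paths are disallowed.
        if path == self.ROBOTS_PATH:
            return 200
        # Requesting the trap page springs the trap for this IP.
        if path == self.TRAP_PATH:
            self.blocked_ips.add(ip)
            return 403
        # Every other page is forbidden once the IP is on the blocklist.
        if ip in self.blocked_ips:
            return 403
        return 200


if __name__ == "__main__":
    trap = SpiderTrap()
    print(trap.status_for("192.0.2.1", "/spider_trap.html"))
    print(trap.status_for("192.0.2.1", "/index.html"))
    print(trap.status_for("192.0.2.1", "/robots.txt"))
```

The site described above behaves the opposite way: after the trap is sprung, ordinary pages still load and only robots.txt is refused, which is why it looks mis-implemented.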
I wouldn't worry about one hidden link on a page that leads to a spider trap file which has been disallowed in robots.txt, especially when the file is actually named "spider_trap". You're not going to get banned for this without a human review.
Cloaking is defined (by Google) as an attempt to mislead. This is not an attempt to mislead; it is an enforcement of robots.txt. Search engines will not follow that link because it is disallowed, so it cannot be interpreted as an attempt to get them to index something other than what a human would see. I wouldn't (and don't) worry about it... Had two kills in my trap just yesterday... :)
A search here on WebmasterWorld for "bad bot script" will turn up several spider trap threads, and may give you a good idea how to fix your client's site, including fixing the trap.
Jim