Spider Trap page

I'm trying to understand why a client's website has a hidden link to a page called spider_trap.html. Opening this link generates the following message:

403 Forbidden - Spider Trap
You don't have permission to access /spider_trap.html on this server. Furthermore, you have fallen into our spider trap and all access from your IP address is now blocked.
If you believe this is in error, please email the webmaster at XXXXX.com describing how you might have fallen into a trap designed to capture web spiders and robots.

After visiting that page, I can still access the rest of the site, except for the robots.txt file. When I request robots.txt, this message is generated:

Your attempt to access /robots.txt on this server has been forbidden by our spider trap. This means that at some time your IP address xxx.xx.xx.xxx has broken the rules of the Robot Exclusion Protocol, or accessed the spider trap page.

Accessing the robots.txt file another way shows that it contains the following:

User-agent: *
Disallow: /spider_trap.html
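For illustration, Python's standard `urllib.robotparser` module can show how a crawler that honours robots.txt interprets the two lines above. This is a minimal sketch: `is_allowed`, the `ExampleBot` user-agent and the `www.example.com` host are placeholders, not anything taken from the client's site.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt contents quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /spider_trap.html
"""

def is_allowed(path: str, user_agent: str = "ExampleBot") -> bool:
    """Would a crawler that honours robots.txt request this path?"""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    # Placeholder host; only the path part matters for the Disallow match.
    return parser.can_fetch(user_agent, "https://www.example.com" + path)

if __name__ == "__main__":
    print(is_allowed("/"))                  # True
    print(is_allowed("/spider_trap.html"))  # False
```

A well-behaved bot that runs this check before following the hidden link never requests /spider_trap.html, so it never triggers the trap.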

From this I've concluded that the spider_trap page exists to catch spiders that ignore the robots.txt instruction to stay away from spider_trap.html, i.e. bad bots that don't obey the robots.txt protocol get excluded.
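A minimal, hypothetical sketch of that trap logic, assuming the server simply keeps a set of banned IP addresses in memory. The client's actual implementation isn't known: the 403 message claims all access is blocked, yet in practice only robots.txt was refused afterwards, so the sketch has a `block_everything` switch to model either behaviour. `SpiderTrap`, `handle` and `block_everything` are made-up names for illustration.

```python
from dataclasses import dataclass, field

TRAP_PATH = "/spider_trap.html"  # the page disallowed in robots.txt

@dataclass
class SpiderTrap:
    # True: a trapped IP is refused everywhere (what the 403 message claims).
    # False: only /robots.txt is refused (what was actually observed).
    block_everything: bool = True
    banned_ips: set = field(default_factory=set)

    def handle(self, ip: str, path: str) -> int:
        """Return the HTTP status this hypothetical trap would send."""
        if path == TRAP_PATH:
            # Only a client that ignored robots.txt (or a human who found
            # the hidden link) should ever request this page.
            self.banned_ips.add(ip)
            return 403
        if ip in self.banned_ips and (self.block_everything or path == "/robots.txt"):
            return 403
        return 200

if __name__ == "__main__":
    trap = SpiderTrap(block_everything=False)
    print(trap.handle("203.0.113.5", "/index.html"))        # 200
    print(trap.handle("203.0.113.5", "/spider_trap.html"))  # 403, IP now banned
    print(trap.handle("203.0.113.5", "/index.html"))        # 200
    print(trap.handle("203.0.113.5", "/robots.txt"))        # 403
```

Note that the decision depends only on what the client requested, not on who it claims to be, which is the key difference from serving different content to search engines.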

Cloaking does not appear to be in use when I view the page with Brett's spider sim. I'm not sure if I'm correct about this, but I understand that spider traps are a key part of cloaking, and I'm concerned that a page called spider_trap.html may be a red flag to search engines and make them think cloaking is being used.
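As a rough, hypothetical version of the same check a spider simulator does, the sketch below requests a URL once with a browser-like User-Agent and once with a Googlebot-like User-Agent and reports whether the responses differ. `fetch_with_ua`, `looks_cloaked` and both User-Agent strings are illustrative assumptions. It can only detect cloaking keyed on the User-Agent header; IP-based cloaking won't show up, and pages that change on every request (timestamps, rotating ads) will produce false positives.

```python
import urllib.request
from typing import Callable

# Example User-Agent strings; substitute whichever ones you want to compare.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
CRAWLER_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# A fetcher takes (url, user_agent) and returns the response body.
Fetcher = Callable[[str, str], bytes]

def fetch_with_ua(url: str, user_agent: str) -> bytes:
    """Fetch a URL, sending the given User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()

def looks_cloaked(url: str, fetch: Fetcher = fetch_with_ua) -> bool:
    """True if the browser and crawler requests received different bodies."""
    browser_body = fetch(url, BROWSER_UA)
    crawler_body = fetch(url, CRAWLER_UA)
    return browser_body != crawler_body
```

Calling `looks_cloaked("https://www.example.com/")` makes two live requests; a `False` result is only weak evidence that no cloaking is happening.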

Any guidance on this would be appreciated. Thanks
