---- above the law? ignores ROBOTS.TXT
lucy24 - 9:12 pm on May 13, 2013 (gmt 0)
How did they find your test site?
Wouldn't it be harder to prevent someone from finding it? My impression was that once a domain name is registered, the robots will come.
On the test site, nothing links from the front page except the honeypot, which exists purely to identify robots who are both stupid and bad. Log wrangling filters out anyone who got a 403, so anything left over will jump up and hit me in the face. What's left is about equal parts stupid-plus-bad robots and humans whose Bing searches led them to the site name, even though the site is roboted-out and so shows no snippet. Yes, I could take the "noindex" approach instead, but this way is more fun. The front page says, quote,
Bad news for any passing humans: This is a test site. You won’t find any entertainment. Sorry. I had to call it something, and the domain name was just sitting there.
Oh, and the site shares an htaccess (mod_authz and mod_setenvif) with my "real" sites, so offending IPs are blocked at the gate.
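For anyone wondering what the "log wrangling" step might look like, here is a minimal sketch in Python. It is not the actual script used on the test site. It assumes the server writes Apache's combined log format, and the honeypot path `/honeypot/` is a made-up placeholder. It drops every request that got a 403 and flags whatever is left that touched the honeypot.

```python
import re

# Assumption: Apache "combined" log format. Adjust the pattern if your
# LogFormat differs.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# Hypothetical honeypot location, for illustration only.
HONEYPOT_PATH = "/honeypot/"


def leftover_requests(lines):
    """Yield parsed log entries for requests that did NOT receive a 403."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not look like log entries
        entry = match.groupdict()
        if entry["status"] == "403":
            continue  # already blocked at the gate, nothing to review
        # The request field looks like: METHOD PATH PROTOCOL
        parts = entry["request"].split()
        path = parts[1] if len(parts) >= 2 else ""
        entry["path"] = path
        entry["hit_honeypot"] = path.startswith(HONEYPOT_PATH)
        yield entry


if __name__ == "__main__":
    # Invented sample lines using documentation-only IP ranges.
    sample = [
        '203.0.113.5 - - [13/May/2013:21:12:00 +0000] '
        '"GET /honeypot/ HTTP/1.1" 200 512 "-" "BadBot/1.0"',
        '198.51.100.7 - - [13/May/2013:21:13:00 +0000] '
        '"GET / HTTP/1.1" 403 199 "-" "BlockedBot/2.0"',
        '192.0.2.9 - - [13/May/2013:21:14:00 +0000] '
        '"GET / HTTP/1.1" 200 1024 "http://www.bing.com/search?q=example" "Mozilla/5.0"',
    ]
    for entry in leftover_requests(sample):
        print(entry["ip"], entry["path"], entry["agent"], entry["hit_honeypot"])
```

Anything flagged as a honeypot hit is a candidate for the shared block list; the unflagged leftovers are mostly the humans who wandered in from a search.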