Referencing back to a great post on a PHP spider trap [webmasterworld.com] here about 18 months ago.
Boy, I have had fun with this one on five sites. The trap sends me an email when a bad bot gobbles my /trap.php. It's been working without problems for over a year, and I'm not sure the trap itself has a problem even now. I think the issue is with MSN.
Here's the deal. I was going over the pages indexed by the big three for one of my sites, and there was the surprise: MSN Search had neatly indexed mydomain.com/trap.php. That's a great way for anyone who happens to stumble on it to block themselves.
I checked the log file, and MSN's bot never touched trap.php or banned itself. I also checked my sitemap.xml to see if I had stuck it in there by accident. Is it possible that MSN grabbed the URL from my server log, where it would have been recorded during testing?
The site is only a couple of weeks old and just starting to get spidered. For some reason Google is letting the others get ahead on this one. If anyone has an idea I would sure enjoy hearing it.
[edited by: jatar_k at 5:54 am (utc) on Feb. 2, 2006] [edit reason] fixed link [/edit]
If what you see in the index is just the URL of your trap page, that is normal behavior for MSN. It happens to me too. The bot picks up the link from any page that links to your trap, but because it follows the rules in your robots.txt, it never requests the page itself. The page still gets listed in the index as a URL-only entry.
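To make the distinction concrete, here is a rough sketch in Python (not the PHP of the trap itself) of how a well-behaved crawler decides whether it may fetch a URL. The robots.txt contents, the "msnbot" user-agent string, and the example.com URLs are assumptions for illustration, not taken from the original poster's site. The check only governs fetching; it says nothing about whether the bare URL can be listed in an index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that hides the trap from compliant bots.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap.php
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A compliant bot must skip the trap, but may fetch ordinary pages.
print(is_allowed("msnbot", "http://example.com/trap.php"))
print(is_allowed("msnbot", "http://example.com/index.html"))
```

A bot that obeys this never requests /trap.php, so it never triggers the trap or bans itself, yet it still knows the URL exists from the links pointing at it.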
In fact, for my site, when I run a site:example.com query on MSN, the trap URL comes up in the top 5, lol!