Does anyone know whether msnbot has found a way to tell that robots.txt is being generated by a PHP script?
Today msnbot requested my PHP version of robots.txt (robots.#*$!.php) directly with a GET, then immediately requested robots.txt with a GET. No other bot has ever tried to fetch the PHP version through a request for robots.txt, let alone succeeded. While it wouldn't be impossible to guess the #*$!, it isn't all that obvious, and I have to believe msnbot knew what it was looking for.
Msg#: 3694923 posted 10:11 am on Jul 12, 2008 (gmt 0)
I would disallow the .php URL in robots.txt, or, better yet, set up an internal rewrite (a rewrite, NOT a redirect) to /this-does-not-exist so that a direct request for the .php URL returns a 404. That 404 would not affect the script's ability to run and do its thing when robots.txt is requested.
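A minimal .htaccess sketch of that suggestion, assuming mod_rewrite is enabled, the rules sit in the document root's .htaccess, and the script is named robots-gen.php (a hypothetical stand-in for the real, obscured name). %{THE_REQUEST} holds the original request line the client sent, so the blocking rule only fires when someone asks for the .php URL directly, not when robots.txt has been rewritten to it internally.

```apache
RewriteEngine On

# Block direct requests for the script. THE_REQUEST is the raw request line
# (e.g. "GET /robots-gen.php HTTP/1.1"), which does not change after an
# internal rewrite, so a request for /robots.txt never matches here.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /robots-gen\.php[\ ?]
RewriteRule ^robots-gen\.php$ /this-does-not-exist [L]

# Internal rewrite (no R flag, so no redirect): robots.txt is served by the
# hypothetical robots-gen.php script and the client never sees its name.
RewriteRule ^robots\.txt$ /robots-gen.php [L]
```

With these rules, GET /robots.txt is answered by the script, while GET /robots-gen.php ends in a 404, provided nothing exists at /this-does-not-exist and no later catch-all rule routes that path somewhere else.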
I asked for a review of my .htaccess rewrite logic over in the Apache forum. The problem was the order of my rewrites. I did the internal rewrite of robots.txt to robots.$!@#.php first, which worked, except that later I did an external (301) redirect to add the www, and that exposed my internal rewrite. All the other bots I've logged requested robots.txt on the www hostname, so I never saw the internal rewrite being exposed. Thanks to Jim, it's all fixed now.
So msnbot must request robots.txt on both the www and the non-www hostname. Maybe they've learned that doing so can turn something up.
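The actual rules from that fix weren't posted, so here is a sketch of the ordering, assuming an .htaccess file in the document root, example.com as a placeholder domain, and robots-gen.php as a hypothetical script name. In per-directory (.htaccess) context, mod_rewrite runs the rule set again on the rewritten URL after an internal rewrite, so a www redirect placed after the internal rewrite sees /robots-gen.php and redirects the client to it. Putting the redirect first avoids that.

```apache
RewriteEngine On

# 1. External redirect first: send bare-domain requests to the www host.
#    Because it runs before the internal rewrite, it only ever sees the path
#    the client asked for (e.g. /robots.txt), never the script name.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# 2. Internal rewrite second (no R flag). Any request reaching this rule is
#    already on the www host, so when the rules are re-run on the rewritten
#    URL, rule 1 does not match and the script name stays hidden.
RewriteRule ^robots\.txt$ /robots-gen.php [L]
```

A request for http://example.com/robots.txt now gets a 301 to http://www.example.com/robots.txt, which is then served by the script without the .php URL ever reaching the client.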
Msg#: 3694923 posted 8:26 am on Jul 14, 2008 (gmt 0)
Yes, there might be a completely different website at domain.com than at www.domain.com, just as there might be different sites at forums.domain.com and store.domain.com; www is just another subdomain, after all.
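Since robots.txt rules apply per hostname, a crawler can't assume two hostnames share one file; it has to fetch /robots.txt from each. As an illustration only, assuming several hostnames share one document root and using hypothetical file names, .htaccess could hand each host its own robots file:

```apache
RewriteEngine On

# Hypothetical per-host robots files in a shared document root.
RewriteCond %{HTTP_HOST} ^forums\.example\.com$ [NC]
RewriteRule ^robots\.txt$ /robots-forums.txt [L]

RewriteCond %{HTTP_HOST} ^store\.example\.com$ [NC]
RewriteRule ^robots\.txt$ /robots-store.txt [L]

# Any other hostname (e.g. www.example.com) falls through to the plain robots.txt.
```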