MSN bot finds php robots.txt

     
10:54 pm on Jul 9, 2008 (gmt 0)

5+ Year Member



Does anyone know whether MSN bot has found a way to detect that robots.txt is being generated by a PHP script?

Today MSN bot requested my PHP version of robots.txt (robots.#*$!.php) directly with a GET and then immediately requested robots.txt with a GET. No other bot or request for robots.txt has ever tried to access the PHP version, let alone succeeded. While it wouldn't be impossible to guess the #*$!, it isn't all that obvious, and I have to believe MSN bot knew what it was looking for.

Phred

10:11 am on Jul 12, 2008 (gmt 0)

g1smd (WebmasterWorld Senior Member)



I would disallow the .php URL in robots.txt or, better yet, set up an internal rewrite (that's a rewrite, and NOT a redirect) to /this-does-not-exist so that the .php URL returns a 404. That 404 would not affect the ability of the script to operate and do its thing.
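
A rough sketch of that idea in .htaccess, assuming mod_rewrite is enabled. robots.example.php is only a placeholder for the real script name, and using THE_REQUEST to tell a direct client request apart from the internal rewrite is an assumption about how you might wire it up, so test it on your own server first:

```apache
RewriteEngine On

# THE_REQUEST holds the original request line sent by the client
# (e.g. "GET /robots.example.php HTTP/1.1"), so this condition only
# matches when the client asked for the script by name.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/robots\.example\.php[?\s]
# Internal rewrite (no R flag) to a path that does not exist: 404.
RewriteRule ^robots\.example\.php$ /this-does-not-exist [L]

# Serve robots.txt from the script. The request line still says
# /robots.txt, so the rule above does not interfere.
RewriteRule ^robots\.txt$ /robots.example.php [L]
```

With something like this in place, a request for /robots.txt should still get the script's output, while a request for the .php name itself should get a 404.
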
11:33 pm on Jul 13, 2008 (gmt 0)

5+ Year Member



Hi g1smd,

I asked for a review of my .htaccess rewrite logic over in the Apache forum. The problem was the order of my rewrites. I did the internal rewrite of robots.txt to robots.$!@#.php first, which worked, except that later I did an external (301) redirect appending www, which exposed my internal rewrite. Most (all other) bots that I've logged went after robots.txt with a www., so I never saw the internal rewrite being exposed. Thanks to Jim, all fixed now.

So msnbot must try to get robots.txt using URLs both with and without the www. Maybe they've learned that the technique can yield results.
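
For anyone hitting the same thing, here is a sketch of the corrected ordering. These are not my actual rules: example.com and robots.example.php are placeholders, and it assumes the rules live in the site root .htaccess with mod_rewrite enabled:

```apache
RewriteEngine On

# 1. External redirect first: send non-www requests to the www host
#    while the URL is still the one the client asked for.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# 2. Internal rewrite last: the host is already canonical here, so the
#    rewritten script path is never handed to the redirect above.
RewriteRule ^robots\.txt$ /robots.example.php [L]
```

With the two rules the other way round, a request for robots.txt on the non-www host is rewritten to the script first, and the redirect then sends the client to the script's URL on the www host, which is how the name leaked.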

Phred

8:26 am on Jul 14, 2008 (gmt 0)

g1smd (WebmasterWorld Senior Member)



Yes, there might be a completely different website at domain.com compared to www.domain.com, just as there might be different sites at forums.domain.com and store.domain.com. It's just another subdomain, after all.
 
