Forum Moderators: goodroi

Incorrect Robots.txt URL in server Logs

     
4:59 pm on Sep 17, 2012 (gmt 0)

Junior Member

5+ Year Member

joined:Apr 8, 2008
posts:107
votes: 0


Hey Guys,

I'm seeing some seriously strange stuff in our log files.

After receiving a warning in GWT (Google Webmaster Tools) that robots.txt was inaccessible, we checked the server logs and are seeing the following:

66.249.73.200 www.example.com - [16/Sep/2012:12:21:54 -0400] "GET /exampleproducts/product-2012.htmlrobots.txt HTTP/1.1" 301 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"

Any idea why Google would request incorrect URLs like this? Anyone seeing anything similar?

Thanks,

-t
5:06 pm on Sept 17, 2012 (gmt 0)

Senior Member

g1smd

joined:July 3, 2002
posts:18903
votes: 0


That request is being redirected.

You should check where to. That could be an even bigger problem.
5:21 pm on Sept 17, 2012 (gmt 0)

Junior Member

5+ Year Member

joined:Apr 8, 2008
posts:107
votes: 0


It gets redirected to the products page, which is why we get the errors. But why would Google request a bogus URL like this?
6:42 pm on Sept 17, 2012 (gmt 0)

Junior Member

5+ Year Member

joined:Apr 8, 2008
posts:107
votes: 0


Another clue: all of these URLs seem to have vanity TLD URLs redirecting to them. Is G trying to access the robots.txt of those vanity URLs and instead requesting it from the deep page? Seems like a rather dumb idea for such a smart algorithm.

examplevanityurl.com -> 301 -> example.com/deepURL.html
examplevanityurl.com/robots.txt -> example.com/deepURL.html/robots.txt

Seems silly, no?
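
For illustration only, here is a minimal Python sketch of how a prefix-style redirect on the vanity domain could produce a URL like the one in the log. It assumes the vanity server matches a path prefix and appends whatever follows that prefix to the target; the function name `prefix_redirect` and the target URL are hypothetical, and the actual server configuration is not known from this thread.

```python
from typing import Optional

def prefix_redirect(request_path: str, prefix: str, target: str) -> Optional[str]:
    """Mimic a prefix-style redirect (an assumption about the vanity server):
    the part of the path after the matched prefix is appended to the target."""
    if not request_path.startswith(prefix):
        return None  # rule does not apply to this request
    return target + request_path[len(prefix):]

# Hypothetical target, modelled on the path seen in the log line.
TARGET = "http://www.example.com/exampleproducts/product-2012.html"

# A request for the vanity home page lands on the deep page as intended.
print(prefix_redirect("/", "/", TARGET))

# A request for the vanity domain's robots.txt gets the leftover path glued on.
print(prefix_redirect("/robots.txt", "/", TARGET))
```

Under that assumption the second call yields a path ending in `product-2012.htmlrobots.txt`, which matches the shape of the logged request, so the bot may simply be following the redirect it was given.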
7:57 pm on Sept 17, 2012 (gmt 0)

Administrator

phranque

joined:Aug 10, 2004
posts:10847
votes: 61


if examplevanityurl.com is yours, i would find out why that server is doing what is essentially a sitewide redirect to a subdirectory of example.com, and fix it so that a request for robots.txt redirects specifically to the robots.txt at the root.
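
As a rough sketch of that logic, written in Python rather than in any particular server's config syntax: the robots.txt request is handled first and sent to the root, and everything else still goes to the deep page. The names `CANONICAL_HOST`, `DEEP_PAGE`, and `vanity_redirect` are hypothetical, and the real fix would live in whatever redirect rules the vanity server actually uses.

```python
from typing import Tuple

# Hypothetical values, modelled on the URLs discussed in this thread.
CANONICAL_HOST = "http://www.example.com"
DEEP_PAGE = "/exampleproducts/product-2012.html"

def vanity_redirect(request_path: str) -> Tuple[int, str]:
    """Return (status, Location) for a request to the vanity domain."""
    # Special case: robots.txt goes to the root robots.txt of the main site.
    if request_path == "/robots.txt":
        return 301, CANONICAL_HOST + "/robots.txt"
    # Everything else goes to the deep page, with nothing appended.
    return 301, CANONICAL_HOST + DEEP_PAGE

print(vanity_redirect("/robots.txt"))
print(vanity_redirect("/"))
```

The point of the ordering is that the specific rule must be evaluated before the catch-all one, otherwise the catch-all swallows the robots.txt request again.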