Incorrect Robots.txt URL in server Logs
triggerfinger msg:4496322 4:59 pm on Sep 17, 2012 (gmt 0)
Hey Guys, I'm seeing some seriously strange stuff in our log files. After receiving a warning in GWT about robots.txt being inaccessible, we checked the server logs and are seeing the following:
22.214.171.124 www.example.com - [16/Sep/2012:12:21:54 -0400] "GET /exampleproducts/product-2012.htmlrobots.txt HTTP/1.1" 301 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
Any idea why Google would request incorrect URLs like this? Anyone seeing anything similar?
Thanks, -t
g1smd msg:4496328 5:06 pm on Sep 17, 2012 (gmt 0)
That request is being redirected. You should check where to. That could be an even bigger problem.
triggerfinger msg:4496337 5:21 pm on Sep 17, 2012 (gmt 0)
It gets redirected to the products page, which is why we get errors. But why would Google request a bogus URL like this?
triggerfinger msg:4496362 6:42 pm on Sep 17, 2012 (gmt 0)
Another clue: All the URLs seem to have vanity TLD URLs redirecting to them. Is G trying to access the robots.txt of these URLs and instead requesting it from the deep page? Seems like a rather dumb idea for such a smart algorithm.
examplevanityurl.com -> 301 -> example.com/deepURL.html
examplevanityurl.com/robots.txt -> example.com/deepURL.html/robots.txt
Seems silly, no?
phranque msg:4496391 7:57 pm on Sep 17, 2012 (gmt 0)
If examplevanityurl.com is yours, I would look into why that server is doing what is essentially a sitewide redirect to a subdirectory of example.com, and fix it so that requests for robots.txt are redirected specifically to the root.
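To illustrate the suggestion above, here is a minimal Python sketch of one plausible cause. It assumes the vanity domain uses a prefix-style redirect that appends the rest of the requested path to the target; the thread does not confirm the actual server configuration. The names prefix_redirect, fixed_redirect, DEEP_TARGET and ROOT_TARGET are hypothetical, and the target URL is the one from the log line in the first post.

```python
from urllib.parse import urlsplit

# Assumed targets, based on the log line quoted in the first post.
DEEP_TARGET = "http://www.example.com/exampleproducts/product-2012.html"
ROOT_TARGET = "http://www.example.com/"


def prefix_redirect(request_path: str, prefix: str = "/", target: str = DEEP_TARGET) -> str:
    """Model a prefix redirect: whatever follows the matched prefix is appended to the target."""
    if not request_path.startswith(prefix):
        raise ValueError("path does not match the redirect prefix")
    return target + request_path[len(prefix):]


def fixed_redirect(request_path: str) -> str:
    """Send robots.txt to the root of the main site and everything else to the deep page."""
    if request_path == "/robots.txt":
        return ROOT_TARGET + "robots.txt"
    return DEEP_TARGET


# The sitewide prefix redirect glues "robots.txt" onto the deep page URL.
print(urlsplit(prefix_redirect("/robots.txt")).path)
# With the special case, the crawler lands on the real robots.txt.
print(fixed_redirect("/robots.txt"))
```

Under that assumption, a request for /robots.txt on the vanity domain is redirected to the same malformed path that shows up in the log, and the crawler then follows the redirect and requests it from www.example.com. Handling /robots.txt before the catch-all rule removes the bogus request.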