
Home / Forums Index / Google / Google SEO News and Discussion
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
WMT reports allowed files "restricted by Robots.txt"
doughayman
5+ Year Member

Msg#: 4218403 posted 6:57 pm on Oct 18, 2010 (gmt 0)

Why is it that Google ignores changes to my robots.txt?

This file gets spidered successfully each and every day, yet I still see certain files listed as "restricted by robots.txt" in WMT that haven't been included in robots.txt for many, many months. They were once prohibited in robots.txt, but those definitions were removed long ago, and there is no chance that any sort of pattern matching is prohibiting them from being crawled via robots.txt. Additionally, I have run my entire robots.txt file through their test utility, and these files are never denied by that simulation tool.

Anyone else have this problem? Is there a remedy here?
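One way to sanity-check what WMT reports is to run the current robots.txt rules through an independent parser. Here's a minimal sketch using Python's standard-library robots.txt parser; the rules and paths are hypothetical stand-ins, not the actual files from this site:

```python
# Cross-check robots.txt rules locally instead of relying on WMT's tester.
# The rules and URLs below are illustrative examples only.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A path whose Disallow line was removed months ago should now be allowed:
print(parser.can_fetch("Googlebot", "/old-blocked-page.html"))  # True
# A path still covered by a Disallow rule remains blocked:
print(parser.can_fetch("Googlebot", "/private/report.html"))    # False
```

If a local parse says a URL is allowed while WMT still flags it as "restricted by robots.txt", that points to stale or buggy data on WMT's side rather than a problem in the file itself.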

 

tedster
WebmasterWorld Senior Member - Top Contributor of All Time - 10+ Year Member

Msg#: 4218403 posted 8:47 pm on Oct 18, 2010 (gmt 0)

I've just been looking at a similar issue where the robots.txt file allows everything to be crawled, and yet WMT says 14 files are disallowed. In that case, those files are crawled and indexed anyway. WMT does have buggy data occasionally - or even frequently. What do we expect for "free", dependability? [hint - yes, I do]

doughayman
5+ Year Member

Msg#: 4218403 posted 10:37 pm on Oct 18, 2010 (gmt 0)

Thanks for your feedback, Ted. As a former developer/manager of institutional trading systems, I always demanded perfection of myself and my staff. In my naive little world, I expect the same of others, particularly when you are a mega-billion company. Things like this irk me, but unfortunately, part of the problem is my compulsive personality and expectations. The bigger Google has gotten, the more problems we have seen. At least, IMO.

jdMorgan
WebmasterWorld Senior Member - Top Contributor of All Time - 10+ Year Member

Msg#: 4218403 posted 11:08 pm on Oct 18, 2010 (gmt 0)

It's provable that the GWMT robots.txt tester does not use the same code as the real Googlebot.

On several sites, I use multiple user-agent declarations per policy record - a construct that has been explicitly defined by the "Standard" since its adoption. However, GWMT's robots.txt tester has reported, ever since it was introduced, that these sites cannot be crawled at all.
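The construct in question - one policy record shared by several crawlers - can be verified with Python's standard-library parser, which handles it correctly. A minimal sketch, with illustrative crawler names and paths:

```python
# One policy record with multiple User-agent lines, as permitted by the
# original robots.txt standard. Crawler names and the path are examples.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
User-agent: Bingbot
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Both named crawlers share the single Disallow rule:
print(parser.can_fetch("Googlebot", "/admin/index.html"))  # False
print(parser.can_fetch("Bingbot", "/admin/index.html"))    # False
# A crawler not named in any record is allowed by default:
print(parser.can_fetch("SomeOtherBot", "/admin/index.html"))  # True
```

A tester that fails on this construct is not implementing the standard, which supports the point that GWMT's tester and the real Googlebot run different code.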

It's also possible that the "Crawl Errors" you're seeing are generated by neither the real Googlebot's code nor the robots.txt tester's, but by a third version of the code instead.

But regardless, there are plenty of bugs in GWMT, and it sounds like this is just another one.

Jim

tedster
WebmasterWorld Senior Member - Top Contributor of All Time - 10+ Year Member

Msg#: 4218403 posted 11:32 pm on Oct 18, 2010 (gmt 0)

I always demanded perfection of myself and my staff. In my naive little world, I expect the same of others, particularly when you are a mega-billion company.

It's the challenge that comes with scale. Petabytes of data in constant churn just doesn't allow for perfection - and it's something that few of us have ever grappled with. Still, I'm sure Google can do better than we currently see in WebmasterTools.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved