
Forum Moderators: Robert Charlton & aakk9999 & andy langton & goodroi


WMT reports allowed files "restricted by robots.txt"

6:57 pm on Oct 18, 2010 (gmt 0)

Full Member

10+ Year Member

joined:July 31, 2006
posts:297
votes: 0


Why does Google ignore changes to my robots.txt?

This file gets spidered successfully every day, yet I still see certain files marked "restricted by robots.txt" in WMT that haven't been included in robots.txt for many months. They were once disallowed there, but those rules were removed long ago, and there is no chance that any sort of pattern matching is prohibiting them from being crawled. Additionally, I have run my entire robots.txt file through their test utility, and these files are never denied by that simulation tool.

Anyone else have this problem? Is there a remedy here?
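For anyone wanting a second opinion beyond WMT's simulator, a local cross-check of what a given robots.txt actually permits can be sketched with Python's standard `urllib.robotparser` (the file contents, agent name, and URLs below are placeholders; paste in the live file Googlebot fetches each day):

```python
from urllib import robotparser

# Hypothetical robots.txt contents for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /old-private/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A path with no matching Disallow rule should be crawlable...
print(rp.can_fetch("Googlebot", "http://example.com/page.html"))           # True
# ...while one still matching a rule is not.
print(rp.can_fetch("Googlebot", "http://example.com/old-private/a.html"))  # False
```

If this local check says a file is allowed while WMT still reports it restricted, that points at stale or buggy WMT data rather than the robots.txt itself.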
8:47 pm on Oct 18, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


I've just been looking at a similar issue where the robots.txt file allows everything to be crawled, and yet WMT says 14 files are disallowed. In that case, those files are crawled and indexed anyway. WMT does have buggy data occasionally - or even frequently. What do we expect for "free" - dependability? [Hint: yes, I do.]
10:37 pm on Oct 18, 2010 (gmt 0)

Full Member

10+ Year Member

joined:July 31, 2006
posts:297
votes: 0


Thanks for your feedback, Ted. As a former developer/manager of institutional trading systems, I always demanded perfection of myself and my staff. In my naive little world, I expect the same of others, particularly when you are a mega-billion company. Things like this irk me, but unfortunately, part of the problem is my compulsive personality and expectations. The bigger Google has gotten, the more problems we have seen. At least, IMO.
11:08 pm on Oct 18, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


It's provable that the GWMT robots.txt tester does not use the same code as the real Googlebot.

On several sites, I use multiple user-agent declarations per policy record - a construct that has been explicitly permitted by the "Standard" since its adoption. Yet GWMT's robots.txt tester has reported, ever since that tool was introduced, that these sites cannot be crawled at all.

It's also possible that the "Crawl Errors" you're seeing are not generated using the same code as either the real Googlebot or the robots.txt tester, but that a third version is used instead.

But regardless, there are plenty of bugs in GWMT, and it sounds like this is just another one.

Jim
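The multi-user-agent construct described above - several User-agent lines sharing one rule block - is accepted by Python's standard `urllib.robotparser`, for what it's worth; a minimal sketch with hypothetical agent names:

```python
from urllib import robotparser

# One record with two user-agents sharing a single rule set, plus a
# catch-all record - all names and paths here are made up for illustration.
ROBOTS_TXT = """\
User-agent: crawler-a
User-agent: crawler-b
Disallow: /private/

User-agent: *
Disallow:
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Both named agents inherit the shared Disallow rule...
print(rp.can_fetch("crawler-a", "http://example.com/private/page.html"))  # False
print(rp.can_fetch("crawler-b", "http://example.com/private/page.html"))  # False
# ...while any other agent falls through to the catch-all record.
print(rp.can_fetch("other-bot", "http://example.com/private/page.html"))  # True
```

A tester that reports such a file as blocking everything is parsing it differently from both the standard and common implementations.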
11:32 pm on Oct 18, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


I always demanded perfection of myself and my staff. In my naive little world, I expect the same of others, particularly when you are a mega-billion company.

It's the challenge that comes with scale. Petabytes of data in constant churn just doesn't allow for perfection - and it's something that few of us have ever grappled with. Still, I'm sure Google can do better than we currently see in WebmasterTools.
 
