phranque - 6:41 pm on Jul 1, 2013 (gmt 0)
Why does this appear to be so unbelievable?
i've actually been paying attention to this issue for several years.
i've read a fair number of threads in this forum and a few more in other forums, and in every case where it was claimed that googlebot ignored robots.txt, it turned out to be a misunderstanding or a technical issue.
what does the ruleset look like that excludes googlebot from this directory?
do you have a googlebot-specific section with all exclusions intended for googlebot?
i assume you realize that disallowed urls are matched left-to-right as path prefixes, starting with the root directory slash.
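a minimal sketch of both points, using python's stdlib robotparser (the /example-dir/ path and hostnames are hypothetical stand-ins for whatever your actual ruleset excludes). note the googlebot-specific section: when one exists, googlebot uses it and ignores the User-agent: * section, so every exclusion intended for googlebot has to be repeated there. the prefix matching also shows why the rule only blocks paths that begin with /example-dir/:

```python
# sketch: a googlebot-specific robots.txt section plus left-to-right
# prefix matching. paths and hosts are hypothetical examples.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Disallow: /example-dir/

User-agent: *
Disallow:
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# googlebot is blocked from any url whose path starts with /example-dir/
print(rp.can_fetch("Googlebot", "https://example.com/example-dir/page.html"))
# ...but not from a path that merely contains that string deeper in
print(rp.can_fetch("Googlebot", "https://example.com/other/example-dir/"))
# other bots fall through to the catch-all section, which allows everything
print(rp.can_fetch("Bingbot", "https://example.com/example-dir/page.html"))
```

if this disagrees with what your real robots.txt says, that mismatch is usually where the "googlebot ignored robots.txt" reports come from.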
I can remember searching on the Google for some sort of "how to" related to website coding - came across a W3Schools result in the SERPs that said "A description for this result is not available because of this site's robots.txt – learn more". Upon clicking on the link, it was exactly what I was looking for.
that could easily be due to anchor text rather than on-page factors.
Personally believe EVERYTHING gets crawled...
i've also plowed through quite a number of log files and have never seen googlebot misbehave with respect to robots.txt.
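the log check described above can be sketched like this: scan access-log lines for googlebot requests whose path falls under a disallowed prefix. the log lines and the /example-dir/ prefix here are hypothetical examples, not anyone's real logs:

```python
# sketch: flag any googlebot request landing under a disallowed prefix.
# the sample log lines and /example-dir/ prefix are hypothetical.
import re

DISALLOWED_PREFIX = "/example-dir/"

log_lines = [
    '66.249.66.1 - - [01/Jul/2013:06:41:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [01/Jul/2013:06:42:00 +0000] "GET /allowed/page.html HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
    '203.0.113.9 - - [01/Jul/2013:06:43:00 +0000] "GET /example-dir/secret.html HTTP/1.1" 200 2048 "-" "SomeOtherBot/1.0"',
]

# pull the request path out of a common-log-format line
request_re = re.compile(r'"(?:GET|HEAD) (\S+) HTTP')

violations = []
for line in log_lines:
    if "Googlebot" not in line:
        continue
    m = request_re.search(line)
    if m and m.group(1).startswith(DISALLOWED_PREFIX):
        violations.append(line)

# empty here: the only hit on the disallowed dir came from a different bot
print(violations)
```

in my experience a scan like this comes back empty for the real googlebot; when it doesn't, the "Googlebot" string in the user-agent is usually a spoofing crawler, which a reverse-dns check would confirm.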