Convergence - 8:53 am on Jun 30, 2013 (gmt 0)
You folks have me confused - lol.
We have a directory, /merchant/, that is disallowed in robots.txt.
All the pages in that directory have noindex in the meta tags.
<meta name='robots' content='noindex, nofollow' />
The only point of access to this directory, and the pages contained within, is through a link on the merchant's product page (located on our site). That link on the product page currently does NOT have rel="nofollow" on it.
Yes, I admit it is possible someone, somewhere, could have bookmarked a specific merchant and Googlebot is following the link from there. However, with hundreds of merchants, we're pretty confident people aren't doing this en masse. We will see heavy crawling by Googlebot, then in a day or two those pages will be in the SERPs with the aforementioned "description is blocked by robots.txt". Then there will be some sort of data update/refresh and the pages are gone.
As you can see, the Googlebot does not visit our robots.txt very often. 147 times while crawling 531K pages.
Testing in WMT yields the following:
Blocked by line 12: Disallow: /merchant/
Detected as a directory; specific files may have different restrictions
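For anyone who wants to reproduce that WMT result locally, here's a minimal sketch using Python's stdlib urllib.robotparser. The domain and the sample URLs are made up for illustration; the rule is the same Disallow: /merchant/ from above. It also illustrates the crux of the situation: Googlebot is refused any fetch under /merchant/, so it can never load those pages and never sees the noindex meta tag.

```python
# Sketch of the robots.txt check, using Python's stdlib parser.
# The domain and URLs below are hypothetical examples.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /merchant/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Any URL under the disallowed directory is blocked for Googlebot,
# so the crawler never fetches the page body (or its meta robots tag).
print(parser.can_fetch("Googlebot", "https://www.example.com/merchant/acme"))   # False
print(parser.can_fetch("Googlebot", "https://www.example.com/products/widget"))  # True
```

In other words, the Disallow and the noindex work against each other: the URL can still be indexed from the inbound link alone, just with no snippet, which matches the "description is blocked by robots.txt" result you're seeing.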
Am I missing something here?