Forum Moderators: Robert Charlton & goodroi
... /[REMOVED_BY_ME]/review/[REMOVED_BY_ME].html
It has been mentioned numerous times in this thread that the noindex directive is irrelevant when you have excluded googlebot from crawling that URL.
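The reason is mechanical: a compliant crawler checks robots.txt before fetching, so it never downloads the HTML where a meta robots noindex tag would live. A minimal sketch with Python's stdlib robots.txt parser (the domain, paths, and robots.txt lines here are invented for illustration, not the poster's real data):

```python
# Why noindex can't work on a robots.txt-blocked URL: the crawler checks
# robots.txt BEFORE fetching, so the <meta name="robots" content="noindex">
# tag on the blocked page is never seen.  All data here is hypothetical.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /review/",
])

url = "https://example.com/review/some-page.html"
if not rp.can_fetch("Googlebot", url):
    # Crawl is blocked: the page's on-page noindex is invisible to the bot,
    # yet the URL itself can still be indexed from external links.
    print("blocked by robots.txt; on-page noindex never seen")
```

This is why a blocked URL can still appear in the SERPs: Google learns the URL from links, but never sees the page's own directives.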
Point being?
These pages show up in the SERPs from time to time with the "description blocked by robots.txt" statement.
Cross-check: AND there's nothing in robots.txt that names the googlebot?
that was my next question - not "googlebot" specifically, but any substring of googlebot's user agent string.
for example:
User-agent: bot
and none of those exclusions in your robots.txt fragment would necessarily match a /.../review/ subdirectory as indicated in your access log sample, unless you have a wildcard rule such as:
Disallow: */review/
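Both matching behaviors mentioned above can be sketched in a few lines. Python's stdlib robots.txt parser happens to implement substring user-agent matching (Google's own parser documents its own matching rules, so treat this as an illustration of the hazard, not Google's exact behavior), and since the stdlib parser does not support "*" in paths, the wildcard rule is translated to a regex by hand. The robots.txt lines and sample path are invented for illustration:

```python
# Sketch of the two matching behaviors discussed above.  The robots.txt
# lines and the sample path are illustrative, not the poster's real data.
import re
from urllib.robotparser import RobotFileParser

# 1) User-agent group matching: the stdlib parser treats "User-agent: bot"
#    as applying to any crawler whose name contains "bot" as a substring,
#    so the group below catches Googlebot.
rp = RobotFileParser()
rp.parse([
    "User-agent: bot",
    "Disallow: /",
])
print(rp.can_fetch("Googlebot", "https://example.com/page.html"))  # False: blocked

# 2) Wildcard path matching: Google documents "*" as matching any sequence
#    of characters, so "Disallow: */review/" matches a /review/ directory
#    at any depth.  Translate the pattern to an anchored prefix regex.
def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    # "*" -> ".*"; a trailing "$" anchors end-of-path; everything else literal
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))

rule = robots_pattern_to_regex("*/review/")
print(bool(rule.match("/some-dir/review/some-page.html")))  # True: rule matches
```

Note that robots.txt path rules are prefix rules, which is why the regex is applied with `match` (from the start of the path) rather than `search`.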
Robots.txt-excluded pages: they get stored in Google's DB, but Google uses the robots.txt rules (which are also stored in their DB for every site) to hide the real descriptions and show only the boilerplate description in the SERPs.
[edited by: phranque at 8:31 am (utc) on Jul 2, 2013]
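As a toy model of the behavior described in that post (this is speculation about Google's internals in the thread, so the sketch below only illustrates the described logic; the URLs, snippet text, and robots.txt lines are invented):

```python
# Toy model of the described SERP behavior: the URL stays in the index,
# but the stored robots.txt rules decide whether the real description is
# shown or replaced with boilerplate.  All data here is invented.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /review/"])

def serp_description(url: str, stored_description: str) -> str:
    # If the stored rules block crawling, mask the real snippet.
    if rp.can_fetch("Googlebot", url):
        return stored_description
    return "description blocked by robots.txt"

print(serp_description("https://example.com/review/x.html", "Real snippet"))
print(serp_description("https://example.com/other.html", "Real snippet"))
```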
It was like this for a LONG, LONG time - oh, my...