

page is noindexed, but still shows in SERP with a Google notice

     
5:34 pm on Jun 27, 2013 (gmt 0)

Junior Member

joined:Dec 15, 2011
posts: 66
votes: 0


I have a page which I noindexed many months ago (in meta and robots.txt), and it shows for a site operator + keyword search.

the description says:

A description for this result is not available because of this site's robots.txt learn more.

Clicking on learn more takes me here:

[support.google.com...]

Anyone see this before?
6:48 am on July 2, 2013 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10542
votes: 8


there's nothing in robots.txt that names the googlebot?

that was my next question - not "googlebot" specifically, but any substring of googlebot's user agent string.

for example:
User-agent: bot
...
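A minimal sketch of the substring point above, assuming the crawler follows the original 1994 robots exclusion recommendation (case-insensitive substring match on the crawler's name). The function name `group_applies` is made up for illustration, and whether a particular crawler really matches this way is an assumption you should verify against its own documentation:

```python
def group_applies(user_agent_value: str, crawler_name: str) -> bool:
    """Return True if a robots.txt User-agent value would select this crawler.

    Assumption: case-insensitive substring match against the crawler's
    name, with "*" selecting every crawler.
    """
    token = user_agent_value.strip().lower()
    if token == "*":
        return True
    # An empty value selects nothing; otherwise look for the substring.
    return token != "" and token in crawler_name.lower()


# "bot" and "google" are both substrings of "Googlebot"; "bingbot" is not.
for value in ["bot", "Googlebot", "google", "bingbot", "*"]:
    print(value, group_applies(value, "Googlebot"))
```

Under that assumption a group headed `User-agent: bot` would apply to googlebot even though it never names it in full.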



and none of those exclusions in your robots.txt fragment would necessarily match a /.../review/ subdirectory as indicated in your access log sample:
/[REMOVED_BY_ME]/review/[REMOVED_BY_ME].html



it has been mentioned numerous times in this thread that the noindex directive is irrelevant when you have excluded googlebot from crawling that url.

Point being?

it's not useful information for your problem statement.
These pages show up in the SERPs from time to time with the "description blocked by robots.txt" statement.

if the description is blocked, so are all other meta elements.
7:41 am on July 2, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Mar 9, 2010
posts:1806
votes: 9


I believe this is what happens though Google and many here might not agree.

Password protected pages - Googlebot cannot access them, so Google has nothing to store in its DB for those URLs and they are discarded, i.e. the pages are completely ignored and nothing goes into the DB.

robots.txt excluded pages - They get stored in the DB, but Google uses the robots.txt rules (which are also stored in the DB for every site) to hide the real descriptions and show only the boilerplate description in the SERPs.
8:03 am on July 2, 2013 (gmt 0)

Junior Member

joined:Aug 14, 2012
posts:79
votes: 0


Cross-check: AND there's nothing in robots.txt that names the googlebot?


that was my next question - not "googlebot" specifically, but any substring of googlebot's user agent string.

for example:
User-agent: bot


No - not needed.

and none of those exclusions in your robots.txt fragment would necessarily match a /.../review/ subdirectory as indicated in your access log sample:


BINGO - I literally sit corrected. Yes. Human error on our part. Wow.

You guys/gals are correct.

When we tested the robots.txt in WMT we tested:

example.com/review/ and not example.com/niche-cat/review/

Now when I tested it correctly with
Disallow: */review/


We get that it is blocked.
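For anyone hitting the same thing, here is a rough sketch of why the two rules behave differently. It assumes Google-style matching (a rule is a prefix of the URL path, `*` matches any run of characters, a trailing `$` anchors the end). The helper `rule_matches` and the file name `page.html` are made up for illustration; this is not Google's actual code:

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Test a Disallow pattern against a URL path, Google-style.

    '*' matches any sequence of characters, a trailing '$' anchors the
    end of the path, and everything else is a literal prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches at the start of the path, giving prefix semantics.
    return re.match(regex, path) is not None


path = "/niche-cat/review/page.html"
print(rule_matches("/review/", path))   # False: the path does not start with /review/
print(rule_matches("*/review/", path))  # True: the wildcard absorbs /niche-cat
print(rule_matches("*/review/", "/review/page.html"))  # True: the wildcard can match nothing
```

So a plain `Disallow: /review/` only covers a top-level /review/ directory, while `Disallow: */review/` covers it at any depth.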

Thanks for your patience AND the time to help talk this through.

Our mystery is solved. Always wondered why it was just on this one site, lol.

Thanks again!
8:29 am on July 2, 2013 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10542
votes: 8


thanks for posting a follow-up, Convergence!

you have to admit that if (the) google made a practice of ignoring robots.txt exclusions it would be difficult to hide that fact and you wouldn't have to look too far to find hundreds of "googlez ignoring robots.txt!" blog posts and forum threads.

i think you're okay but make sure you retest example.com/review/ for exclusion.
8:30 am on July 2, 2013 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10542
votes: 8


robots.txt excluded pages - They get stored in the DB, but Google uses the robots.txt rules (which are also stored in the DB for every site) to hide the real descriptions and show only the boilerplate description in the SERPs.


the truth is in your server access log.
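A quick way to check this is to scan the log for requests whose user agent claims to be googlebot and whose path falls under the excluded directory. The sketch below assumes the common "combined" log format; the sample lines, the IP addresses and the function name `googlebot_hits` are all made up for illustration, and a user agent string can be spoofed, so treat any hits as a starting point only:

```python
import re

# Assumed combined log format:
# ip - - [time] "METHOD path HTTP/x" status size "referer" "user-agent"
LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines, fragment):
    """Return requested paths containing `fragment` whose user agent mentions googlebot."""
    hits = []
    for line in lines:
        m = LINE.search(line)
        if m and "googlebot" in m.group("agent").lower() and fragment in m.group("path"):
            hits.append(m.group("path"))
    return hits


# Made-up sample lines, not taken from any real log.
sample = [
    '192.0.2.1 - - [02/Jul/2013:08:00:00 +0000] "GET /niche-cat/review/page.html HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.2 - - [02/Jul/2013:08:01:00 +0000] "GET /niche-cat/review/page.html HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    '192.0.2.1 - - [02/Jul/2013:08:02:00 +0000] "GET /about.html HTTP/1.1" 200 128 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]
print(googlebot_hits(sample, "/review/"))
```

If the list comes back empty for a directory you excluded, the crawler never fetched those pages and has no content to store. If it is not empty, the exclusion is not matching the way you think it is.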

[edited by: phranque at 8:31 am (utc) on Jul 2, 2013]

8:31 am on July 2, 2013 (gmt 0)

Junior Member

joined:Aug 14, 2012
posts:79
votes: 0


Hi phranque,

Just tested each exclusion, TWICE. LOL - NOW it's correct..

Thanks, again!

PS: It was like this for a LONG, LONG time - oh, my...
8:54 am on July 2, 2013 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10542
votes: 8


It was like this for a LONG, LONG time - oh, my...

that is exactly what makes you blind to the problem: you assume "it has always worked".

Disallow: */review/

note that while googlebot respects wildcarding you can't expect all well-behaved bots to respect this rule since this extension to the robots exclusion protocol was introduced by google.
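
To illustrate, here is a sketch of how a bot that implements only the original protocol might read that line. It assumes such a bot treats the Disallow value as a literal path prefix with no wildcard handling; the function name `prefix_only_blocked` and the file name `page.html` are hypothetical:

```python
def prefix_only_blocked(disallow_value: str, path: str) -> bool:
    """Original-protocol style matching: the value is a literal path prefix.

    An empty Disallow value blocks nothing.
    """
    return disallow_value != "" and path.startswith(disallow_value)


path = "/niche-cat/review/page.html"
# The '*' is taken literally, and no URL path starts with '*'.
print(prefix_only_blocked("*/review/", path))           # False
# A literal prefix works for both kinds of bot.
print(prefix_only_blocked("/niche-cat/review/", path))  # True
```

For a bot like that, the wildcard rule silently blocks nothing.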