
Google SEO News and Discussion Forum

This 67 message thread spans 3 pages (page 3 of 3).
page is noindexed, but still shows in SERP with a Google notice
SEOPanda




msg:4588245
 5:34 pm on Jun 27, 2013 (gmt 0)

I have a page which I noindexed many months ago (in meta and robots.txt), and it still shows up for a site: operator + keyword search.

the description says:

A description for this result is not available because of this site's robots.txt learn more.

Clicking on learn more takes me here:

https://support.google.com/webmasters/answer/156449?hl=en

Anyone see this before?

 

phranque




msg:4589500
 6:48 am on Jul 2, 2013 (gmt 0)

there's nothing in robots.txt that names the googlebot?

that was my next question - not "googlebot" specifically, but any substring of googlebot's user agent string.

for example:
User-agent: bot
...
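
As a rough illustration of the substring point above, here is a minimal sketch. It assumes a crawler compares each User-agent token case-insensitively against its own name; the function name group_applies and the sample user agent string are made up for illustration, and real crawlers may match more strictly.

```python
def group_applies(token: str, crawler_ua: str) -> bool:
    """Return True if a robots.txt User-agent token would select this crawler.

    Assumption: case-insensitive substring matching, with "*" matching everyone.
    """
    token = token.strip().lower()
    if token == "*":
        return True
    return token != "" and token in crawler_ua.lower()


# Illustrative user agent string, not taken from the thread's logs.
UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

for token in ("googlebot", "bot", "bingbot", "*"):
    print(token, group_applies(token, UA))
```

Under that assumption a group headed "User-agent: bot" would apply to googlebot even though it never names it in full.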



and none of those exclusions in your robots.txt fragment would necessarily match a /.../review/ subdirectory as indicated in your access log sample:
/[REMOVED_BY_ME]/review/[REMOVED_BY_ME].html



it has been mentioned numerous times in this thread that the noindex directive is irrelevant when you have excluded googlebot from crawling that url.

Point being?

it's not useful information for your problem statement.
These pages show up in the SERPs from time to time with the "description blocked by robots.txt" statement.

if the description is blocked, so are all other meta elements.
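
To make that concrete, here is a small sketch using Python's standard urllib.robotparser. The robots.txt content, the URLs, and the helper name may_read_meta are hypothetical. The point is only that a crawler which honours the exclusion never fetches the page, so it never gets to read a noindex meta element inside it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt and URLs, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /review/
"""


def may_read_meta(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True only if the crawler may fetch the page, and therefore
    could see a <meta name="robots" content="noindex"> element inside it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked_url = "http://example.com/review/page.html"
open_url = "http://example.com/about.html"

# Disallowed: the page is never fetched, so its noindex is never read.
print(may_read_meta(ROBOTS_TXT, "Googlebot", blocked_url))
# Fetchable: a noindex in this page can take effect.
print(may_read_meta(ROBOTS_TXT, "Googlebot", open_url))
```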

indyank




msg:4589514
 7:41 am on Jul 2, 2013 (gmt 0)

I believe this is what happens though Google and many here might not agree.

Password protected pages - Googlebot cannot access them, so there is nothing to store in their DB for those URLs and hence they are discarded, i.e. the pages are completely ignored and nothing goes into their DB.

robots.txt excluded pages - They get stored in their DB, but they use the robots.txt rules (which are also stored in their DB for every site) to hide the real descriptions and show only the boilerplate description in the SERPs.

Convergence




msg:4589516
 8:03 am on Jul 2, 2013 (gmt 0)

Cross-check: AND there's nothing in robots.txt that names the googlebot?


that was my next question - not "googlebot" specifically, but any substring of googlebot's user agent string.

for example:
User-agent: bot


No - not needed.

and none of those exclusions in your robots.txt fragment would necessarily match a /.../review/ subdirectory as indicated in your access log sample:


BINGO - I literally sit corrected. Yes. Human error on our part. Wow.

You guys/gals are correct.

When we tested the robots.txt in WMT we tested:

example.com/review/ and not example.com/niche-cat/review/

Now when I tested it correctly with
Disallow: */review/


We get that it is blocked.
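
The difference between the two tests can be sketched in a few lines. This is only an approximation of Google-style pattern matching, assuming "*" matches any run of characters and a trailing "$" anchors the end of the path; the function name rule_matches and the sample paths are hypothetical.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Match a Disallow pattern against a URL path.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end, and everything else must match literally from the
    start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


nested = "/niche-cat/review/some-page.html"  # hypothetical path
top_level = "/review/some-page.html"         # hypothetical path

print(rule_matches("/review/", nested))      # False: the rule only covers paths starting with /review/
print(rule_matches("*/review/", nested))     # True
print(rule_matches("/review/", top_level))   # True
print(rule_matches("*/review/", top_level))  # True: '*' can also match nothing
```

So a test against example.com/review/ passes with either rule, while only the wildcard version catches /niche-cat/review/.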

Thanks for your patience AND the time to help talk this through.

Our mystery is solved. Always wondered why it was just on this one site, lol.

Thanks again!

phranque




msg:4589528
 8:29 am on Jul 2, 2013 (gmt 0)

thanks for posting a follow-up, Convergence!

you have to admit that if (the) google made a practice of ignoring robots.txt exclusions it would be difficult to hide that fact and you wouldn't have to look too far to find hundreds of "googlez ignoring robots.txt!" blog posts and forum threads.

i think you're okay but make sure you retest example.com/review/ for exclusion.

phranque




msg:4589529
 8:30 am on Jul 2, 2013 (gmt 0)

robots.txt excluded pages - They get stored in their DB, but they use the robots.txt rules (which are also stored in their DB for every site) to hide the real descriptions and show only the boilerplate description in the SERPs.


the truth is in your server access log.
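
One way to check is to scan the log for crawler requests to the supposedly excluded paths. The sketch below assumes the common "combined" log format; the sample lines, IP addresses, and the helper name crawler_hits are invented for illustration, and a user agent string alone does not prove a request really came from Google.

```python
import re

# Hypothetical log lines in combined log format; real logs will differ.
SAMPLE_LOG = """\
203.0.113.5 - - [02/Jul/2013:08:01:10 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.0.113.5 - - [02/Jul/2013:08:01:12 +0000] "GET /niche-cat/review/item.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
198.51.100.7 - - [02/Jul/2013:08:02:00 +0000] "GET /niche-cat/review/item.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 6.1)"
"""

# Captures the requested path and the user agent field.
REQUEST = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')


def crawler_hits(log_text: str, ua_fragment: str, path_fragment: str) -> list:
    """Return the requested paths containing path_fragment that were
    fetched by a client whose user agent contains ua_fragment."""
    hits = []
    for line in log_text.splitlines():
        match = REQUEST.search(line)
        if not match:
            continue
        path, user_agent = match.group(1), match.group(2)
        if ua_fragment.lower() in user_agent.lower() and path_fragment in path:
            hits.append(path)
    return hits


print(crawler_hits(SAMPLE_LOG, "googlebot", "/review/"))
```

Any hit here means the crawler fetched a page that robots.txt was supposed to exclude, which points to a rule that does not match rather than a crawler ignoring the rule.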

[edited by: phranque at 8:31 am (utc) on Jul 2, 2013]

Convergence




msg:4589530
 8:31 am on Jul 2, 2013 (gmt 0)

Hi phranque,

Just tested each exclusion, TWICE. LOL - NOW it's correct.

Thanks, again!

PS: It was like this for a LONG, LONG time - oh, my...

phranque




msg:4589534
 8:54 am on Jul 2, 2013 (gmt 0)

It was like this for a LONG, LONG time - oh, my...

this will make you blind to that as a problem since you will assume "it has always worked".

Disallow: */review/

note that while googlebot respects wildcarding you can't expect all well-behaved bots to respect this rule since this extension to the robots exclusion protocol was introduced by google.
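
For comparison, here is a sketch of how a bot that only does plain prefix matching would treat the same rule. It assumes the bot gives "*" no special meaning inside a path; the function name prefix_only_blocked and the sample path are hypothetical.

```python
def prefix_only_blocked(disallow_rules, path):
    """Prefix-only matching: a rule blocks a path only if the path starts
    with the rule text. '*' inside a rule is treated as a literal character."""
    return any(rule and path.startswith(rule) for rule in disallow_rules)


path = "/niche-cat/review/some-page.html"  # hypothetical path

print(prefix_only_blocked(["*/review/"], path))           # False: no path starts with '*'
print(prefix_only_blocked(["/niche-cat/review/"], path))  # True: explicit prefix
```

For such bots the wildcard line blocks nothing, so listing the explicit prefixes as well may be the safer option.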
