
Google SEO News and Discussion Forum

This 67-message thread spans 3 pages; this is page 3.
page is noindexed, but still shows in SERP with a Google notice
SEOPanda



 
Msg#: 4588243 posted 5:34 pm on Jun 27, 2013 (gmt 0)

I have a page that I noindexed many months ago (in meta and robots.txt), and it still shows up for a site: operator + keyword search.

The description says:

A description for this result is not available because of this site's robots.txt - learn more.

Clicking "learn more" takes me here:

https://support.google.com/webmasters/answer/156449?hl=en

Anyone see this before?

 

phranque

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributors of the Month



 
Msg#: 4588243 posted 6:48 am on Jul 2, 2013 (gmt 0)

there's nothing in robots.txt that names the googlebot?

that was my next question - not "googlebot" specifically, but any substring of googlebot's user agent string.

for example:
User-agent: bot
...
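To illustrate the substring point: the original robots exclusion draft recommended a case-insensitive substring match of the User-agent value against the crawler's name, so a group such as `User-agent: bot` could end up applying to Googlebot. The sketch below assumes exactly that rule; the function name and token list are illustrative, and individual crawlers (Google included) may match more strictly, so check each crawler's own documentation.

```python
def group_applies(group_token: str, crawler_ua: str) -> bool:
    # Assumption: case-insensitive substring match of the robots.txt
    # User-agent value against the crawler's full UA string, as the original
    # robots exclusion draft recommended. Real crawlers may be stricter.
    token = group_token.strip().lower()
    if not token:
        return False  # an empty User-agent value names nobody
    if token == "*":
        return True  # the catch-all group applies to every crawler
    return token in crawler_ua.lower()


# A commonly seen Googlebot user agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

for token in ["bot", "Googlebot", "google", "bingbot", "*"]:
    print(f"User-agent: {token:<10} applies to Googlebot? {group_applies(token, GOOGLEBOT_UA)}")
```

Under this assumption `bot`, `Googlebot`, `google` and `*` all apply to Googlebot, while `bingbot` does not.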



and none of those exclusions in your robots.txt fragment would necessarily match a /.../review/ subdirectory as indicated in your access log sample:
/[REMOVED_BY_ME]/review/[REMOVED_BY_ME].html
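Why a rule written for a top-level /review/ directory misses /niche-cat/review/: under the original protocol a Disallow value is a plain prefix of the URL path. A minimal sketch, assuming a hypothetical rule list containing only `/review/` (the poster's actual fragment is on an earlier page of the thread):

```python
def is_disallowed(path: str, disallow_rules: list[str]) -> bool:
    # Original protocol semantics: a Disallow value blocks every URL path
    # that starts with it. An empty Disallow value blocks nothing.
    return any(rule and path.startswith(rule) for rule in disallow_rules)


# Hypothetical fragment, not the poster's real robots.txt.
rules = ["/review/"]

print(is_disallowed("/review/some-page.html", rules))            # True
print(is_disallowed("/niche-cat/review/some-page.html", rules))  # False
```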



it has been mentioned numerous times in this thread that the noindex directive is irrelevant when you have excluded googlebot from crawling that url.

Point being?

it's not useful information for your problem statement.
These pages show up in the SERPs from time to time with the "description blocked by robots.txt" statement.

if the description is blocked, so are all other meta elements, including the robots noindex meta tag, because googlebot never fetches the page to read them.
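The reason a robots.txt block hides the noindex can be modeled in a few lines. This is a toy sketch only (the `Page` class and `crawl` function are hypothetical and say nothing about Google's internals): a crawler that honors robots.txt never fetches a disallowed page, so it can never read a meta robots noindex on it, while the bare URL can still be known from links elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    html: str


def crawl(page: Page, blocked_by_robots_txt: bool) -> dict:
    # Toy model of a crawler that obeys robots.txt (hypothetical, not Google's code).
    if blocked_by_robots_txt:
        # Never fetched: the URL may still be known from links, but no <head>
        # content (description, robots meta tag) is ever read.
        return {"url": page.url, "fetched": False, "saw_noindex": False}
    # Naive check for <meta name="robots" content="noindex">, for illustration only.
    saw_noindex = 'name="robots" content="noindex"' in page.html.lower()
    return {"url": page.url, "fetched": True, "saw_noindex": saw_noindex}


page = Page(
    url="https://example.com/niche-cat/review/some-page.html",
    html='<head><meta name="robots" content="noindex"></head>',
)
print(crawl(page, blocked_by_robots_txt=True)["saw_noindex"])   # False
print(crawl(page, blocked_by_robots_txt=False)["saw_noindex"])  # True
```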

indyank

WebmasterWorld Senior Member



 
Msg#: 4588243 posted 7:41 am on Jul 2, 2013 (gmt 0)

I believe this is what happens, though Google and many here might not agree.

Password-protected pages: Googlebot cannot access them, so Google has nothing to store in its DB for those URLs and they are discarded, i.e. the pages are completely ignored and nothing goes into the DB.

robots.txt-excluded pages: these URLs get stored in the DB, but Google uses the robots.txt rules (which are also stored in its DB for every site) to hide the real descriptions and show only the boilerplate description in the SERPs.

Convergence



 
Msg#: 4588243 posted 8:03 am on Jul 2, 2013 (gmt 0)

Cross-check: AND there's nothing in robots.txt that names the googlebot?


that was my next question - not "googlebot" specifically, but any substring of googlebot's user agent string.

for example:
User-agent: bot


No - not needed.

and none of those exclusions in your robots.txt fragment would necessarily match a /.../review/ subdirectory as indicated in your access log sample:


BINGO - I literally sit corrected. Yes. Human error on our part. Wow.

You guys/gals are correct.

When we tested the robots.txt in WMT (Google Webmaster Tools), we tested:

example.com/review/ and not example.com/niche-cat/review/

Now, when I test it correctly with
Disallow: */review/

it shows as blocked.
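The wildcard rule can be checked the same way. A sketch, assuming Google's documented pattern semantics (`*` matches any run of characters, a trailing `$` anchors the end of the path, and everything else is still a prefix match); the function names are hypothetical. Because `*` can also match nothing, `*/review/` covers the root-level /review/ directory too.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    # Google-style pattern: '*' matches any run of characters, a trailing '$'
    # anchors the end of the URL path; otherwise the rule is a prefix match.
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def google_style_disallowed(path: str, rules: list[str]) -> bool:
    # re.match anchors only at the start of the string, giving prefix semantics.
    return any(rule_to_regex(rule).match(path) for rule in rules if rule)


rules = ["*/review/"]
print(google_style_disallowed("/niche-cat/review/some-page.html", rules))  # True
print(google_style_disallowed("/review/some-page.html", rules))            # True
print(google_style_disallowed("/reviews.html", rules))                     # False
```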

Thanks for your patience AND the time to help talk this through.

Our mystery is solved. Always wondered why it was just on this one site, lol.

Thanks again!

phranque




 
Msg#: 4588243 posted 8:29 am on Jul 2, 2013 (gmt 0)

thanks for posting a follow-up, Convergence!

you have to admit that if (the) google made a practice of ignoring robots.txt exclusions it would be difficult to hide that fact and you wouldn't have to look too far to find hundreds of "googlez ignoring robots.txt!" blog posts and forum threads.

i think you're okay, but make sure you retest example.com/review/ for exclusion.

phranque




 
Msg#: 4588243 posted 8:30 am on Jul 2, 2013 (gmt 0)

robots.txt excluded pages - They get stored in their DB but they use the robots.txt rules (which is also stored in their DB for every site) to hide the real descriptions and show only the boilerplate description in the SERPS.


the truth is in your server access log.
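One way to read the access log for this: list the requests whose user agent claims to be Googlebot for paths you expect to be disallowed. A fetch of such a path suggests the rule isn't matching it (allowing for the delay before Google picks up a changed robots.txt). A sketch assuming the Apache/Nginx "combined" log format; the regex, function name and sample lines (with documentation-range IPs) are illustrative. User agents can be spoofed, so Google recommends confirming genuine Googlebot hits with a reverse DNS lookup.

```python
import re
from typing import Iterable, Iterator

# Assumed "combined" log format; adjust the pattern to your server's format.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def googlebot_hits(lines: Iterable[str], path_fragment: str) -> Iterator[tuple[str, str, str]]:
    # Yield (time, path, status) for requests whose UA claims to be Googlebot.
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines in another format
        if "googlebot" in m["ua"].lower() and path_fragment in m["path"]:
            yield m["time"], m["path"], m["status"]


# Hypothetical sample lines.
sample = [
    '192.0.2.10 - - [02/Jul/2013:08:00:00 +0000] "GET /niche-cat/review/example.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.20 - - [02/Jul/2013:08:01:00 +0000] "GET /robots.txt HTTP/1.1" 200 310 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]
for hit in googlebot_hits(sample, "/review/"):
    print(hit)  # ('02/Jul/2013:08:00:00 +0000', '/niche-cat/review/example.html', '200')
```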

[edited by: phranque at 8:31 am (utc) on Jul 2, 2013]

Convergence



 
Msg#: 4588243 posted 8:31 am on Jul 2, 2013 (gmt 0)

Hi phranque,

Just tested each exclusion, TWICE. LOL - NOW it's correct.

Thanks, again!

PS: It was like this for a LONG, LONG time - oh, my...

phranque




 
Msg#: 4588243 posted 8:54 am on Jul 2, 2013 (gmt 0)

It was like this for a LONG, LONG time - oh, my...

that is exactly what makes you blind to it as a problem, since you assume "it has always worked".

Disallow: */review/

note that while googlebot respects wildcarding, you can't expect all well-behaved bots to respect this rule, since this extension to the robots exclusion protocol was introduced by google.
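For bots that ignore the wildcard extension, one workaround is to list literal prefixes alongside the wildcard rule. A sketch, assuming you can enumerate your own site's paths (for example from a sitemap); the helper name and paths are hypothetical, and it only handles a single leading `*`.

```python
def literal_prefixes_for(pattern: str, known_paths: list[str]) -> list[str]:
    # Hypothetical helper: expand a rule like '*/review/' into plain prefixes
    # that bots without wildcard support can still understand.
    needle = pattern.lstrip("*")  # '*/review/' -> '/review/'
    if "*" in needle or needle.endswith("$"):
        raise ValueError("only a single leading '*' is supported")
    prefixes = set()
    for path in known_paths:
        idx = path.find(needle)
        if idx != -1:
            # Keep everything up to and including the matched '/review/'.
            prefixes.add(path[: idx + len(needle)])
    return sorted(prefixes)


# Hypothetical site paths.
site_paths = [
    "/review/a.html",
    "/niche-cat/review/b.html",
    "/other-cat/review/c.html",
    "/about.html",
]
for prefix in literal_prefixes_for("*/review/", site_paths):
    print(f"Disallow: {prefix}")
```

The printed lines can sit in the same group as `Disallow: */review/`: wildcard-aware bots apply the pattern, and the others fall back to the literal prefixes. Any new category directory would have to be added by hand.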


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved