Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

This 67 message thread spans 3 pages: 67 ( [1] 2 3 > >     
page is noindexed, but still shows in SERP with a Google notice
SEOPanda



 
Msg#: 4588243 posted 5:34 pm on Jun 27, 2013 (gmt 0)

I have a page which I noindexed many months ago (in meta and robots.txt), and it shows for a site operator + keyword search.

the description says:

A description for this result is not available because of this site's robots.txt learn more.

Clicking on learn more takes me here:

https://support.google.com/webmasters/answer/156449?hl=en

Anyone see this before?

 

goodroi

WebmasterWorld Administrator goodroi us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4588243 posted 8:12 pm on Jun 27, 2013 (gmt 0)

If you disallow it in robots.txt, Google can't crawl the page to see the noindex meta tag.

dethfire

5+ Year Member



 
Msg#: 4588243 posted 8:19 pm on Jun 27, 2013 (gmt 0)

If it can't crawl it, why would Google want it in the index?

lucy24

WebmasterWorld Senior Member lucy24 us a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4588243 posted 9:07 pm on Jun 27, 2013 (gmt 0)

Because someone, somewhere, has linked to it.

CredibleZephyre



 
Msg#: 4588243 posted 9:17 pm on Jun 27, 2013 (gmt 0)

GoodROI and lucy both got it right.

A robots.txt file doesn't prevent a page from showing up in the SERP; it only prevents it from being crawled. If that page is linked to from enough outside sources, the URL will still show in a SERP, but without any additional information (meta description etc.).

A meta no-index tag is on the specific page and prevents that page from showing in the SERP altogether... the only catch is the page has to be crawled for the crawler to find the meta no-index tag.

So the moral of the story is: take that page out of your robots.txt.
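A minimal sketch of the point above, using Python's standard `urllib.robotparser`. The robots.txt content, the example.com URL, and both function names are made up for illustration. It only models the rule "a crawler that obeys robots.txt never fetches a disallowed page, so it never sees a noindex on it"; it is not a description of how Googlebot is actually implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks one page (illustration only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private-page.html
"""


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a robots.txt-obeying crawler is allowed to request url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def noindex_is_visible(robots_txt: str, user_agent: str, url: str,
                       page_has_noindex: bool) -> bool:
    """A noindex can only be acted on if the page may be fetched at all."""
    return crawler_may_fetch(robots_txt, user_agent, url) and page_has_noindex


if __name__ == "__main__":
    url = "http://www.example.com/private-page.html"
    # Blocked in robots.txt: the noindex on the page is never seen.
    print(noindex_is_visible(ROBOTS_TXT, "Googlebot", url, True))
    # Not blocked: the crawler can fetch the page and read the noindex.
    print(noindex_is_visible("", "Googlebot", url, True))
```

With the Disallow line in place the first check comes back False even though the page carries a noindex, which is the situation the OP describes.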

not2easy

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



 
Msg#: 4588243 posted 10:34 pm on Jun 27, 2013 (gmt 0)

And make sure it is not in your sitemap if you have one.

lucy24

WebmasterWorld Senior Member lucy24 us a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4588243 posted 10:51 pm on Jun 27, 2013 (gmt 0)

If a page is in the sitemap, and it isn't roboted-out, will this override a "noindex" on the page itself? g### does occasionally hint that they will disregard a site owner's expressed wishes if they feel like it. (Where "if they feel like it" is shorthand for a long and complicated explanation that I can't lay my hands on at the moment.)

Convergence



 
Msg#: 4588243 posted 12:04 am on Jun 28, 2013 (gmt 0)

Have seen internal links without a rel=nofollow tag, with a noindex header, and blocked in robots.txt - still show up in the SERPs with the description showing:

"A description for this result is not available because of this site's robots.txt learn more."

lucy24

WebmasterWorld Senior Member lucy24 us a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4588243 posted 12:17 am on Jun 28, 2013 (gmt 0)

with a noindex header, and blocked in robots.txt

If the page is roboted-out, the search engine cannot see the "noindex" header.

indyank

WebmasterWorld Senior Member



 
Msg#: 4588243 posted 2:08 am on Jun 28, 2013 (gmt 0)

@lucy24, If a page linked to from other sites is password protected but not roboted out, what will be the status on Google SERPS?

phranque

WebmasterWorld Administrator phranque us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4588243 posted 2:39 am on Jun 28, 2013 (gmt 0)

If a page linked to from other sites is password protected


"password protected" as in a "401 status code" response or a "redirect to login page"?

indyank

WebmasterWorld Senior Member



 
Msg#: 4588243 posted 2:45 am on Jun 28, 2013 (gmt 0)

I meant "password protected" as in a "401 status code" response but would be interested in knowing the answers for both...

lucy24

WebmasterWorld Senior Member lucy24 us a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4588243 posted 3:37 am on Jun 28, 2013 (gmt 0)

Closely related question:
If a page comes with
X-Robots-Tag "noindex"
(as some of my non-html pages do)
will this directive be honored in html pages that don't have a meta robots tag?

:: detour here to make sure the header has been working as intended with my non-page files ::

Here, again, the search engine will only see the header if it is allowed to receive the page. But it's an alternative way of conveying the same information. Useful if for example you don't want the index to hint at the existence of anything within a particular directory, even if you happened to forget the meta on one page.

Robert Charlton

WebmasterWorld Administrator robert_charlton us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4588243 posted 7:37 am on Jun 28, 2013 (gmt 0)

Related discussion....

Pages are indexed even after blocking in robots.txt
http://www.webmasterworld.com/google/4490125.htm [webmasterworld.com]

phranque

WebmasterWorld Administrator phranque us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4588243 posted 9:05 am on Jun 28, 2013 (gmt 0)

I meant "password protected" as in a "401 status code" response but would be interested in knowing the answers for both...


as far as i know google does not index any 4xx status code responses.

these are reported in GWT as "URL Errors".
you can see these, grouped with any 401, 403 and 407 responses, by going to "Health"/"Crawl Errors"/"Access denied".

as far as redirecting to a login page, that depends on what status code is used for the redirect.

phranque

WebmasterWorld Administrator phranque us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4588243 posted 9:08 am on Jun 28, 2013 (gmt 0)

If a page comes with
X-Robots-Tag "noindex"
(as some of my non-html pages do)
will this directive be honored in html pages that don't have a meta robots tag?


X-Robots-Tag is intended for resources that are non-html documents and therefore cannot provide a meta robots noindex element, but the X-Robots-Tag HTTP Response header works equally well for any Content-Type.
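As a rough illustration of "header or meta, either one carries the same information", here is a small Python sketch that checks a response for noindex in both places. The function and class names are hypothetical, and it is simplified: it ignores the user-agent-prefixed form of the header (for example `googlebot: noindex`) and only looks at `<meta name="robots">`.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="...">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def is_noindexed(headers: dict, body: str = "") -> bool:
    """True if noindex appears in the X-Robots-Tag header or the robots meta."""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            directives = {d.strip().lower() for d in value.split(",")}
            if "noindex" in directives:
                return True
    parser = _RobotsMetaParser()
    parser.feed(body)
    return "noindex" in parser.directives


if __name__ == "__main__":
    # A non-html resource can only use the header.
    print(is_noindexed({"Content-Type": "application/pdf",
                        "X-Robots-Tag": "noindex"}))
    # An html page with no meta robots tag is still covered by the header.
    print(is_noindexed({"X-Robots-Tag": "noindex"},
                       "<html><head><title>x</title></head></html>"))
```

Either way, the check needs the actual response, so it only helps if the URL is not blocked in robots.txt.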

chalkywhite



 
Msg#: 4588243 posted 10:04 am on Jun 28, 2013 (gmt 0)

OK, so let's say you wanted to remove these URLs that are in a subfolder. In GWT, would you use the following syntax?

www.domain.com/*/folderofurlsyouwanttoremove/

Note that is a subfolder.

phranque

WebmasterWorld Administrator phranque us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4588243 posted 10:50 am on Jun 28, 2013 (gmt 0)

OK, so let's say you wanted to remove these URLs that are in a subfolder. In GWT, would you use the following syntax?

why are you removing these urls?
are the urls in the index?
are they meta robots (or X-Robots-Tag) noindexed?
is the directory excluded from crawling?
are these urls getting 404/410 responses?

When NOT to use the URL removal tool - Webmaster Tools Help:
http://support.google.com/webmasters/answer/1269119 [support.google.com]


www.domain.com/*/folderofurlsyouwanttoremove/

it appears you can't use wildcarding when specifying the removal url.

Find the URL of a page - Webmaster Tools Help:
http://support.google.com/webmasters/answer/63758 [support.google.com]

chalkywhite



 
Msg#: 4588243 posted 11:05 am on Jun 28, 2013 (gmt 0)

Thanks phranque, the pages are being redirected now. Panda smack with trackback URLs on a WordPress site - dupe content.

indyank

WebmasterWorld Senior Member



 
Msg#: 4588243 posted 11:27 am on Jun 28, 2013 (gmt 0)

as far as i know google does not index any 4xx status code responses.


Thanks phranque. Yes it shouldn't and that should be the right behavior.

But will it appear as a link-only stub in the SERPs, with a description similar to this one?

"A description for this result is not available because of this site's robots.txt learn more. "

If it doesn't even show up in the SERPs, why does Google choose to show link-only stubs for robots.txt-excluded pages? The argument that those links are found on other sites should hold good for password-protected pages as well, shouldn't it?

phranque

WebmasterWorld Administrator phranque us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4588243 posted 11:47 am on Jun 28, 2013 (gmt 0)

"A description for this result is not available because of this site's robots.txt learn more. "


if you have excluded googlebot from crawling a url it will never see the 4xx response and therefore doesn't know that the content is password-protected.

phranque

WebmasterWorld Administrator phranque us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4588243 posted 11:49 am on Jun 28, 2013 (gmt 0)

the pages are being redirected now.


if the pages are being redirected then the urls are not suitable for a removal request.

indyank

WebmasterWorld Senior Member



 
Msg#: 4588243 posted 11:59 am on Jun 28, 2013 (gmt 0)

No, sorry for the confusion, if any. They aren't excluded in robots.txt, only password protected.

My question - is there any differential treatment for password-protected pages vs. robots.txt-excluded pages in Google SERPs? We know that robots.txt-excluded pages show up as link-only stubs in SERPs with the description posted by the OP. But what about password-protected pages? If they don't show up in SERPs at all, why this differential treatment?

phranque

WebmasterWorld Administrator phranque us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4588243 posted 12:15 pm on Jun 28, 2013 (gmt 0)

the difference is that a 401 is unambiguous.

aakk9999

WebmasterWorld Administrator 5+ Year Member



 
Msg#: 4588243 posted 1:25 pm on Jun 28, 2013 (gmt 0)

My question - is there any differential treatment for password-protected pages vs. robots.txt-excluded pages in Google SERPs? We know that robots.txt-excluded pages show up as link-only stubs in SERPs with the description posted by the OP. But what about password-protected pages? If they don't show up in SERPs at all, why this differential treatment?


Yes, there is a different treatment.

As others said above - if the page is excluded from crawling via robots.txt, Google is only told it is not allowed to crawl the page and therefore will not be (should not be) requesting it. Hence it cannot see any other directive such as:

- HTTP response code (including 301, 401, 403, 404, etc.)
- on-page robots meta such as noindex, noodp etc

Hence pages that are roboted out may show in SERPs as Google was only told not to crawl them and does not know about any other directive or response code that might result in different page handling.

Think of it like this:

If I forbid you to ring the doorbell, you cannot tell whether I am at home or not, in fact you cannot even see if it was me living at this address nor even whether the door exists. All that you know is that someone talked about my door.
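The distinction can be written down as a tiny decision function. This is only a model of the behaviour described in this thread, not Google's actual logic; the function name, parameters, and return strings are all invented for the example.

```python
from typing import Optional


def expected_serp_treatment(blocked_by_robots_txt: bool,
                            status_code: Optional[int] = None,
                            noindex: bool = False,
                            has_inbound_links: bool = False) -> str:
    """Model of the outcomes discussed in the thread (an assumption, not a spec)."""
    if blocked_by_robots_txt:
        # The URL is never requested, so the status code and any noindex
        # are invisible. All the search engine knows is that others link to it.
        return "url-only stub" if has_inbound_links else "not shown"
    if status_code is not None and 300 <= status_code < 400:
        # Per the thread: this depends on which redirect status is used.
        return "depends on redirect"
    if status_code is not None and 400 <= status_code < 500:
        # A 4xx (401, 403, 404, ...) is seen and is unambiguous.
        return "not shown"
    if noindex:
        # The page was fetched, so the noindex can be honoured.
        return "not shown"
    return "normal listing"


if __name__ == "__main__":
    # Roboted-out, password protected, noindexed, but linked from elsewhere.
    print(expected_serp_treatment(True, 401, noindex=True, has_inbound_links=True))
    # Same page with the robots.txt block removed.
    print(expected_serp_treatment(False, 401, noindex=True, has_inbound_links=True))
```

The first case shows why stacking robots.txt on top of a 401 or a noindex works against you: the block hides the stronger signal.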

indyank

WebmasterWorld Senior Member



 
Msg#: 4588243 posted 3:25 pm on Jun 28, 2013 (gmt 0)

Hence pages that are roboted out may show in SERPs as Google was only told not to crawl them and does not know about any other directive or response code that might result in different page handling.


But doesn't the same hold true for password protected pages? By password protection we are telling them, they are not allowed to crawl. Google is forced to obey as they might not know to break past the password. But Google does get a hint a page exists for that URL as someone has linked to the password protected page. Why aren't they showing the password protected URLS in the SERPS with a boilerplate description, like they do for roboted out URLs?

phranque

WebmasterWorld Administrator phranque us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4588243 posted 6:46 pm on Jun 28, 2013 (gmt 0)

Why aren't they showing the password protected URLS in the SERPS with a boilerplate description, like they do for roboted out URLs?


a 4xx response means for all practical purposes the requested resource doesn't exist or is not available.
excluding a robot from crawling says nothing about the status of the resource for a live visitor - it's only an instruction for the robot.

aakk9999

WebmasterWorld Administrator 5+ Year Member



 
Msg#: 4588243 posted 6:49 pm on Jun 28, 2013 (gmt 0)

But doesn't the same hold true for password protected pages? By password protection we are telling them, they are not allowed to crawl.


I can see what you are thinking but I believe it is not the same. If a page is password protected, then visitors cannot see the page either unless they know the password. Visitors that know the password may be only a selected few - and if a visitor knows the password, they would probably know the URL too. Hence it is probably not good for Google to have such page in index because if a click from SERPs requires a password it is most likely a bad experience for visitors coming from SERPs.

But restricting access via robots.txt is for bots only - they are not allowed to go there, but visitors see the page.

Unless you are cloaking, of course, and have a page password protected for bots only - then visitors would see it without password, but Google would not know this, so why should it show in its index.


<added>Which is pretty much what phranque said in the post above.</added>

lucy24

WebmasterWorld Senior Member lucy24 us a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4588243 posted 7:46 pm on Jun 28, 2013 (gmt 0)

www.domain.com/*/folderofurlsyouwanttoremove/

Note that is a subfolder.

The wild-card formulation only makes sense if you have a bunch of different directories all containing a subdirectory with the same name-- obvious example, a group of directory-specific /images/ subdirectories. Is that what you're aiming at?

Hence it is probably not good for Google to have such page in index because if a click from SERPs requires a password it is most likely a bad experience for visitors coming from SERPs.

That doesn't seem to stop sites from doing it. At the access level it's done with a "Satisfy any" directive: visitor has to either know the password, or be the googlebot. It's absolutely infuriating to the human visitor, but the sites don't seem to care. "The full text of this article-- including the content you searched for-- is only available to logged-in members."

aakk9999

WebmasterWorld Administrator 5+ Year Member



 
Msg#: 4588243 posted 8:32 pm on Jun 28, 2013 (gmt 0)

@lucy
I wonder if we have our wires crossed. The question I was answering was why pages returning HTTP 401 to Googlebot are not in the index (as opposed to pages excluded by robots.txt, which may be).

I am not entirely sure what you are referring to - are you saying pages responding with 401 are in index? Or perhaps are you referring to "first click free" situations?

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved