
How Does Google Treat No-indexed Pages?

   
11:08 am on Apr 13, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Adding a noindex meta tag to a page's header causes Google to remove the page from its search index. Thus the page will no longer appear in Google's SERPs.
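(For reference, the tag in question goes in the page's <head> and looks something like this:)

<!-- ask compliant crawlers to drop this page from the index -->
<meta name="robots" content="noindex">
<!-- or, to make explicit that links on the page may still be followed -->
<meta name="robots" content="noindex, follow">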

But people can still visit the page, so even though it's no longer indexed, it still contributes content to the site. What I'm wondering is whether this content can affect the rankings of other pages on the site that are still in the index.

In other words, since people can still visit no-indexed pages, does the Google algorithm include them in its evaluation of the overall content and quality of the site?
5:06 pm on Apr 28, 2011 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



410 redirects

You have mentioned this several times now.

There is no such thing. 4xx codes are error codes.

Redirect codes are always 3xx.
6:02 pm on Apr 28, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Here's a thought. Instead of us blocking what Google thinks is "thin content", why doesn't Google just ignore it themselves?

Other engines don't think it's bad.
9:10 pm on Apr 28, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Doing both is double protection, a technique I highly recommend.

I do both robots.txt and NOINDEX as a fail-safe, which recently saved my bacon when I made a small mistake updating robots.txt. Thousands of pages would have been indexed within days, and it takes forever to get rid of that kind of mess. However, the redundant NOINDEX stopped Googlebot from making a big mess in the first place.

IncrediBill

Historically, at least, if a page is already indexed and you block it in robots.txt and "noindex" it, the page may remain in the index for a long long time. The only way Googlebot will see the "noindex" is if the page is NOT blocked by robots.txt.

So you're right, using both is powerful, as long as the page was not indexed in the first place.
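To illustrate with a hypothetical /private/ section (the paths here are made up):

# robots.txt -- while this Disallow is in place, compliant robots never fetch the pages,
# so any meta "noindex" on them is never seen.
User-agent: *
Disallow: /private/

# To get an already-indexed /private/ page dropped via its meta "noindex",
# the Disallow has to come off first so Googlebot can re-fetch the page and see the tag.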
9:42 pm on Apr 28, 2011 (gmt 0)

5+ Year Member



It doesn't make sense that noindexing low-quality pages would help your site rank better. Google is telling us to do it, but they can still crawl the pages.

So what we're telling Google is:

1) I have low quality content on my site
2) I'm blocking it from you for better rankings
3) Users can still see this low quality content
4) I'm trying to trick you
11:38 pm on Apr 28, 2011 (gmt 0)

WebmasterWorld Senior Member crobb305 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I read a post that, amongst other things, said: "2) Move any bunk content off-site"
I was previously noindex-ing these pages but now I'm not sure this is the right thing to do. Has anyone else tried moving content off-site with notable results?


Google initially recommended that we improve the content, noindex the page, or delete the page altogether (perhaps move it to another domain). I opted to improve the content on my hardest hit pages that had significant inbound links. I rewrote the articles from scratch with brand new research -- not just "content". Very informative pieces. Those pages have fully recovered (initially fell an average of 200 to 400 positions). They were my hardest hit. I did remove 5 very old/shallow pages that were useless.

To determine usefulness, I looked at the page stats for 2010 (from my traffic logs) along with the inbound links to those pages. I looked at how much traffic they received from internal navigation and from search (as entry pages). The 5 pages I deleted received something like 0.01% of my traffic in all of 2010. They also had zero inbound links. Deleting them had a negligible impact on my traffic, but it may have contributed to my site's 70% recovery in the past 4 days. It also improves the site's inbound-link distribution.

So my recommendation is to look at your traffic for all of last year for each of those pages. Were they visited enough to even matter? Did they get any entry traffic? Look at the backlinks for those pages. Zero inbound links from external sources? Unless you set out to get some quality links for them, maybe removing them will refocus the juice flowing through the site. If they have backlinks, just redo the content.

-- sorry if this was a little off topic -- I just wanted to offer some thoughts on the previous question
12:34 pm on Apr 29, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



So what we're telling Google is:

1) I have low quality content on my site
2) I'm blocking it from you for better rankings
3) Users can still see this low quality content
4) I'm trying to trick you


Please remember that noindex is not like robots.txt. Robots.txt says, "Google, you may NOT crawl this page." "Noindex" says, "Please, Google, do crawl this page, but please do not include it in your indexed results." Actually, even the highest-quality pages may meet this criterion.

So should pages like "terms", "privacy", "disclaimers", "contact", and perhaps even "about" be "noindexed"? (Remember, Google will still crawl these pages!)
12:53 pm on Apr 29, 2011 (gmt 0)

WebmasterWorld Senior Member pageoneresults is a WebmasterWorld Top Contributor of All Time 10+ Year Member



So what we're telling Google is:


1) I respect your search engine and user-agent guidelines and don't wish to include this page in your results - yet.

So should pages like "terms", "privacy", "disclaimers", "contact", and perhaps even "about" be "noindexed"? (Remember, Google will still crawl these pages!)


Terms, Privacy and Disclaimers - yes. Contact and About - no. Certain documents just don't meet the criteria of being index-worthy. Let's say that Google has a cap on how many documents it will index from your website at any given time. You have 100,000 docs available. Google only wants 75,000 of them. Maybe those 25,000 are intermediary drill-down pages, e.g. shopping cart structure. In this instance, you'd probably noindex those documents in between, as they really aren't something you want the visitor landing on from a search since they still have one or two more clicks to go. I'd be willing to wager a bet that landing on a category page is a contributor to high bounce rates.

Category pages with pagination? Usually I noindex them; those are for the visitor when they are on the site and not something I really want them to land on from a search result. My goal has always been to put the visitor right where they need to be. If that means removing certain pages from appearing in the index, I have no qualms with doing that.
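On those paginated category pages I'd use something along these lines (a sketch only, adapt to your own setup):

<!-- keep the page out of the index, but let crawlers keep following its links -->
<meta name="robots" content="noindex, follow">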
2:17 pm on Apr 29, 2011 (gmt 0)

WebmasterWorld Senior Member crobb305 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



noindex Privacy


I have a different opinion on noindexing your privacy policy. I think that privacy policies could be a quality signal. Of all the administrative pages that you could consider noindexing, I personally wouldn't noindex the privacy policy. Even though Google still crawls the page, I fear that noindexing an important quality signal could reduce the site's perceived/calculated quality.
10:30 am on Apr 30, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In this instance, you'd probably noindex those documents in between as they really aren't something you want the visitor landing on from a search as they still have one or two more clicks to go.

Although the simple supplemental test appears to be broken again... I've found that these types of pages, which I sometimes call index pages, seem to end up being supplemental.
So perhaps studying supplemental pages could suggest which pages Google would prefer be "noindexed".
(Or have their content improved!)
Currently it appears only the "non-supplemental" pages query works; this search:
site:www.example.com/*
appears to show your "non-supplemental" pages.

Historically you could do a search "subtracting" all your non-supplemental pages from all your pages in Google's index for a given domain, like this:
-site:www.example.com/* site:www.example.com
But this convenient test appears to be "broken" yet again.

I'd be willing to wager a bet that landing on a category page is a contributor to high bounce rates.
On the other hand, if these "index" or "navigation" pages do have some useful content, leading to the correct topic on a given site, they may increase your visitors' "dwell" time, which I feel is crucial for pages to rank highly with few or no inbound links.

Again, it all comes down to a measure of quality, doesn't it?
This is an interesting discussion, thanks! (The best discussions about Google are those that lead to some "serious conjecture"!)
7:55 pm on Apr 30, 2011 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



site:www.example.com/*

I never found that hack to be all that useful. Matt Cutts once said it was a query that tripped a kind of error routine and caused the URL retrieval to stop part way, but there is no certainty about where that "stop" actually occurs. It apparently does retrieve the most "core" pages first, but it's pretty rough - the retrieval process might stop before it taps any supplemental partitions, or after it grabs some, or whatever.

I think it's more informative to do the site: operator on AOL and see how many URLs Google feels are essential enough to export to their partner.
3:52 pm on May 2, 2011 (gmt 0)

10+ Year Member



All this talk about a situation that doesn't change the user experience at all. You remember the user, the person we are actually designing the site for? How we are supposed to ignore search engines and focus only on the user experience?

I pity anyone who actually listened to G- on that advice. I hope their new career is going well.
7:11 pm on May 2, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



All this talk about a situation that doesn't change the user experience at all.

This forum isn't just about user experience. It's also about Google and SEO.

The original question is whether the Google algorithm considers the content of no-indexed pages when it evaluates the overall quality of a website. I still don't know the answer.
9:27 pm on May 2, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



that doesn't change the user experience at all.

Pageoneresults suggests that "noindexing" intermediate navigation pages may improve the end-user experience by making it more likely that a user ends up on the page most relevant to their search. Seems very helpful to the user/visitor to me.

What we do know:
Google continues to crawl "noindexed" pages.
Google follows links on "noindexed" pages (at least I know this).
I also believe Google does not penalize "noindexed" content in iframed pages (at least on the same domain!).

So I would assume that as long as content is "noindexed" for reasons Google considers reasonable, it might actually benefit the site's overall ranking (and the user experience as well).
10:13 pm on May 2, 2011 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



410 redirects

You have mentioned this several times now.
There is no such thing. 4xx codes are error codes.
Redirect codes are always 3xx.

Tell it to Apache [httpd.apache.org] ;)
10:50 pm on May 2, 2011 (gmt 0)

WebmasterWorld Senior Member ken_b is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



How Does Google Treat No-indexed Pages?

Maybe we should step back a level and ask ...

How Does Google Treat LINKS TO No-indexed Pages?

Is the "No Index" a sign of a lack of trust, or belief in the value of the page linked to?

If we "don't trust" the page we are linking to, why should Google?
2:22 pm on May 6, 2011 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



410 redirects

You have mentioned this several times now.
There is no such thing. 4xx codes are error codes.

Redirect codes are always 3xx.

Tell it to Apache [httpd.apache.org] ;)

The Apache documentation was written by humans, not deities. It is technically in error calling a 400-series error response a redirect. To promote clarity, it should really be called an "internal rewrite," or at least be called an "internal redirect" to avoid confusion.

400-Series error responses do not cause a change to the requested URL on the client side (watch your browser address bar during a 30x response, then compare to a 4xx response). Therefore, this is clearly not a "client redirect." Rather, a 4xx response code is sent, along with the content of the 4xx ErrorDocument (if present) or with server-generated error message text (depending on how you've set it up with the ErrorDocument directive).

If your address bar changes during 400-series error handling, then this indicates that you have improperly defined the ErrorDocument as a URL instead of defining it as a local URL-path only. This is one of the most common errors seen in error-handling configuration on Apache servers -- despite the fact that Apache warns about it in the ErrorDocument directive documentation. Briefly, use
ErrorDocument 404 /404-error-page.html

and never use
ErrorDocument 404 http://example.com/404-error-page.html

as the latter will result in a 302-Found client redirect response instead of the desired 404 error response.

Since a 302 redirect response tells search engines that the document exists but has moved, it is a potentially-serious problem.
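(And for anyone who does want to serve a genuine 410-Gone for a removed URL, it is configured as a response, not a redirect. A minimal sketch -- the URL-paths below are placeholders:)

# mod_alias: tell clients and robots this URL is gone for good (410 response)
Redirect gone /old-removed-page.html

# optional friendly error page served with the 410 status (local URL-path, per the note above)
ErrorDocument 410 /410-error-page.html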

---

On the robots.txt versus noindex issue, clarification is also needed:

If you block a resource URL-path (e.g. a page) using robots.txt, then that resource will not be fetched by any robots.txt-compliant robot. Therefore, its on-page "meta-robots" tag is irrelevant except in cases such as that described by IncrediBill where there is a "glare" situation -- where the resource is fetched during the time that you are making robots.txt and on-page "meta-robots" changes, or while an error exists in the robots.txt file.

Resource URL-paths which are Disallowed by robots.txt will not be fetched by robots.txt-compliant robots, but they may still appear as URL-only listings in search results, based on incoming links from other pages.

If a resource URL-path is not Disallowed by robots.txt, then it may be fetched. If it is marked as "no-index," then it will not be placed in the search engine's index, and it will not appear in meta-robots-tag-compliant search engine results.

If a page's URL-path is not Disallowed by robots.txt and if it is marked as "no-follow," then links on this page will not be followed from this page.

So, there is a hierarchy between robots.txt and on-page meta-robots tags, and they do very different things.
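(A concrete contrast, with made-up paths:)

# robots.txt -- controls fetching only; a Disallowed URL can still show up as a URL-only listing
User-agent: *
Disallow: /cart/

<!-- meta-robots on a fetchable page -- controls indexing and link-following, not fetching -->
<meta name="robots" content="noindex, nofollow">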

---

Also unanswered:

If your server returns either a 404-Not Found or 410-Gone response to a request, then Google shows a 404-Not Found response in their Webmaster Tools status reports. They really should fix this to promote clarity, but that's how it is for now.

Jim
5:27 pm on May 8, 2011 (gmt 0)



I have used a combination of methods to remove certain pages post-Panda:

410 Gone for pages I simply want to remove and will never bring back.

noindex - for pages I will improve later (but who knows if Google will take these into a quality assessment - if they did, every WP blog with tags and cats that are noindexed would be affected?).

WMT - complete removal of a directory to delete directories/sections of the site I do not wish to bring back.

I have been waiting a long time to have pages removed from Google's index, and one possibility I was considering is removing all my indexed WordPress tag and archive pages through WMT directory removal, then adding the noindex,follow tag to the pages afterwards and removing the block from robots.txt.

I'm not sure if the above would send the right signal to Google, but it seems like a quicker way than waiting for the noindex tag I have added to these pages to kick in.
8:07 pm on Jul 12, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Just finished reading the entire thread. Any new info from Google or updates related to the original question?

With the Panda update, Google seems to have assumed responsibility for the user experience not only on landing pages but also on any other pages on the site. (Any weak pages can get a site Pandalized, not just weak landing pages.) So for that reason it seems the noindex advice they offer is at odds with their goal of sitewide quality.
3:22 pm on Oct 17, 2011 (gmt 0)

5+ Year Member



An up-to-date answer on this would be appreciated.

I am adding a Privacy Policy at the moment. I purchased it to suit my site, but a lot of its content is similar, if not identical, to many other privacy policies. My gut says to use this tag, <meta name="robots" content="noindex, follow">, because of the duplicate-content issue. I am reluctant to use it in case it's the wrong thing to do. Any thoughts? Is anyone using this without being hit by Panda?

Or am I better off rewriting it so no duplicate content appears, and just leaving it for Google to decide?
3:30 pm on Oct 17, 2011 (gmt 0)

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Pretty sure that Google can recognize a privacy policy. I doubt it would hurt you. That said, I always noindex mine, just because there's no reason to index it. I don't believe it has hurt me.
3:36 pm on Oct 17, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



If you just want to no-index one page, I think it's probably okay. I've had a few no-indexed pages on my sites for years, and none of them have been affected by Panda. Google usually doesn't penalize sites for very minor "infractions".
4:24 pm on Oct 17, 2011 (gmt 0)

5+ Year Member



@netmeg & @aristotle
Thanks, I will noindex and see how it goes. Should a privacy policy be a sitwide footer link, or just home page?
5:24 pm on Oct 17, 2011 (gmt 0)

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



On my sites with AdSense I put it sitewide, just in case someone from G comes in to check and decides they can't find it quickly enough. On sites without AdSense, I figure once is probably enough.
8:26 pm on Oct 17, 2011 (gmt 0)

5+ Year Member



Just a further thought, perhaps going in a different direction, but all this talk about noindexing pages to escape Panda has got me thinking.

My pages use named anchors.
I have not been using a back-to-top link, but I am planning to include these on new pages. I got to thinking about duplicate-content issues: should the nofollow attribute be used for the back-to-top links? Perhaps I should be using nofollow on all the named anchors also?

<a href="#More">More Content</a>
<a href="#More" rel="nofollow">More Content</a>

<a href='#' rel='nofollow'>Back to Top</a>


Has anyone any thoughts on this?
8:53 pm on Oct 17, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



In my opinion you should avoid putting the nofollow attribute on any links except in special circumstances, so unless I'm missing something, I don't think you should use it for "back to top" links.

Returning to your original question about the Privacy Policy page, it occurred to me that using robots.txt to block Google from crawling it might be worth considering.
11:00 pm on Oct 17, 2011 (gmt 0)

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



(Unless you're using a Google Product that requires a privacy policy, such as AdSense, in which case you might want them to know you have one)
11:58 pm on Oct 17, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



(Unless you're using a Google Product that requires a privacy policy, such as AdSense, in which case you might want them to know you have one)


I'm sure you're right. I've never used Adsense and don't know their rules.