Forum Moderators: Robert Charlton & goodroi
Tedster - I've been dubious about the "noindex" advice that was being pushed out, even by some Googlers. Somehow it just doesn't make solid sense to me - after all, the pages are still "there", just not being used as search landing pages.
There can be many legitimate reasons for not wanting a particular URL to turn up as a search landing page, so it would be very short-sighted for anyone, the algo included, to assume that noindex is the sign of a bad page.
Whitey, what is the question or suggestion: that no-indexing pages is the same as leaving them as they are?
In addition, it's important for webmasters to know that low quality content on part of a site can impact a site's ranking as a whole. For this reason, if you believe you've been impacted by this change you should evaluate all the content on your site and do your best to improve the overall quality of the pages on your domain. Removing low quality pages or moving them to a different domain could help your rankings for the higher quality content.
[searchengineland.com...]
The idea seemed to be to get rid of the low quality pages, not just keep them out of the public index. And remember, noindex pages are definitely crawled and stored on Google's back end. They can't even read the noindex meta tag if they don't crawl the page.
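For anyone following along, the directive being discussed here is a plain meta tag in the page's <head> (Google also accepts the same directive as an X-Robots-Tag HTTP response header, which matters later in this thread):

```html
<!-- Keeps the page out of the public index; the page must still be
     crawled for this tag to be seen at all. -->
<head>
  <meta name="robots" content="noindex">
</head>
```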
I still feel like there could have been a suggestion about noindex from a Googler - maybe in their forums or something like that. But it's really like triage, like a temporary step preceding an actual upgrade or removal.
And it seems clear now, seven months later, that noindex didn't work for anybody on its own.
They can't even read the noindex meta tag if they don't crawl the page.
They have to go to the page in order to read the <meta>, but do they have to read the entire page? Not just the rest of the <head>, but also the whole <body>?
That was a technological question, not a moral one. Is the googlebot so designed that it has to read all or nothing, without a "stop right here" option?
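On the pure technology question: an HTTP client is not obliged to download the whole response. It can read the stream chunk by chunk and close the connection once it has seen </head>. Whether googlebot actually does this is not public knowledge; the sketch below (with a simulated response, no real network) only shows that a "stop right here" option is possible:

```python
def read_head_only(chunks):
    """Consume an iterable of byte chunks and stop as soon as the
    closing </head> tag has been seen; return the bytes read so far.

    Demonstrates that a client can see a meta robots tag without
    fetching the whole <body>. Whether googlebot works this way is
    an open question in this thread, not a documented fact.
    """
    buf = b""
    for chunk in chunks:
        buf += chunk
        # search the accumulated buffer so a tag split across a
        # chunk boundary is still found
        if b"</head>" in buf.lower():
            break
    return buf

# Simulated response arriving in 64-byte chunks:
html = (b"<html><head><meta name='robots' content='noindex'></head>"
        b"<body>" + b"x" * 10000 + b"</body></html>")
chunks = (html[i:i + 64] for i in range(0, len(html), 64))
partial = read_head_only(chunks)
print(b"noindex" in partial)      # the directive was seen
print(len(partial) < len(html))   # without downloading the full page
```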
Removing low quality pages or moving them to a different domain
John Mu - It sounds like you're heading in a good direction :-). Regarding the 404 vs noindex, my take would be: Completely remove all pages that you absolutely don't want anymore. Let them return 404 (and make a great 404 page so that your users can get to where they were headed, or find something related). See [google.com...] Yes, those pages will show up as crawl errors in Webmaster Tools, but that's fine -- they're supposed to. They won't negatively affect the rest of your site's crawling, indexing or ranking. Having pages that return 404 is fine and to be expected. Using a 410 ("Gone") HTTP result code may be a tiny bit faster, but overall you don't have to worry about the difference, a 404 is ok.
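John Mu's advice boils down to three response behaviors: 410 (or 404) for pages removed on purpose, a genuinely helpful 404 page for everything unknown, and normal serving for the rest. A minimal WSGI sketch of that policy, with made-up paths purely for illustration:

```python
# Hypothetical URL sets for illustration only.
REMOVED = {"/old-thin-page", "/doorway-1"}   # deleted on purpose -> 410
LIVE = {"/": "<h1>Home</h1>"}                # real content -> 200

def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path in REMOVED:
        # 410 "Gone" tells crawlers the removal is deliberate; per John
        # Mu it may be marginally faster than 404 but either is fine.
        start_response("410 Gone", [("Content-Type", "text/html")])
        return [b"<h1>Gone</h1><p>This page was removed on purpose.</p>"]
    if path in LIVE:
        start_response("200 OK", [("Content-Type", "text/html")])
        return [LIVE[path].encode()]
    # "Make a great 404 page": help the user get where they were headed.
    start_response("404 Not Found", [("Content-Type", "text/html")])
    return [b"<h1>Not found</h1><p>Try the <a href='/'>home page</a> "
            b"or the site search.</p>"]
```

Crawl errors reported for the REMOVED paths are expected and harmless, exactly as the quote above says.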
They have to go to the page in order read the <meta>, but do they have to read the entire page? Not just the rest of the <head>, but also the whole <body>?
Probably because there's no such thing as a "<head> request". ;o)
@pageoneresults: An HTTP HEAD request tells the server to send only the response headers, not the response headers plus content. Response headers are not the same as the HTML document <head>, so it's useless for reading meta tags.
HEAD != <head></head>
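The distinction is easy to demonstrate: a HEAD response carries HTTP headers (which is where an X-Robots-Tag directive could live) but no body, so the HTML <head> and its meta robots tag never cross the wire. A throwaway local-server sketch, with all names illustrative:

```python
# Shows that HTTP HEAD returns response *headers* only -- the HTML
# <head> (and its meta noindex tag) is part of the body and is not sent.
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = (b"<html><head><meta name='robots' content='noindex'></head>"
        b"<body>hi</body></html>")

class Handler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAGE)

    def do_HEAD(self):
        self._send_headers()     # headers only, no body, per the HTTP spec

    def log_message(self, *args):  # silence request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("HEAD", "/")
resp = conn.getresponse()
body = resp.read()
print(resp.getheader("Content-Type"))  # the headers do arrive: text/html
print(len(body))                       # but the body is empty: 0
server.shutdown()
```

So a crawler that only issued HEAD requests would see an X-Robots-Tag header if the server sent one, but could never see a meta noindex tag.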
How can this be a good user experience in Google's Panda eyes? Linking out to noindexed pages, or indeed having them on your site at all, has got to signal that these pages are still no good and still part of the bad user experience.
So deleting, changing or no-indexing is not going to matter unless it affects the core data that Google says a 'good site' has.