Forum Moderators: Robert Charlton & goodroi
Tedster - I've been dubious about the "noindex" advice that was being pushed out, even by some Googlers. Somehow it just doesn't make solid sense to me - after all, the pages are still "there", just not being used as search landing pages.
There can be many legitimate reasons for not wanting a particular URL to turn up as a search landing page, so it would be very short-sighted for anyone, the algo included, to assume that noindex is the sign of a bad page.
Whitey, what is the question or suggestion: that no-indexing pages is the same as leaving them as they are?
In addition, it's important for webmasters to know that low quality content on part of a site can impact a site's ranking as a whole. For this reason, if you believe you've been impacted by this change you should evaluate all the content on your site and do your best to improve the overall quality of the pages on your domain. Removing low quality pages or moving them to a different domain could help your rankings for the higher quality content.
[searchengineland.com...]
The idea seemed to be to get rid of the low quality pages, not just keep them out of the public index. And remember, noindex pages are definitely crawled and stored on Google's back end. They can't even read the noindex meta tag if they don't crawl the page.
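For anyone following along, the directive being discussed here is a plain meta tag in the page's <head> (Google also accepts the same directive as an X-Robots-Tag HTTP response header, which matters later in this thread):

```html
<!-- Keeps the page out of the public index; the page must still be
     crawled for this tag to be seen at all. -->
<head>
  <meta name="robots" content="noindex">
</head>
```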
I still feel like there could have been a suggestion about noindex from a Googler - maybe in their forums or something like that. But it's really like triage, like a temporary step preceding an actual upgrade or removal.
And it seems clear now, seven months later, that noindex didn't work for anybody on its own.
They can't even read the noindex meta tag if they don't crawl the page.
They have to go to the page in order to read the <meta>, but do they have to read the entire page? Not just the rest of the <head>, but also the whole <body>?
That was a technological question, not a moral one. Is the googlebot so designed that it has to read all or nothing, without a "stop right here" option?
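On the pure technology question: an HTTP client is not obliged to download the whole response. It can read the stream chunk by chunk and close the connection once it has seen </head>. Whether googlebot actually does this is not public knowledge; the sketch below (with a simulated response, no real network) only shows that a "stop right here" option is possible:

```python
def read_head_only(chunks):
    """Consume an iterable of byte chunks and stop as soon as the
    closing </head> tag has been seen; return the bytes read so far.

    Demonstrates that a client can see a meta robots tag without
    fetching the whole <body>. Whether googlebot works this way is
    an open question in this thread, not a documented fact.
    """
    buf = b""
    for chunk in chunks:
        buf += chunk
        # search the accumulated buffer so a tag split across a
        # chunk boundary is still found
        if b"</head>" in buf.lower():
            break
    return buf

# Simulated response arriving in 64-byte chunks:
html = (b"<html><head><meta name='robots' content='noindex'></head>"
        b"<body>" + b"x" * 10000 + b"</body></html>")
chunks = (html[i:i + 64] for i in range(0, len(html), 64))
partial = read_head_only(chunks)
print(b"noindex" in partial)      # the directive was seen
print(len(partial) < len(html))   # without downloading the full page
```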
Removing low quality pages or moving them to a different domain
John Mu - It sounds like you're heading in a good direction :-). Regarding the 404 vs noindex, my take would be: Completely remove all pages that you absolutely don't want anymore. Let them return 404 (and make a great 404 page so that your users can get to where they were headed, or find something related). See [google.com...] Yes, those pages will show up as crawl errors in Webmaster Tools, but that's fine -- they're supposed to. They won't negatively affect the rest of your site's crawling, indexing or ranking. Having pages that return 404 is fine and to be expected. Using a 410 ("Gone") HTTP result code may be a tiny bit faster, but overall you don't have to worry about the difference, a 404 is ok.
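John Mu's advice boils down to three response behaviors: 410 (or 404) for pages removed on purpose, a genuinely helpful 404 page for everything unknown, and normal serving for the rest. A minimal WSGI sketch of that policy, with made-up paths purely for illustration:

```python
# Hypothetical URL sets for illustration only.
REMOVED = {"/old-thin-page", "/doorway-1"}   # deleted on purpose -> 410
LIVE = {"/": "<h1>Home</h1>"}                # real content -> 200

def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path in REMOVED:
        # 410 "Gone" tells crawlers the removal is deliberate; per John
        # Mu it may be marginally faster than 404 but either is fine.
        start_response("410 Gone", [("Content-Type", "text/html")])
        return [b"<h1>Gone</h1><p>This page was removed on purpose.</p>"]
    if path in LIVE:
        start_response("200 OK", [("Content-Type", "text/html")])
        return [LIVE[path].encode()]
    # "Make a great 404 page": help the user get where they were headed.
    start_response("404 Not Found", [("Content-Type", "text/html")])
    return [b"<h1>Not found</h1><p>Try the <a href='/'>home page</a> "
            b"or the site search.</p>"]
```

Crawl errors reported for the REMOVED paths are expected and harmless, exactly as the quote above says.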
They have to go to the page in order read the <meta>, but do they have to read the entire page? Not just the rest of the <head>, but also the whole <body>?
Probably because there's no such thing as a "<head> request". ;o)
@pageoneresults: An HTTP HEAD request tells the server to send only the response headers, not the response headers plus content. Response headers are not the same as the HTML document <head>, so it's useless for reading meta tags.
HEAD != <head></head>
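The distinction is easy to demonstrate: a HEAD response carries HTTP headers (which is where an X-Robots-Tag directive could live) but no body, so the HTML <head> and its meta robots tag never cross the wire. A throwaway local-server sketch, with all names illustrative:

```python
# Shows that HTTP HEAD returns response *headers* only -- the HTML
# <head> (and its meta noindex tag) is part of the body and is not sent.
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = (b"<html><head><meta name='robots' content='noindex'></head>"
        b"<body>hi</body></html>")

class Handler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAGE)

    def do_HEAD(self):
        self._send_headers()     # headers only, no body, per the HTTP spec

    def log_message(self, *args):  # silence request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("HEAD", "/")
resp = conn.getresponse()
body = resp.read()
print(resp.getheader("Content-Type"))  # the headers do arrive: text/html
print(len(body))                       # but the body is empty: 0
server.shutdown()
```

So a crawler that only issued HEAD requests would see an X-Robots-Tag header if the server sent one, but could never see a meta noindex tag.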
How can this be a good user experience in Google's Panda eyes? Linking out to noindexed pages, or indeed having them on your site at all, has got to signal that these pages are still no good and still part of the bad user experience.
So deleting, changing or no-indexing is not going to matter unless it affects the core data that Google says a 'good site' has.