Downside to noindex?
I want them deindexed, and I want to claim back the link juice and crawl allowance.
Crawl allowance - Just a bit
In order to even see the noindex meta tag, Googlebot must crawl the page. It may crawl less frequently after it verifies the noindex a few times, but it must continue to crawl.
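Because the noindex directive is part of the page's own response, a crawler can only act on it after fetching the page. As a rough sketch using only the Python standard library (`RobotsMetaParser` and `has_noindex` are hypothetical names, not anything Google publishes), the check on a downloaded page might look like this. It also looks at the `X-Robots-Tag` response header, which Google documents as an alternative to the meta tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and
    <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html, headers=None):
    """Return True if a fetched page opts out of indexing, either via a
    robots meta tag in the HTML or an X-Robots-Tag response header.
    Simplification: header values with a user-agent prefix such as
    "googlebot: noindex" are not handled."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag":
            parser.directives.update(t.strip().lower() for t in value.split(","))
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in parser.directives or "none" in parser.directives


# Only a downloaded response can be checked; nothing is known before the fetch.
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))                                          # True
print(has_noindex("<html></html>", {"X-Robots-Tag": "noindex"}))  # True
print(has_noindex("<html></html>"))                               # False
```

The point of the sketch is the input: `has_noindex` needs the HTML and headers, so every noindexed URL still costs a fetch.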
But noindex pages are a different case. Google needs to crawl them to see the noindex meta tag.
For those pages that return 404 or 410, I don't see Google crawling them, as they don't exist anymore. So they, too, wouldn't remain on the to-do list.
I am seeing some URLs that return 404 still being requested for a very long time.
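Whether a removed URL answers 404 or 410 is decided by the server. Below is a minimal sketch, assuming a hypothetical WSGI app (`status_app`, `LIVE_PATHS` and `REMOVED_PATHS` are illustrative names), that keeps the two codes distinct: 410 for pages removed on purpose, 404 for everything unknown. It shows only what the server sends; it says nothing about how long Googlebot keeps re-requesting either kind of URL, which in practice can be a long time.

```python
# Hypothetical URL sets; a real site would look these up in its database.
LIVE_PATHS = {"/", "/widgets"}
REMOVED_PATHS = {"/old-widget", "/discontinued/page-1"}


def status_app(environ, start_response):
    """Minimal WSGI app that tells crawlers why a URL has no content."""
    path = environ.get("PATH_INFO", "/")
    if path in LIVE_PATHS:
        status, body = "200 OK", b"ok"
    elif path in REMOVED_PATHS:
        # 410: the page was removed on purpose and is not coming back.
        status, body = "410 Gone", b"gone"
    else:
        # 404: nothing here, with no claim about whether that is permanent.
        status, body = "404 Not Found", b"not found"
    start_response(status, [("Content-Type", "text/plain"),
                            ("Content-Length", str(len(body)))])
    return [body]


def status_of(path):
    """Call the app directly (no server) and return the status line."""
    captured = []
    status_app({"PATH_INFO": path}, lambda status, headers: captured.append(status))
    return captured[0]


print(status_of("/old-widget"))     # 410 Gone
print(status_of("/never-existed"))  # 404 Not Found
```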
I hate using these tags, just as I hated using nofollow way back when. But sometimes it seems necessary. Is there any downside to what I've done with the noindex, follow? Is Google likely to give a crap that I've just told it not to index half a million pages?
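For half a million pages, editing templates is not the only option: the same "noindex, follow" directive can be sent as an `X-Robots-Tag` HTTP header. A minimal sketch, assuming a Python WSGI site and hypothetical URL prefixes (`NOINDEX_PREFIXES`, `noindex_follow` and `site` are illustrative names), might wrap the app like this. Googlebot still has to crawl each URL to see the header, exactly as with the meta tag.

```python
# Hypothetical URL prefixes for the pages that should stay out of the index.
NOINDEX_PREFIXES = ("/search/", "/tag/")


def noindex_follow(wsgi_app, prefixes=NOINDEX_PREFIXES):
    """Wrap a WSGI app so matching URLs get 'X-Robots-Tag: noindex, follow'."""
    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "/")

        def start(status, headers, exc_info=None):
            # str.startswith accepts a tuple of prefixes.
            if path.startswith(prefixes):
                headers = list(headers) + [("X-Robots-Tag", "noindex, follow")]
            return start_response(status, headers, exc_info)

        return wsgi_app(environ, start)
    return wrapped


def site(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html><body>page</body></html>"]


app = noindex_follow(site)
```

A header-based approach keeps the directive out of the page markup, which can be convenient when the noindexed set is defined by URL pattern rather than by template.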
FWIW, in my experience, using the NOINDEX META tag doesn't actually stop the pages from being indexed; they just won't show in the first set of results shown to a searcher. They will, however, often appear in the "repeat the search with the omitted results included" results.
Your search - site:example.com/noindexed-document - did not match any documents.
Suggestions:
Make sure all words are spelled correctly.
Try different keywords.
Try more general keywords.
I must admit I had previously thought that robots.txt stopped Google from crawling the 'disallowed' pages.
Also, there is only one right answer for this. :)
When you disallow a URL in robots.txt, Google DOES NOT (contrary to popular belief) crawl the page. That means it does not know what is on the page, or whether the page contains a noindex directive, so it uses external information (such as links pointing to the page) instead.
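The crawler's side of this can be sketched with Python's standard `urllib.robotparser`, assuming a hypothetical robots.txt that disallows `/private/`. A polite crawler asks `can_fetch` before requesting a URL; when the answer is False the page body is never downloaded, so a noindex tag inside it is never seen. Note this is Python's own parser, used only for illustration; Google's robots.txt matching has extra rules (wildcards, longest-match precedence) that it does not fully reproduce.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawler_would_fetch(url, user_agent="Googlebot"):
    """Return False when robots.txt blocks the URL. In that case the page
    (and any noindex meta tag in it) is never requested."""
    return parser.can_fetch(user_agent, url)


print(crawler_would_fetch("https://example.com/public/page"))   # True
print(crawler_would_fetch("https://example.com/private/page"))  # False
```

Blocking the fetch does not remove the URL itself from what Google knows; the URL can still be discovered through links from other pages.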
If robots.txt stopped Google from crawling the page, how come it is in Google's index with a proper page title? That's the bit I don't follow.