Disallow vs noindex for duplicate content
Disallow in robots.txt, or noindex in meta tag for duplicate content
I am considering adding pages that encourage users to navigate around a site, but that will largely duplicate existing content (snippets already shown on other pages, etc.).
I think they should not be indexed because they are duplicate content.
I could disallow them in robots.txt, but I can see some possible advantages for the meta tag:
1) As far as I know, adding a noindex meta tag without nofollow will allow PageRank to circulate through that page.
2) These pages will link to pages that have something in common, so they contain useful information about the themes and topics a page covers, and how pages are related.
3) They contain good link text.
Search engines must download and parse a page with a noindex tag, and I think they follow its links unless there is a nofollow tag as well. Do they also use the other information on the page and just exclude it from the SERPs, or do they forget it completely? Will any of the advantages above still apply on a page with a noindex tag? Is a noindex tag sufficient to avoid duplicate content issues?
A robots.txt disallow stops the page from being crawled at all, and the page may still appear as a URL-only entry in the SERPs.
A meta robots noindex allows the page to be crawled, links followed, etc., but there will be no mention of the page in the SERPs.
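To make the difference concrete, here is what the two mechanisms look like (the directory name is hypothetical). A robots.txt rule that blocks crawling:

```
User-agent: *
Disallow: /nav-pages/
```

versus a meta robots tag placed in the `<head>` of each individual page:

```html
<meta name="robots" content="noindex">
```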
Is the rel="canonical" attribute of any use to you here?
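For reference, that is a link element in the duplicate page's `<head>` pointing at the preferred URL (the URL here is a placeholder):

```html
<link rel="canonical" href="https://example.com/original-page">
```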
rel=canonical is not a solution, because the duplicate content on any one page will come from more than one other page.
I understand the difference in the immediate effect of disallow and noindex (not crawled at all, vs crawled); the question is what other effects they have (PageRank flow, and search engines' ability to understand the topics of pages).
A robots.txt disallow will stop search engines from crawling the pages, but the pages can still be indexed through backlinks. If you wish to keep them out of the index, it is better to add a meta robots noindex, nofollow tag on every page. Adding rel="nofollow" to a link is useful to stop passing link juice to the linked page.
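A nofollow hint on an individual link (as opposed to a page-wide meta tag) looks like this; the href is hypothetical:

```html
<a href="/nav-pages/topic" rel="nofollow">link text</a>
```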
Assuming the navigation pages have some worthwhile content of their own, and that it is snippets rather than full duplicates, I would allow indexing and leave the links as follow.
It doesn't make sense to allow follow but then bar indexing. Bear in mind that removing nofollow will "leak" some PR juice to those new navigation pages, so if you do that while also noindexing them, you are losing that juice.
@dmoinoman, I will not entirely lose the juice because the meta tag on the index pages allows follow, so link juice will flow through them.
OTOH, if I blocked them entirely (with robots.txt or nofollow on the links), then I would lose the link juice from links to the index pages entirely (if Matt Cutts is to be believed...).
I suspect they are too close to being duplicates for indexing to be a good idea. There is a lot of overlap, and some near-duplicates.