Sitemaps, Meta Data, and robots.txt Forum

Disallow vs noindex for duplicate content
Disallow in robots.txt, or noindex in meta tag for duplicate content

WebmasterWorld Senior Member 5+ Year Member

Msg#: 4421879 posted 7:52 am on Feb 26, 2012 (gmt 0)

I am considering adding pages to encourage users to navigate around a site, that will largely duplicate existing content (snippets already on other pages, etc.).

I think they should not be indexed because they are duplicate content.

I could disallow them in robots.txt, but I can see some possible advantages for the meta tag:

1) As far as I know, adding a noindex meta tag without nofollow will allow PageRank to circulate through that page.
2) These pages will link to pages that have something in common, so they contain useful information on the themes and topics a page covers, and on how pages are related.
3) They contain good link text.

Search engines must download and parse a page with a noindex tag, and I think they follow links unless there is a nofollow tag as well. Do they also use the other information on the page and just exclude it from the SERPs, or do they forget it completely? Will any of the above work on a page with a noindex tag? Is a noindex tag sufficient to avoid duplicate content issues?



WebmasterWorld Senior Member g1smd, WebmasterWorld Top Contributor of All Time, 10+ Year Member

Msg#: 4421879 posted 8:48 am on Feb 26, 2012 (gmt 0)

robots.txt disallow stops the page being crawled, and the page may appear as a URL-only entry in the SERPs.

meta robots noindex allows the page to be crawled, links followed, etc, but there will be no mention of the page in the SERPs.

Is the rel="canonical" attribute of any use to you here?
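To make the robots.txt side of that distinction concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The `/browse/` path and the example.com URLs are assumptions made up for illustration; they are not from the site being discussed. It only shows how a well-behaved crawler would decide whether it may fetch a URL, not how any search engine treats the URL afterwards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "/browse/" is an assumed location for the
# near-duplicate navigation pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /browse/
"""


def is_crawlable(url: str, robots_txt: str = ROBOTS_TXT, agent: str = "*") -> bool:
    """Return True if the given robots.txt rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A disallowed page is never downloaded, so a crawler cannot see any
# meta robots tag or links inside it.
print(is_crawlable("https://example.com/browse/topic-a"))
print(is_crawlable("https://example.com/articles/topic-a"))
```

The practical consequence is that a disallowed page cannot also carry a noindex instruction that crawlers will read, because the page itself is never fetched.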


WebmasterWorld Senior Member 5+ Year Member

Msg#: 4421879 posted 10:44 am on Feb 27, 2012 (gmt 0)

rel=canonical is not a solution, because the duplicate content on any one page will come from more than one other page.

I understand the difference in the immediate effect of disallow and noindex (not crawled at all vs crawled). The question is what other effects they have (PageRank flow, search engines' ability to understand the topics of pages).

Evercome Shathees

Msg#: 4421879 posted 11:13 am on Feb 28, 2012 (gmt 0)

Disallowing pages in robots.txt will stop search engines from crawling them, but the pages can still be indexed through backlinks. If you wish to stop both indexing and the following of links, it is better to add a meta robots tag with noindex, nofollow to each of those pages. Adding rel="nofollow" to links is useful to stop passing link juice to the linked pages.
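The page-level and link-level hints mentioned in this thread can be read with a short sketch like the one below, using Python's standard `html.parser`. The sample markup, the `RobotsHintParser` class, and the `followable_links` helper are all hypothetical names invented for this example. It shows only where the directives live in the HTML and how a simple crawler might interpret them; it makes no claim about how any search engine actually handles PageRank.

```python
from html.parser import HTMLParser


class RobotsHintParser(HTMLParser):
    """Collects meta robots directives and per-link rel="nofollow" hints."""

    def __init__(self):
        super().__init__()
        self.directives = set()  # e.g. {"noindex", "follow"}
        self.links = []          # (href, has_rel_nofollow) pairs

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                token.strip().lower() for token in content.split(",") if token.strip()
            )
        elif tag == "a" and attr.get("href"):
            rel_tokens = (attr.get("rel") or "").lower().split()
            self.links.append((attr["href"], "nofollow" in rel_tokens))


# Hypothetical navigation page: kept out of the index, links still followable.
PAGE = """<html><head>
<meta name="robots" content="noindex, follow">
</head><body>
<a href="/articles/topic-a">Topic A</a>
<a href="/login" rel="nofollow">Log in</a>
</body></html>"""


def followable_links(html):
    """Return (directives, links a simple crawler would still follow)."""
    parser = RobotsHintParser()
    parser.feed(html)
    if "nofollow" in parser.directives:
        # Page-level nofollow applies to every link on the page.
        return parser.directives, []
    return parser.directives, [href for href, nofollow in parser.links if not nofollow]


directives, links = followable_links(PAGE)
print(sorted(directives), links)
```

Changing the meta content to "noindex, nofollow" in this sketch empties the list of followable links, which is the page-level behaviour described above; rel="nofollow" on a single anchor removes only that link.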


Msg#: 4421879 posted 7:29 pm on Mar 4, 2012 (gmt 0)

Assuming the navigation pages have some worthwhile content, and that it is snippets rather than full duplicates, I would allow indexing and leave the links as follow.

It doesn't make sense to allow follow but then bar indexing. Bear in mind that removing nofollow will "leak" some PR juice to those new navigation pages, so if you're doing that and noindexing them, you are losing that juice.


WebmasterWorld Senior Member 5+ Year Member

Msg#: 4421879 posted 9:07 am on Mar 6, 2012 (gmt 0)

@dmoinoman, I will not entirely lose the juice because the meta tag on the index pages allows follow, so link juice will flow through them.

OTOH if I blocked them entirely (with robots.txt or nofollow tags) then I would lose the link juice from links to the index pages entirely (if Matt Cutts is to be believed....).

I suspect they are too close to being duplicates for indexing to be a good idea. There is a lot of overlap, and there are many near-duplicates.
