Disallow vs noindex for duplicate content - Sitemaps, Meta Data, and robots.txt forum at WebmasterWorld - WebmasterWorld

Forum Moderators: goodroi


Disallow vs noindex for duplicate content

Disallow in robots.txt, or noindex in meta tag for duplicate content

graeme_p

7:52 am on Feb 26, 2012 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

I am considering adding pages that encourage users to navigate around a site. They will largely duplicate existing content (snippets already on other pages, etc.).

I think they should not be indexed because they are duplicate content.

I could disallow them in robots.txt, but I can see some possible advantages for the meta tag:

1) As far as I know, adding a noindex meta tag without nofollow will allow page rank to circulate through that page.
2) These pages will link to pages that have something in common, so they do contain useful information on the themes and topics a page covers, and on how pages are related.
3) They contain good link text.

Search engines must download and parse a page with a noindex tag, and I think they follow links unless there is a nofollow tag as well. Do they also use the other information on the page and just exclude it from the SERPs, or do they forget it completely? Will any of the above work on a page with a noindex tag? Is a noindex tag sufficient to avoid duplicate content issues?

g1smd

8:48 am on Feb 26, 2012 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

robots.txt disallow stops the page being crawled, but the page may still appear as a URL-only entry in the SERPs.

meta robots noindex allows the page to be crawled, links followed, etc, but there will be no mention of the page in the SERPs.
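To make the difference concrete, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The `/related/` path, the `example.com` URLs, and the `is_crawlable` helper are made up for illustration. It only shows how a compliant crawler decides whether it may fetch a URL. If the fetch is not allowed, the crawler never sees the page, so any meta robots tag on it goes unread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the new navigation pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /related/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if a compliant crawler may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/related/widgets",   # blocked: never downloaded
        "https://example.com/articles/widgets",  # allowed: meta tags get read
    ):
        print(url, is_crawlable(ROBOTS_TXT, url))
```

A URL blocked this way can still be known to the search engine from links elsewhere, which is where the URL-only entries come from.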

Is the rel="canonical" attribute of any use to you here?

graeme_p

10:44 am on Feb 27, 2012 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

rel=canonical is not a solution, because the duplicate content on any one page will come from more than one other page.

I understand the difference in the immediate effect of disallow and noindex (not crawled at all vs. crawled). The question is what other effects they have (page rank flow, search engines' ability to understand the topics of pages).

Evercome Shathees

11:13 am on Feb 28, 2012 (gmt 0)

10+ Year Member

Adding a robots.txt disallow will stop search engines from crawling the pages, but the pages can still be indexed through backlinks. If you want the pages kept out of the index and their links not followed, it is better to add a meta robots tag with "noindex, nofollow" to every page. Adding rel="nofollow" to links is useful to stop link juice passing to the linked pages.
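For anyone wanting to check what their pages actually send, here is a rough Python sketch that reads the meta robots tag from a page's HTML with the standard library's `html.parser`. The `RobotsMetaParser` class and `robots_directives` function are names invented for this example, and the sample markup is hypothetical. It only extracts the directives. How a given search engine acts on them is up to that engine.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma separated, e.g. "noindex, follow".
        for token in content.split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of meta robots directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    # Hypothetical navigation page: kept out of the index, links still followed.
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    found = robots_directives(page)
    print("noindex" in found, "nofollow" in found)
```

A page with "noindex, follow" (or just "noindex") asks to be left out of the results while its links are still followed. "noindex, nofollow" asks for both to be dropped.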

dominoman

7:29 pm on Mar 4, 2012 (gmt 0)

10+ Year Member

Assuming the navigation pages have some worthwhile content, and that it is snippets rather than full duplicates, I would allow indexing and leave the links as follow.

It doesn't make sense to allow follow but then bar indexing. Bear in mind that removing nofollow will "leak" some PR juice to those new navigation pages, so if you're doing that and noindexing them, you are losing that juice.

graeme_p

9:07 am on Mar 6, 2012 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

@dominoman, I will not entirely lose the juice, because the meta tag on the index pages allows follow, so link juice will flow through them.

OTOH, if I blocked them (with robots.txt or nofollow tags), I would lose the link juice from links to the index pages entirely (if Matt Cutts is to be believed...).

I suspect they are too close to being duplicates for indexing to be a good idea. There is a lot of overlap, and some pages are near duplicates.