Home / Forums Index / Search Engines / Sitemaps, Meta Data, and robots.txt
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Disallow vs noindex for duplicate content
Disallow in robots.txt, or noindex in meta tag for duplicate content
graeme_p




msg:4421881
 7:52 am on Feb 26, 2012 (gmt 0)

I am considering adding pages that encourage users to navigate around a site. These pages will largely duplicate existing content (snippets already on other pages, etc.).

I think they should not be indexed because they are duplicate content.

I could disallow them in robots.txt, but I can see some possible advantages for the meta tag:

1) As far as I know, a noindex meta tag without nofollow still allows PageRank to circulate through the page.
2) These pages will link to pages that have something in common, so they contain useful information on the themes and topics a page covers, and on how pages are related.
3) They contain good link text.

Search engines must download and parse a page to see its noindex tag, and I think they follow its links unless there is a nofollow tag as well. Do they also use the other information on the page and just exclude it from the SERPs, or do they forget it completely? Will any of the above work on a page with a noindex tag? Is a noindex tag sufficient to avoid duplicate content issues?
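To make the question concrete, here is a minimal Python sketch of how a crawler might read the meta robots tag. It assumes the commonly documented defaults (a page is indexable and its links are followed unless a directive says otherwise); the class name, function name and sample page are hypothetical, and this is not a description of what any particular search engine actually does.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_policy(html):
    """Return (indexable, follow_links); both default to True (assumed defaults)."""
    parser = MetaRobotsParser()
    parser.feed(html)
    found = parser.directives
    indexable = not ({"noindex", "none"} & found)
    follow_links = not ({"nofollow", "none"} & found)
    return indexable, follow_links


# Hypothetical navigation page: noindex only, so links remain followable.
page = (
    '<html><head><meta name="robots" content="noindex"></head>'
    '<body><a href="/topic/a">Topic A</a></body></html>'
)
print(robots_policy(page))
```

Under these assumptions a page carrying only noindex is kept out of the index while its links stay eligible to be followed, which is the behaviour points 1) to 3) rely on.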

 

g1smd




msg:4421896
 8:48 am on Feb 26, 2012 (gmt 0)

robots.txt disallow stops the page being crawled, but the page may still appear as a URL-only entry in the SERPs.

meta robots noindex allows the page to be crawled, links followed, etc, but there will be no mention of the page in the SERPs.
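The disallow side of this can be sketched with Python's standard `urllib.robotparser`. The `/related/` path, the `example.com` URLs and the `ExampleBot` user agent are made up for illustration. The point is only that a disallowed URL is never fetched, so a crawler that obeys robots.txt cannot see any meta tag or links on that page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the duplicate navigation pages.
rules = """\
User-agent: *
Disallow: /related/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A compliant crawler checks robots.txt before requesting a URL.
can_crawl_navigation = parser.can_fetch(
    "ExampleBot", "https://example.com/related/widgets"
)
can_crawl_article = parser.can_fetch(
    "ExampleBot", "https://example.com/articles/widgets"
)

print(can_crawl_navigation)  # False: page is never downloaded or parsed
print(can_crawl_article)     # True: page is fetched, so its meta robots tag is read
```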

Is the rel="canonical" attribute of any use to you here?

graeme_p




msg:4422199
 10:44 am on Feb 27, 2012 (gmt 0)

rel=canonical is not a solution, because the duplicate content on any one page will come from more than one other page.

I understand the difference in the immediate effect of disallow and noindex (not crawled at all vs crawled). The question is what other effects they have (PageRank flow, search engines' ability to understand the topics of pages).

Evercome Shathees




msg:4422584
 11:13 am on Feb 28, 2012 (gmt 0)

Disallowing pages in robots.txt will stop search engines from crawling them, but the pages can still be indexed through backlinks. If you want to keep the pages out of the index and stop their links being followed, it is better to add a meta robots tag with noindex, nofollow to each of those pages. Adding rel="nofollow" to links is useful to stop link juice passing to the linked pages.

dominoman




msg:4424770
 7:29 pm on Mar 4, 2012 (gmt 0)

Assuming the navigation pages have some worthwhile content, and that it is snippets rather than full duplicates, I would allow indexing and leave the links as follow.

It doesn't make sense to allow follow but then bar indexing. Bear in mind that removing nofollow will "leak" some PR juice to those new navigation pages, so if you're doing that and noindexing them, you are losing that juice.

graeme_p




msg:4425470
 9:07 am on Mar 6, 2012 (gmt 0)

@dominoman, I will not entirely lose the juice, because the meta tag on the index pages allows follow, so link juice will flow through them.

OTOH if I blocked them entirely (with robots.txt or nofollow tags) then I would lose the link juice from links to the index pages entirely (if Matt Cutts is to be believed....).

I suspect they are too close to being duplicates for indexing to be a good idea. There is a lot of overlap, and some pages are near duplicates.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved