---- Duplicate Content - Get it right or perish


Halfdeck - 3:48 am on Aug 27, 2006 (gmt 0)


Early this year, I created a few pages under one directory to test how Google reacts to duplicate content.

I created one original page with 300+ words of text, then various copies of that page: some sharing 60% of the original's content, others 90% or more. PR distribution is identical for all pages: one inlink from the domain root to each page, and no outgoing links.
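For anyone who wants to repeat the test, here is a rough sketch of one way to check how much of the original a copy actually shares. This is only an assumption on my part about how to measure it: it compares overlapping three-word sequences ("shingles"), and the names `shingles` and `shared_fraction` are made up for illustration. Nobody outside Google knows how Google measures similarity, so treat the numbers as a sanity check on your test pages, nothing more.

```python
def shingles(text, k=3):
    """Return the set of k-word sequences in text (lowercased, split on whitespace)."""
    words = text.lower().split()
    if len(words) < k:
        # Text shorter than k words: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def shared_fraction(original, copy, k=3):
    """Fraction of the original's shingles that also appear in the copy (0.0 to 1.0)."""
    original_shingles = shingles(original, k)
    if not original_shingles:
        return 0.0
    copy_shingles = shingles(copy, k)
    return len(original_shingles & copy_shingles) / len(original_shingles)


if __name__ == "__main__":
    original = "one two three four five six seven eight nine ten"
    # Hypothetical test copy: same text with the last two words changed.
    copy = "one two three four five six seven eight alpha beta"
    print(shared_fraction(original, copy))  # 0.75 (6 of the 8 shingles survive)
```

Changing a couple of words breaks every shingle that touches them, so this measure drops faster than a plain word count would. For 300-word pages that makes it a fairly strict check.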

Initially (pre-Big Daddy), all pages found their way into the main index (including a page 100% identical to the original). A month or two after the Big Daddy rollout, all the pages in that directory vanished from the index. That surprised me, since I expected at least the original and the pages with less than 70% similarity to stay in the main index, and the other pages to either drop out or turn supplemental.

It looks to me like Google "banned" the directory and refuses to index anything inside it. It could be due to lack of trust/PR, except Google has the rest of the domain in its main index.

I assume Googlebot prefers to crawl and index trusted, valuable, frequently updated sites first. If it finds 100 near-duplicates under one directory and knows there are still 100,000 pages left to crawl in that directory, it would be more efficient to skip the directory than to spend time crawling it, knowing none of it is worth keeping in the main index.


Thread source: http://www.webmasterworld.com/google/3060898.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com