Does Google allow more cross-domain duplicate content in some markets?


lfgoal

2:36 pm on Sep 28, 2006 (gmt 0)

10+ Year Member | Top Contributors Of The Month



I have a question about duplicate content. I've heard (or rather, read) a few individuals say that Google is good at catching duplicate content. I think I've also come across Matt Cutts's statements on this issue. And in the area of duplicate content running on article sites, this seems to be somewhat correct.

However, I've noticed in some niches (real estate and car reviews) that the top results are often page after page of duplicated content. Is Google not able to spot this and flag it as duplicate?

Or is the issue not really duplicate content at all, but rather that for those search queries there aren't more competitive pages to take the place of the duplicates in the search results?

Which may be another way of asking whether Google really does much at all about duplicate content. As I said, some niches are full of duplicate pages in the first ten spots for certain phrases.

Why can't Google catch this, if it can at all?

tedster

9:19 pm on Sep 28, 2006 (gmt 0)

WebmasterWorld Senior Member | 10+ Year Member



I've certainly noticed this as well, in some niches and not others, as you say. I don't know whether Google "can't" catch it (that seems unlikely to me) or has chosen not to (that seems more likely). Some highly competitive areas are prone to duplication for many reasons, and heavy duplicate filtering might not serve the end user well there.

It sometimes feels like Google takes the restrictions off in some markets and just lets those domains fight a no-holds-barred steel cage death match. I've heard this opinion expressed frequently, but never officially or with any real authority.

g1smd

9:38 pm on Sep 28, 2006 (gmt 0)

WebmasterWorld Senior Member | 10+ Year Member | Top Contributors Of The Month



There are many types of "duplicate content", and Google does different things with each of them.

These are my names for them, and I think they describe what they are quite well.

- "exact duplicates" - these are www vs. non-www hostnames, multiple domains, parameter variations, capitalisation issues, http vs. https, etc. Some get delisted, some turn URL-only, and some show as Supplemental. (There's a rough canonicalisation sketch after this list.)

- "pseudo duplicates" - these are where many pages on the same site have the same title tag and/or the same meta description as other pages, even though the page content itself might be very different. Most of these get hidden behind the "click for omitted results" link, and some might get dumped to Supplemental. (The second sketch below shows a quick way to find these on your own site.)

- "syndicated content" and "site scrapers" - these are where the domains are owned by other people and the content is not an exact byte-for-byte copy. The site navigation may well be different, and the page HTML code is likely to be different too. Google might list quite a few of these before deciding to filter some out; you see it with press releases and newswire material. It's interesting to see what sinks and what swims. In some areas I don't think they apply a heavy enough filter for these, as you have noticed. (The third sketch below shows one way to score near-duplicates.)
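
Nobody outside Google knows how their canonicaliser actually works, but for illustration, here's a rough Python sketch of the kind of URL normalisation that would collapse "exact duplicates" into one address. The folding rules, the junk-parameter list, and the example URLs are my own assumptions, not anything Google has published:

    from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

    TRACKING_PARAMS = {"sessionid", "ref"}  # assumed junk parameters

    def canonicalise(url):
        """Collapse common 'exact duplicate' variants into one canonical URL."""
        scheme, netloc, path, query, _fragment = urlsplit(url)
        if scheme == "https":          # fold https into http (assumption)
            scheme = "http"
        netloc = netloc.lower()
        if netloc.startswith("www."):  # fold www and non-www together
            netloc = netloc[4:]
        path = path.lower() or "/"     # treat paths as case-insensitive (assumption)
        # Drop assumed session/tracking parameters; sort the rest so order doesn't matter
        params = [(k, v) for k, v in parse_qsl(query) if k.lower() not in TRACKING_PARAMS]
        return urlunsplit((scheme, netloc, path, urlencode(sorted(params)), ""))

    # All four variants collapse to http://example.com/page?a=1&b=2
    for u in ("http://www.example.com/Page?b=2&a=1",
              "https://example.com/page?a=1&b=2",
              "http://example.com/PAGE?a=1&b=2&sessionid=xyz",
              "http://WWW.EXAMPLE.COM/page?b=2&a=1"):
        print(canonicalise(u))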
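
For the "pseudo duplicates" case, you can spot the problem on your own site before Google does by grouping pages on their title and meta description and flagging any group with more than one URL. The page data below is invented for the demo:

    from collections import defaultdict

    # (url, title, meta description) tuples; values made up for the example
    pages = [
        ("/widgets/red",  "Widgets", "Buy widgets online"),
        ("/widgets/blue", "Widgets", "Buy widgets online"),
        ("/about",        "About Us", "Who we are"),
    ]

    buckets = defaultdict(list)
    for url, title, meta in pages:
        buckets[(title, meta)].append(url)

    # Any bucket with more than one URL is a pseudo-duplicate candidate
    for (title, meta), urls in buckets.items():
        if len(urls) > 1:
            print(f"{len(urls)} pages share title {title!r}: {urls}")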
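
For syndicated copies and scrapers, a byte-for-byte comparison is useless, so any filter has to score similarity instead. A textbook approach is word shingling plus Jaccard overlap; this is just a guess at the flavour of thing Google might do, not their actual method:

    def shingles(text, k=4):
        """Set of overlapping k-word 'shingles' from the visible text."""
        words = text.lower().split()
        return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

    def jaccard(a, b):
        """Overlap between two shingle sets: 0.0 (disjoint) to 1.0 (identical)."""
        return len(a & b) / len(a | b) if (a or b) else 1.0

    original = "Acme press release: the new widget ships in October with longer battery life"
    scraped  = "The new widget ships in October with longer battery life Acme said today"

    score = jaccard(shingles(original), shingles(scraped))
    print(f"similarity: {score:.2f}")  # above some threshold, treat as a duplicate

The interesting knob is the threshold: set it too low and you filter legitimate newswire syndication; too high and scrapers that shuffle a few words slip through, which may be exactly the trade-off that makes some niches look unfiltered.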