"Honest site owners often worry about duplicate content when they don't really have to," Google's Cutts said. "There are also people that are a little less conscientious." He also noted that different top level domains, like x.com, x.ca, are not a concern."
The argument that a duplicate-content filter will "eliminate whole industries" simply isn't true.
Agreed. Besides, sites of that kind will probably never be eliminated from local search results, because people need information about the nearby stores that carry the nails.
So the information may be duplicate at the global level and unique at the local level.
It would be interesting to compare the SERPs for sites that sell a common type of widget in local and global searches. Maybe Google has already implemented it.
Vadim.
Banner text, side columns and footer text on websites built from templates have duplicate content running right through them.
But because the main content of each page has its keywords optimised with the meta tags, <h1> tags, keyword density, image <alt> tags etc. for that particular page, the page ranks well for the optimised content.
So, if it's duplicate content that isn't optimised, then it's not going to appear high in the SERPs anyway - right?
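One rough way to see that split for yourself - a hypothetical sketch, not anything Google publishes - is to strip the tags from two pages of the same templated site and compare which text blocks are shared (the template) and which are unique (the optimised main content):

# Hypothetical sketch: estimate how much of two pages on the same templated
# site is shared "boilerplate" text (banner, side columns, footer) versus
# unique main content. This is NOT Google's algorithm.
import re

def text_blocks(html):
    """Very crude: strip tags and split into non-empty text blocks."""
    text = re.sub(r"<[^>]+>", "\n", html)
    return {line.strip() for line in text.splitlines() if line.strip()}

page_a = """<div id='banner'>Acme Widgets - free shipping</div>
            <h1>Blue widgets</h1><p>Our blue widgets are hand made...</p>
            <div id='footer'>Copyright Acme</div>"""
page_b = """<div id='banner'>Acme Widgets - free shipping</div>
            <h1>Red widgets</h1><p>Red widgets ship in two days...</p>
            <div id='footer'>Copyright Acme</div>"""

a, b = text_blocks(page_a), text_blocks(page_b)
shared = a & b
print(f"shared template blocks: {len(shared)} of {len(a)}")   # the duplicated part
print(f"unique blocks on page A: {sorted(a - b)}")            # the optimised part

On a heavily templated site the shared set dominates, which is exactly the duplication described above.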
Honest site owners often worry about duplicate content when they don't really have to,... He also noted that different top level domains, like x.com, x.ca, are not a concern.
A European multinational company I know has unknowingly built near copies of their site in about a dozen ccTLDs. About 5% of each site is unique. Only the .com shows a reasonable number of indexed pages with the site: command. A couple show 2-3 results; the rest including the head office show zero results and are whitebarred. I'd say that this "honest" un-SEOd company needs to worry big time.
How many hours are spent on this subject?
If you want to know the answer why not simply try it out?
Create a few domains, post some duplicate content, set up identical links to those pages and see what happens!
It isn't rocket science, it will cost you the price of a meal and a few hours of time... then you will know the definitive answer!
Alternatively you could simply do a few hundred searches for "odd ball things" and you will come to the same conclusion.
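If you do run that experiment, checking the results by hand gets tedious. Here is a hypothetical sketch using Google's Custom Search JSON API (it needs your own API key and search engine ID - both are placeholders below, and the test domains are made up) to see which copies actually return for a phrase that only appears in your test article:

# Hypothetical sketch of the "try it and see" experiment: after putting the
# same article on a few test domains, ask the Custom Search JSON API how many
# results each domain returns for an odd-ball phrase from that article.
import json
import urllib.parse
import urllib.request

API_KEY = "YOUR_API_KEY"        # placeholder - your own Google API key
CX = "YOUR_SEARCH_ENGINE_ID"    # placeholder - your Programmable Search Engine ID

def result_count(query):
    params = urllib.parse.urlencode({"key": API_KEY, "cx": CX, "q": query})
    with urllib.request.urlopen(
        "https://www.googleapis.com/customsearch/v1?" + params
    ) as resp:
        data = json.load(resp)
    return int(data.get("searchInformation", {}).get("totalResults", 0))

test_domains = ["example-widgets-one.com", "example-widgets-two.com"]  # your test sites
phrase = '"an odd ball phrase that only appears in the test article"'
for domain in test_domains:
    print(domain, result_count(f"{phrase} site:{domain}"))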
Different top level domains: if you own a .com and a .fr, for example, don’t worry about dupe content in this case
Don’t know why he says that as it goes in direct contrast to what is posted on the Google guidelines (I have bolded the most relevant sentence)
“While all sites in our index return for searches restricted to "the web," we draw on a relevant subset of sites for each country restrict. Our crawlers may identify the country for a site by factors such as the physical location at which the site is hosted, the site's IP address, the WHOIS information for a domain, and its top-level domain.
That said, your site's top-level domain doesn't need to match the country domain for which you'd like it to return. It's also important to keep in mind that our crawlers don't index duplicate content, so creating identical sites at several domains will likely not result in their returning for many country restricts. If you do create duplicate domains, we suggest using a robots.txt file to block our crawler from accessing all but your preferred one.”
[google.com...]
Can anyone explain?
1. IP and location do matter, but there are other factors too (for example the language and links), and they may outweigh them.
2. There is no punishment for the duplicate content, but only the content of one site will survive. If the sites in both countries have both languages, it is better to restrict some pages with robots.txt (a rough sketch follows below).
3. Etc. (read this forum for a long time :)
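For reference, the guidelines' suggestion comes down to serving a robots.txt along these lines on the non-preferred duplicate domain (a sketch only - adjust the paths to your own situation):

# robots.txt served on the non-preferred duplicate domain only
# (never on the domain you want to keep indexed)
User-agent: Googlebot
# Block the whole duplicate site:
Disallow: /
# ...or, as in point 2 above, block only the duplicated-language section,
# for example: Disallow: /en/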
Vadim.
Don’t know why he says that as it goes in direct contrast to what is posted on the Google guidelines (I have bolded the most relevant sentence)
Obviously they adopted a policy where, for example, an international business can have two or more (near) duplicate versions of its content indexed on different domains.
For example, if a user from the UK does the search, the company's .co.uk version would be given preference over the .com version. I see it as pretty fair.
It is unclear, however, whether the second-level domain must be the same for such a policy to work (and I think it must be).
Anyway, I have never seen a big problem with the duplicate content issue.
As webdoctor put it in message #2, there is no "penalty". If you have two or more (nearly) identical pages of YOUR content, only one gets indexed.
Nothing wrong with that.
However, if you use templates, affiliate products, articles from other sites and such, it is a natural tendency of a well-designed search engine to try to find the original owner of the content and give them the preference/credit.
Nothing wrong with this either.
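As a purely hypothetical illustration of that tendency (not Google's actual method), a search engine can flag near-duplicates by breaking each page's text into overlapping word "shingles" and comparing the sets:

# Hypothetical near-duplicate check: overlapping word "shingles" plus
# Jaccard similarity. A real engine uses far more signals than this.
def shingles(text, size=4):
    words = text.lower().split()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity(text_a, text_b):
    a, b = shingles(text_a), shingles(text_b)
    if not (a or b):
        return 0.0
    return len(a & b) / len(a | b)   # Jaccard similarity

original = "Our blue widgets are hand finished and ship worldwide from our Leeds workshop."
scraped  = "Our blue widgets are hand finished and ship worldwide from our Leeds factory."
print(round(similarity(original, scraped), 2))  # a high score -> likely near-duplicates

The harder part - deciding which of the matching copies gets the credit - is presumably where honest sites sometimes lose out, as described earlier in the thread.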
It's not doing this for the search users' benefit, because it could simply display in the SERPs the most relevant page of a site applicable to that search query every time - end of problem.
It does it, IMO, because it wants to stop webmasters from ranking for multiple variations of similar keywords, so that they have to buy AdWords - it's that simple.
For example, if your site is an authority on "blue widgets" you would think it would rank for "your blue widgets", "my blue widgets", "everyone's blue widgets", "blue widgeters" etc. If you purchased AdWords for that keyword it would feature on all the similar keyword searches.
Fact is that unless your blue widget site has a page optimised for "everyone's blue widgets" it is unlikely to rank anywhere for it - and the moment you add a page that's similar but optimised to the adjusted keyword, it's a matter of time until Google treats it as duplicate.
All in all, I think Google is now spending far too much effort trying to work out ways to prevent webmasters' sites from ranking for multiple keywords in its quest to increase earnings.
As I posted, all Google needs to do is deliver in its SERPs the most relevant page on a site applicable to that search term. So-called duplicate content on a site, where honest webmasters get stuffed by Google, would then not be an issue.
Don’t know why he says that as it goes in direct contrast to what is posted on the Google guidelines
When Matt says:
Different top level domains: if you own a .com and a .fr, for example, don’t worry about dupe content in this case
He is not saying that all pages from all domains are going to get equal ranking. He's saying that you don't have to worry about being penalized for the same content on multiple domains. What Google does is try to show the best URL in their listings, once. It is not a penalty to have the other copies on other domains appear much lower.
This is the real definition of canonicalization. Most webmasters consider canonicalization a www/non-www issue, but it's really a dupe content issue, and this is what the engineers are hoping to improve on after the infrastructure changes going on with Big Daddy.
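A toy sketch of that idea (a simplification, and only an assumption about how any engine groups URLs): collapse the obvious variants of a URL into one preferred form, then show only that form in the listings:

# Hypothetical canonicalization sketch: many URL variants, one preferred
# ("canonical") version. Real engines weigh far more signals; lower-casing
# the whole URL is only safe for this toy example.
from urllib.parse import urlsplit, urlunsplit

def canonical(url):
    scheme, host, path, query, _ = urlsplit(url.lower())
    if host.startswith("www."):
        host = host[4:]                     # fold www/non-www together
    if path in ("", "/index.html"):
        path = "/"                          # fold common homepage variants
    return urlunsplit((scheme, host, path.rstrip("/") or "/", query, ""))

variants = [
    "http://www.example.com/index.html",
    "http://example.com/",
    "http://EXAMPLE.com",
]
print({canonical(u) for u in variants})   # all three collapse to one canonical URL

The www/non-www case is just the easiest variant to fold together; duplicate pages across different domains are the same problem with messier signals.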
He is not saying that all pages from all domains are going to get equal ranking. He's saying that you don't have to worry about being penalized for the same content on multiple domains. What Google does is try to show the best URL in their listings, once. It is not a penalty to have the other copies on other domains appear much lower.
I don't think this was a good explanation.
If you look at White's post, there are two statements that are, at first sight, contradictory:
1. "Different top level domains: if you own a .com and a .fr, for example, don’t worry about dupe content in this case."
2. "It's also important to keep in mind that our crawlers don't index duplicate content, so creating identical sites at several domains will likely not result in their returning for many country restricts."
IMO, the only satisfactory explanation is, in short, this one:
1-a: If you own widgetcompany.com & widgetcompany.co.uk with identical content = no filter
1-b: If you own widgetcompany.com & widgetcorporation.fr in different languages = original content on both sites = no filter
2. If you own widgetcompany.com & widgetcorporation.co.uk with identical content = identical pages from one site filtered (not indexed)
Fact is that unless your blue widget site has a page optimised for "everyone's blue widgets" it is unlikely to rank anywhere for it - and the moment you add a page that's similar but optimised to the adjusted keyword, it's a matter of time until Google treats it as duplicate.
Funny you should say that - on my pages where I removed the page title tagline 'from company name', the pages still rank - the ones with the tagline intact have now gone supplemental. I don't know if it is an intended effect or not, but that's what happened.
When you look at the SERPs now in a datacenter comparison, lots of sites at the top now just have "widgets" as a title. I don't like it personally - all the listings look the same. I like it even less that those tagline pages have gone supplemental. :(
Don’t know why he says that as it goes in direct contrast to what is posted on the Google guidelines
Time may be one of the explanations.
If I recall correctly, Matt Cutts mentioned that the webmaster guidelines were slightly out of date. He probably did not mean that they are incorrect; simply that the algorithm is changing continuously and some emphases or weights may change with time.
Vadim.
I say consider Supplemental Results a penalty and fix it by adding more content to the page if the page is too short, or by rewriting it. I've seen pages recover with this method.
It is a major flaw in the algorithm when a quality, honest site gets dinged in favor of less relevant results due to incorrectly perceived duplicate content penalties
Nicely put. It's actually also arguably a flaw if a "low quality" or even DISHONEST site with more relevant content for a particular query is taken out. I hope Google spends its time worrying about serving relevance rather than following "extremism in the defense of the algorithm", which has led to a LOT of collateral damage. I'd estimate that for at least half the highly competitive phrases the "best site" (if it were defined by a huge user panel) is no longer served, and I think it's starting to hurt Google as people look to vertical rather than full web search tools.