Matt Cutts: Google Algo Change Targets Dupe Content

   
4:59 pm on Jan 28, 2011 (gmt 0)



[news.ycombinator.com...]

Earlier this week Google launched an algorithmic change that will tend to rank scraper sites or sites with less original content lower. The net effect is that searchers are more likely to see the sites that wrote the original content. An example would be that stackoverflow.com will tend to rank higher than sites that just reuse stackoverflow.com's content. Note that the algorithmic change isn't specific to stackoverflow.com though.

I know a few people here on HN had mentioned specific queries like [pass json body to spring mvc] or [aws s3 emr pig], and those look better to me now. I know that the people here all have their favorite programming-related query, so I wanted to ask if anyone notices a search where a site like efreedom ranks higher than SO now? Most of the searches I tried looked like they were returning SO at the appropriate times/slots now.


I know there's an existing thread for SERP/algo changes, although this mainly seems to be a 'new' development in that it relates to further tackling dup content scrapers. Mods feel free to merge with an existing thread if needed though.

From Matt Cutts' blog:

I just wanted to give a quick update on one thing I mentioned in my search engine spam post.

My post mentioned that “we’re evaluating multiple changes that should help drive spam levels even lower, including one change that primarily affects sites that copy others’ content and sites with low levels of original content.” That change was approved at our weekly quality launch meeting last Thursday and launched earlier this week.

This was a pretty targeted launch: slightly over 2% of queries change in some way, but less than half a percent of search results change enough that someone might really notice. The net effect is that searchers are more likely to see the sites that wrote the original content rather than a site that scraped or copied the original site’s content.

[mattcutts.com...]

[edited by: Brett_Tabke at 9:22 pm (utc) on Jan 28, 2011]
[edit reason] Added link for the Cuttlets [/edit]

7:04 pm on Jan 30, 2011 (gmt 0)

5+ Year Member



Do we have some positive reports, rather than puzzling over lost traffic


Yes, I did have a URL jump from #11 to #6 on the 27th for the second biggest kw in its niche, and it does seem to be a result of the farm update.

This is a site I bought last year, and the prior owner wrote extremely long, well-written original content. (I often buy sites of failed or discouraged competitors.)

I've often thought the pages were TOO LONG, but never quite got around to chopping them up. It SEEMS these long pages may have helped, as I have not noticed any appreciable change in any of my other sites or URLs.

The fact that only the one site/url with the most/best written content improved seems to indicate that in this particular case Google actually hit the target. My sympathies to all those who weren't so lucky this time.

[edited by: trakkerguy at 7:08 pm (utc) on Jan 30, 2011]

7:08 pm on Jan 30, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



@Brett: the "low quality on page" and this "dupe check" noise we suddenly hear in the past 10 days are IMHO related. I would not disregard a strong relationship between these 2 "major" algo changes as you did in the response to this post! In fact I do believe these are both on-page measures and are both implemented at the same spot in their ranking process.

Further, I do vote for "Seedfinder" update... at least they TRY to find the seed of that content :->
7:12 pm on Jan 30, 2011 (gmt 0)

5+ Year Member



@pontifex: Totally agree that the dupe check/low quality issue is related to the content farm issue. A small percentage of quality original content is probably an important part of their definition of a content farm.
7:37 pm on Jan 30, 2011 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



For now I'm going to try the "Farm Update". It's pretty self-explanatory for anyone who's following the plot, so in the future it will be an easy to understand reference.
10:37 pm on Jan 30, 2011 (gmt 0)

10+ Year Member



want this update called Taco Bell
1:33 am on Jan 31, 2011 (gmt 0)

5+ Year Member



Word from some blackhatters is they are deindexing some autoblog networks, but others are still doing fine (so far).
1:53 am on Jan 31, 2011 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month Best Post Of The Month



> Farm

Not the way I read this at all - this update was about dupe content - not low quality original content.

"Earlier this week Google launched an algorithmic change that will tend to rank scraper sites or sites with less original content lower. The net effect is that searchers are more likely to see the sites that wrote the original content. An example would be that stackoverflow.com will tend to rank higher than sites that just reuse stackoverflow.com's content."

That is not about so-called content farms like About.com. That statement is about dupe content.

Show me where Matt says anything about content farms?

I am willing to concede that there may be a wink-and-a-nod going on here where Matt doesn't want to use the 'farm' word because of legal fallout, but I didn't read this as pointing at content farms yet. I think we need to get this story right. This is very important to a lot of people.
2:05 am on Jan 31, 2011 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I see what you mean, Brett. The lines were blurry because of the timing with other Google discussions.

So where to now? More ideas?
2:06 am on Jan 31, 2011 (gmt 0)

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



That is dupe content start-to-finish.


That's going to be harder to name...

Synonyms: duplicate, reduplicate, double, repeat, replicate, reproduce

"Dupe update" ?

<added>
"Plagiarism update" ?
(but does not roll off the tongue so well...)
</added>

[edited by: aakk9999 at 2:17 am (utc) on Jan 31, 2011]

2:14 am on Jan 31, 2011 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month Best Post Of The Month



> Why would anyone be worried about something that only affects 2%
> of the queries, unless the queries are limited the specific verticals?

(grin) What do you think we are talking about when we talk about "long tail"? It lives down there in the single digit percentages of all searches. (remember, 20% of all Gmail users do not realize you don't have to go to google.com and type Gmail into the search box to get to Gmail).

I hear you pontifex, but on 5-6 pet kw's where content farms live, I can't see a single change to those serps.

Type just about any tech query with 'how to' and the usual suspects are still there in the top 30'ish on almost any query. The ones that were top 10, are still there.

We will know more tomorrow when the stock market opens ;-) (watch that company's stock that just IPO'd last week ;-)
2:33 am on Jan 31, 2011 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Zombie Update?
Ripoff Update?
Imposter Update?
3:04 am on Jan 31, 2011 (gmt 0)

WebmasterWorld Administrator webwork is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Doppelganger [google.com...]
3:40 am on Jan 31, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



How about "Update dupduped" or just "dupduped", where "dup" stands for duplicates and "duped" stands for how Google dealt with them :)
3:42 am on Jan 31, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



u should spell it as "dapdapped" though it is "dupduped"... duples get duped by google... lol
4:04 am on Jan 31, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



that will tend to rank scraper sites or sites with less original content lower


No mention of removing scrapers, only of ranking them lower. They certainly don't want to totally damage that AdSense income coming from the duplicates.
9:29 am on Jan 31, 2011 (gmt 0)

10+ Year Member



The only main change I can see in the UK for specific insurance terms is that the comparison sites now hold positions 1 - 5 across all the categories. Perhaps this algo change affected other areas too.
9:46 am on Jan 31, 2011 (gmt 0)

5+ Year Member



"Copycat update"?
10:57 am on Jan 31, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, from the previous post Google made it sound like the algo for dup content was already designed. But he didn't say "live" although he did imply it was there. So one would assume this post was a bit more about content farms. I do suspect they go well together, low quality content and dup content somewhat related to content farms...
1:30 pm on Jan 31, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Echoes...
1:41 pm on Jan 31, 2011 (gmt 0)

10+ Year Member



I just saw the topic of this thread and thought I'd drop in to let you know what has happened to our main site over the past few weeks.

A competitor has stolen/copied our home page content and posted it on about 80 or so blogs to fill those pages full of 'dummy text', and has placed their own anchor text for our industry's leading keyword throughout the copy of those pages.

Now all of a sudden, since this has happened, we have dropped from page 2 to page 5 and are rapidly declining for that main keyword, and that competitor who is brute forcing Google is now on the first page.

I thought your competitors can't harm you?

This is the first time that I have gone in and researched why we have dropped and have come to the conclusion that it's some sort of aggressive change in the duplication filter.

When you click 'Latest' and see major portions of our home page copy on all of these sites popping up every 2nd day for the past 3 weeks, it's the only conclusion based on years of experience that makes sense.

Now what? I was so concerned today that this is the likely cause that we had new copy written for the home page. Now it will be interesting to see if they use that.

It's a sad state of affairs if this is truly the case.
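
For anyone who wants to put a number on how much of their own copy shows up on another page, here is a rough sketch of one common approach: word shingles plus a containment score. This is only an illustration. Nothing in this thread says Google's duplication filter works this way, and the function names and the shingle size `k` are made up for the example.

```python
import re

def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Text shorter than one window: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def containment(original, suspect, k=5):
    """Fraction of the original's shingles that also appear in the suspect text.

    Hypothetical helper for illustration; 1.0 means every k-word run of the
    original appears somewhere in the suspect page, 0.0 means none do.
    """
    a = shingles(original, k)
    b = shingles(suspect, k)
    if not a:
        return 0.0
    return len(a & b) / len(a)
```

Feed it your home page text as `original` and the text of a suspected copy as `suspect`. A score near 1.0 suggests the copy contains nearly all of your wording, even if it is padded with other text; it says nothing about which page a search engine will treat as the source.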
2:19 pm on Jan 31, 2011 (gmt 0)

WebmasterWorld Senior Member themadscientist is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



Doppelganger + 1
2:25 pm on Jan 31, 2011 (gmt 0)

WebmasterWorld Senior Member themadscientist is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



I thought your competitors can't harm you?

You really have to split hairs to 'get' the word from G sometimes...

There's almost nothing a competitor can do to harm your ranking or have your site removed from our index. (Emphasis Mine) [google.com...]

When you think there are probably 1,000,000 things a competitor can do online then less than 1000 would be 'almost' nothing in someone's opinion, right? The 'almost nothing' statement is highly subject to the interpretation of what constitutes 'almost' nothing.

Maybe 1 or 2; Maybe 10 or 15; Maybe 200 or 300...
How many someone might guess depends on their interpretation of almost nothing.
2:40 pm on Jan 31, 2011 (gmt 0)

10+ Year Member



The thing is, the blog spam is so amateurish that it's likely the competitor is simply using some automated blog spamming software that creates scraped content 'article' blog postings, and all they have to do is <insert anchor text> and the tool does the rest.

For Google to then go and reward that type of behaviour, punish the original and true authoritative source of the content, and then publicly deny throughout the years that blog spamming works is beyond belief.

Oh well, I'm here for the long term.. Google usually gets it right so will await to see the outcome as it plays out.

I suppose blog spammers, article/forum spammers who drop anchor text on garbage might come out the front to be truly the ones we should call "SEO's" :)

Eventful day... sleep.
2:46 pm on Jan 31, 2011 (gmt 0)

WebmasterWorld Senior Member themadscientist is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



The thing is, the blog spam is so amateurish that it's likely the competitor is simply using some automated blog spamming software that creates scraped content 'article' blog postings, and all they have to do is <insert anchor text> and the tool does the rest.

1.) That's probably only the front end of the system you're seeing... How many external 'minty fresh' links go to that 'minty fresh' page to make it 'more important for minty freshness' than you have?

2.) The thing you have to remember about the algo is it's not a person... They can check grammar to see if it is well written, they can check spelling, they can find related words and phrases to see if they're used at the right percentage, but what they cannot do is read it like a person and determine if it's junk or not.
3:29 pm on Jan 31, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, you had it right, Tedster: we regained almost all positions yesterday. Seems it took about 5 days for them to sort it out after the new algo was put live.

So it would seem at least in our case we are no longer collateral damage. Hopefully Matt and his team recognized the problem.
4:04 pm on Jan 31, 2011 (gmt 0)

10+ Year Member



outland88, I would support ANNIHILATION of scraper sites / content farms.

Ranking them lower is a wink and a slap on the wrist to buy them time to figure out a way to game the system again. The guys that run these outfits are the best in the business and probably hanging out here ;o)
4:17 pm on Jan 31, 2011 (gmt 0)

5+ Year Member



Does anyone else see a correlation between these updates and a mid-Jan (12-13th in my case) drop? I'm trying to understand if it is a dupe penalty I'm suffering, as I had a competitor scrape my site in December and send thousands of people from #*$! sites to it. (They stupidly kept my stats code on the page, so I saw it happening.)

Convinced this is the reason for my drop but have passed through one reconsideration request with no joy after having got the scraper site taken down.
6:36 pm on Jan 31, 2011 (gmt 0)



Any opinions on using small snippets of wiki info on a page? It's always been a practice of ours to add a small piece of wiki content (maybe 2 lines or a short para.) to the description of a product we sell. This is in addition to our own unique product description.

Any opinions on this practice? We are not an information site; we are an e-commerce site, and we are not reproducing the content for the purpose of ranking where Wikipedia would. In fact, there really should be no occurrence where one of our pages and one of Wikipedia's pages would show up for the same search term.

Any thoughts? Should we be worried that this is affecting, or will affect, us negatively? Should we stop? Should we remove wiki content from our site?

thanks!
6:49 pm on Jan 31, 2011 (gmt 0)

WebmasterWorld Senior Member crobb305 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Scrapegate Update
7:09 pm on Jan 31, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Doppelganger + 1

from that link above:

Doppelgangers are monstrous humanoids, identified primarily by their ability to change their shape and appearance to mimic almost any humanoid creature. ...