| This 133 message thread spans 5 pages: < < 133 ( 1 2 3 4  ) || |
|Matt Cutts: Google Algo Change Targets Dupe Content|
| 4:59 pm on Jan 28, 2011 (gmt 0)|
|Earlier this week Google launched an algorithmic change that will tend to rank scraper sites or sites with less original content lower. The net effect is that searchers are more likely to see the sites that wrote the original content. An example would be that stackoverflow.com will tend to rank higher than sites that just reuse stackoverflow.com's content. Note that the algorithmic change isn't specific to stackoverflow.com though. |
I know a few people here on HN had mentioned specific queries like [pass json body to spring mvc] or [aws s3 emr pig], and those look better to me now. I know that the people here all have their favorite programming-related query, so I wanted to ask if anyone notices a search where a site like efreedom ranks higher than SO now? Most of the searches I tried looked like they were returning SO at the appropriate times/slots now.
I know there's an existing thread for SERP/algo changes, although this mainly seems to be a 'new' development in that it relates to further tackling dup content scrapers. Mods feel free to merge with an existing thread if needed though.
From Matt Cutts Blog:
I just wanted to give a quick update on one thing I mentioned in my search engine spam post.
My post mentioned that “we’re evaluating multiple changes that should help drive spam levels even lower, including one change that primarily affects sites that copy others’ content and sites with low levels of original content.” That change was approved at our weekly quality launch meeting last Thursday and launched earlier this week.
This was a pretty targeted launch: slightly over 2% of queries change in some way, but less than half a percent of search results change enough that someone might really notice. The net effect is that searchers are more likely to see the sites that wrote the original content rather than a site that scraped or copied the original site’s content.
[edited by: Brett_Tabke at 9:22 pm (utc) on Jan 28, 2011]
[edit reason] Added link for the Cuttlets [/edit]
| 2:41 am on Feb 2, 2011 (gmt 0)|
|One useful change Google could do is to make wikipedia a result you must ASK for. |
Forget about making people ask, just give it the top left of the results page right below the logo...
ADDED: They could even change the name to Googipedia and move to a .org.
How could they present the result section? Hmmm... 'Wikipedia the Official Result of Googipedia.Org - At Least One Wikipedia Page Guaranteed to be Served for Every Single Search'
Sry. Feeling like a bit of a smart a** today. ;)
| 4:04 am on Feb 2, 2011 (gmt 0)|
@mad - that's OK we agree 100% with your tongue in cheek point. lol!
I guess we should just be happy that Google can serve up their "pile of cr@p three miles high every second."
| 5:32 am on Feb 2, 2011 (gmt 0)|
How long has it taken Google to finally deal with scraper sites?
I mean really. This does not take a rocket scientist to see that these are harmful, yet their delay in dealing with it until it becomes a major public relations problem is telling.
I am also skeptical they can solve the problem quickly. There is money at stake and these sites are not going to give up easily.
Because Google has made a public statement about this, if they don't succeed, it will give them another black eye.
And the problem with pulling weeds is sometimes you pull the crop as well.
| 5:44 am on Feb 2, 2011 (gmt 0)|
To my mind and looking at the crappy and non-relevant results (youtube, wikipedia, etc) for various terms, google has absolutely no credibility and is a reminder of why I tried to stay away from SEOing for the last few years.
But I'm back, and rather than just blindly and naively create an original site that I know would most please my target audience... instead I'm going to "examine in detail" the number 1 site for my target phrases.
Because google apparently does not respect genuine work.
| 6:09 am on Feb 2, 2011 (gmt 0)|
watchtower 101 - could someone please develop a browser extension / ad blocker to cut off wikipedia and repressive government sponsored website results in G or at least tank them to position 10,000?
| 8:38 am on Feb 2, 2011 (gmt 0)|
|could someone please develop a browser extension / ad blocker to cut off wikipedia and repressive government sponsored website results in G or at least tank them to position 10,000? |
Why? And miss all that info of "Who's On First?"
Last thing we need is fourth party filtering third party (Google) info. Beware what is asked for!
| 10:15 am on Feb 2, 2011 (gmt 0)|
pontifex (and several others here) are definitely on to something. I should add, with regard to those "shingles" which pontifex cited, that we're now also into n-grams and vectors. I've noticed some collateral damage with the update that I think may shed some light on these thoughts....
An in-depth consumer information article I wrote about 6 years ago, which has had consistent #2 clustered rankings for a competitive single-word query, has been scraped to death, so much so it's been impossible to follow up with DMCAs. Over the years, it's occasionally dropped out of Google and come back. I could generally check whether it was in the index by quoting a sentence, and/or make it come up by disabling the dupe content filter. Until now, Google has always brought it back.
With this update, the article has basically vanished for its search terms, replaced by a newer and fluffier article on another domain with more social-friendly packaging. No hard feelings that it's been outranked, and I've learned a few things about the packaging. The new article doesn't quote my original article at all. It parallels it quite a bit, but they all do... the story is essentially the same.
What I'm seeing, though, is that the original article also now disappears not only competitively, but also on searches for some quoted segments, though not all of them. Google now is apparently not treating the article as a whole. It's likely... for reasons described in this thread... that looking at any article as a whole is becoming impossible. As pontifex suggests it might be, Google appears to be looking at the article in pieces.
If I search for exact strings, say, sentence by sentence, it also appears that Google is no longer treating these queries as searches for exact word matches, but may rather be looking at them conceptually.
This is something we've discussed in the page title discussions, and it has been mentioned in various update threads... I'd have to do some checking to find the references... but each chunk appears to have a different level of competition that's fairly pronounced, which was not previously the case with a 12-15 word quoted search.
Perhaps this relates to how often a phrase has been scraped... perhaps to how competitive the vocabulary or the "concept" is that's described by the quoted string... or there may be a quirk in Google's phrase-based indexing. I see that the core sentences are those that disappear most often.
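To make the "shingles" idea mentioned above concrete, here is a minimal sketch of word-level shingling with a Jaccard overlap score. It is only an illustration of the general technique, not a description of what Google actually does; the function names (`shingles`, `jaccard`, `resemblance`) and the 4-word window are assumptions chosen for the example.

```python
def shingles(text, w=4):
    """Return the set of w-word shingles (overlapping word n-grams) in text.

    The window size w=4 is an arbitrary, hypothetical choice for illustration.
    """
    words = text.lower().split()
    if len(words) < w:
        # Text shorter than the window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def resemblance(doc_a, doc_b, w=4):
    """Score how much two texts overlap, from 0.0 (disjoint) to 1.0 (identical shingle sets)."""
    return jaccard(shingles(doc_a, w), shingles(doc_b, w))


if __name__ == "__main__":
    original = "an in depth consumer article about choosing the right widget"
    scraped = "an in depth consumer article about choosing a cheap widget"
    print(resemblance(original, scraped))
```

Because the comparison works on small overlapping pieces rather than on the page as a whole, a copy that reuses only some sentences still scores above zero, which fits the impression that an article can match on some quoted segments and not on others.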
I've been seeing parallels to this for a while now on sites that had a lot of internal duplication, used a lot of repetitious boilerplate in their content, had a lot of affiliate duplication or were scraped a lot, etc. This example I'm citing now is sobering enough that I'm thinking that assumed "evergreen" content may be more vulnerable than thought... and, very simply, if material gets duplicated and shifted around long enough, Google may be giving up on identifying the source and be dumping it into the dustbin of history.
I have other thoughts about testing that, judging from externals I've observed, I believe the article has gone through (multiple pages from the site being returned, I think, to test which of several pages should stay), but I wasn't associated with the site last year, so I haven't had a chance to monitor that particular data, if it was collected.
For now, I can say from serp watching that a lot of search refinements that I've been seeing on various searches, like multiple pages returned for a domain, etc, seem to have shifted with this update... and not just for this set of queries, but for many others... perhaps suggesting that the evaluations which the refinements were a part of have been incorporated into the new algo and the testing shifted to somewhere else.
| 12:14 pm on Feb 2, 2011 (gmt 0)|
We've been getting scraped to death ever since, but so far no one has outranked us with our content. I wonder why it worked fine for us and not for others. Could be the industry we are in.
| 12:49 pm on Feb 2, 2011 (gmt 0)|
I manage a site that has "Industry News" showing manually/editorially chosen and relevant press releases that relate to the brands and products on the site. The "News Sections" seem to have taken a hit on referrals between the 24-26th. (Which they should, the content is for readers, not search engines, and has always been properly attributed)
I'm watching to see if the entire site as a whole will be affected as well. Too early/not enough data to know for sure yet, since the sites are also driven by seasonality.
On a few other sites I take care of... I spotted very erratic traffic behavior around the same time (the 24-26th) where the referrals are spiking and diving, spiking and diving, etc, etc. It's not like the old pogo-sticking we talked about in other threads.... it's very extreme and very daily. (Up-big, Down-big, Up-big, Down-big, etc)
Interesting to watch anyhow... which is the ONLY thing I can justify doing right now. Let it wash, focus on the visitors... Let Goog do its own thing and take what I can get! :-)
| 11:32 pm on Feb 3, 2011 (gmt 0)|
Shaddows mentioned this:
[I will be most interested in the implication for ecoms using centralised product text (be it third party or manufacturer), or whether that "class" of site will be exempted.]
Has anyone seen any implications on this topic as of yet? I see many, many ecom sites all using the same manufacturer's description/data feed.
| 12:05 am on Feb 16, 2011 (gmt 0)|
|For now, I can say from serp watching that a lot of search refinements that I've been seeing on various searches, like multiple pages returned for a domain, etc, seem to have shifted with this update... |
@RC - what do you make of the multiple page results coming from some sites? I am finding some of my site pages in several consecutive results, in one case a total of FOUR starting at the 1 or 2 position. I can't say I'm comfortable with that, because you'd think I'd be getting a surge of traffic, but the opposite is true. It could also be indicative of some odd new penalty.
I have what might be considered "duplication" throughout my site, but it's more like a catalog index page containing "widget 1, widget 2, widget 3..." each item linking to a detail page of each widget.
There is no exact duplication, but perhaps what might be considered re-hash or more information in the way of a detailed description. What's wrong with that?
I get the feeling Google wants all our content on page 1. Any repeating of what's on page one for the sake of clarification or providing the customer with product detail might be marked as duplicate content? That would be crazy!
| 1:15 am on Feb 18, 2011 (gmt 0)|
2/17/2011 - at 1:00 pm CST checking our niche serps is showing sudden larger than normal moves with what would be considered "thin/dupe" sites now outranking the usual leaders. Just reporting what I'm seeing in my niche.
| 7:33 am on Feb 21, 2011 (gmt 0)|
Our site goes down even though we follow Google's guidelines. It seems Google has made some errors in implementing the new updates. They should have done more research before releasing them.