| This 102 message thread spans 4 pages: < < 102 ( 1 2  4 ) > > || |
|Google's Matt Cutts Talks of New Focus On Low Quality Content|
|... attention has shifted instead to “content farms,” which are sites with shallow or low-quality content. In 2010, we launched two major algorithmic changes focused on low-quality sites. Nonetheless, we hear the feedback from the web loud and clear: people are asking for even stronger action on content farms and sites that consist primarily of spammy or low-quality content. We take pride in Google search and strive to make each and every search perfect. The fact is that we’re not perfect, and combined with users’ skyrocketing expectations of Google, these imperfections get magnified in perception. However, we can and should do better. |
One misconception that we’ve seen in the last few weeks is the idea that Google doesn’t take as strong action on spammy content in our index if those sites are serving Google ads. To be crystal clear:
Google absolutely takes action on sites that violate our quality guidelines regardless of whether they have ads powered by Google;
Displaying Google ads does not help a site’s rankings in Google; and
Buying Google ads does not increase a site’s rankings in Google’s search results. [googleblog.blogspot.com...]
[edited by: Brett_Tabke at 7:12 pm (utc) on Jan 21, 2011]
[edit reason] added quote [/edit]
Everyone seems to be latching on to Matt Cutts's use of "content farm" and then defining that in all sorts of different ways. I think it would be valuable to unpack the phrase starting with Matt's own, albeit vague, definition:
|As “pure webspam” has decreased over time, attention has shifted instead to “content farms,” which are sites with shallow or low-quality content |
In this case, the use of the word "farm" is intended to be a derogatory description of content that is produced at a massive scale. The term seems to rely on the notion that the highest quality things are created by craftspeople in small batches vs. a factory approach of mass production.
Let's look at different types of textual content on the web that are all created at scale, starting with the shallowest and lowest quality and moving up the quality ladder:
1) Scraper sites. These are sites that automatically create pages about a particular topic by aggregating content that they find from around the web.
2) Sites with algorithmically-generated content that does add some value. There are many sites that aggregate info in useful ways.... The about-us type pages that aggregate everything possible about a domain can be marginally useful. A site that aggregates all social-media discussion about a particular topic/brand can be convenient too.
3) Sites with thin human-created content. wiki.answers has many pages with one-word answers. Most content by sites in this category is not checked by an editor. Some pages are valuable, many are worthless. Often, pages with questions but without answers pollute the SERPs (although Google seems to have reduced this problem recently).
4) Sites that rewrite existing web content. There is a wide variety of repackaging - simplifying a complex Wikipedia article into layman's terms could be quite valuable, but just rewriting it with synonyms adds nothing.
5) Rev-share content sites. These are the sites that allow users to contribute and market content and they share in the revenue that the pages create. Many of the links to these pages are given by the content producers themselves because they have something to gain, so the incoming link equity is not necessarily a true "editorial vote" that vouches for the quality of the content.
6) Rev-share content sites _with_ editorial oversight. Some rev-share sites have editorial oversight and reject submissions that do not meet their guidelines. Since there is virtually no cost (other than brand devaluation) for them to publish something of questionable quality, the tendency of these types of sites is to reject only the most egregious submissions.
7) Sites that pay upfront: Much of what Demand Media does goes here. They have editors that review all submissions; they reject poor submissions and fire poor contributors.
8) Sites that pay upfront where writers work with dedicated editors. These sites address the potential problem of quality at scale by breaking everything down into small teams - so you can think of this as a bunch of small craft brewers making as much beer as Budweiser.
9) Wikis. There are good arguments that suggest that wikis are much lower on the totem pole, but since they will never be labeled a "content farm" by google, it doesn't really matter where we put them.
10) Traditional media. NYTimes is a content farm if you use the term "content farm" literally - content produced at scale... Take a look at pictures of their massive newsroom - it looks just like what you would think of as a "content factory" or "content mill."
The key question then, is everything done at scale necessarily bad? You may disagree with the order of the list above, but there are scores of sites much more in danger than eHow.
One last note: sites have more than one type of content. Some may have many original articles, but then many algorithmically generated pages that also pollute the SERPs. eHow has topic pages that add little or no value - those could be at risk. Many people have accused eHow of mirroring the exact flow of other web content (see #4 above) - those pages are probably at risk as well.
I am a little surprised about the eruption that this story has caused. I think it is important to note that the two parties most discussing this issue have their own axes to grind: webmasters and traditional journalists.
Webmasters are upset when other sites rank above theirs. If eHow outranks you in the SERPs, then you are likely to join the army of other webmasters that are similarly outranked and hope that eHow somehow gets axed from the SERPs. Traditional journalists also hate eHow, but for a different reason... Demand Media's compensation model undercuts the high salaries that journalists used to make when newspapers were in their heyday.
I am not defending eHow or Demand - I just don't think any of their original articles are at the level of risk that many people have suggested. If it is the case that a big portion of their traffic goes to topic pages or to content that is rewritten from other web sources, then they should probably be a little worried. There are still sites at the bottom of the quality ladder that appear in the SERPs - Google will likely start there and then start to move up. I am not even convinced that they will ever get to #3, although the web would probably be better if they did.
This is a great example of how Google's ownership of both the leading search engine and the leading contextual advertising program can taint its objectivity. "Quality" aside, where does their economic incentive lie?
Their search engine skyrocketed to success on a "Don't Be Evil" mantra that separated paid/organic results, but the fact was that, at the time, that was an economically sound move, because the market was ripe for such a product.
Now, the greatest money to be made is by feeding their ads out to content farms that, in a completely unrelated algorithmic coincidence, rank wonderfully in their search engine. They maintain that their "paid" results are kept separate from their organic results, but it's not that simple.
The market is savvying up, the incentives are shifting, and people are once again demanding that paid and organic results be truly separated, and that the separation is not obfuscated by the existence of paid relationships between content providers and search results providers.
Elsewhen, that's a very astute post. And I don't necessarily care if ehow or wikipedia or answers.com or squidoo or whatever get penalized.
What I doubt is whether Google can differentiate their version of 'low quality' sites from mine, which in most people's estimate would be high quality. Because the single biggest similarity between some of my sites and sites that would be perceived as farms by most is size.
I wasn't attacking you...the OP refers to content farms and low quality-this thread wasn't about you. I'm not the internet police, just a lowly ecom developer, HOWEVER there are a bunch of "content farms" floating out there and A LOT of them are pretty much useless. I'm sorry if my post offended you but I stand by my post and think a lot of people would agree.
Google decides what goes in their index based on their algo-good or not. If G doesn't like what it sees they will move on...
|Google decides what goes in their index based on their algo-good or not. If G doesn't like what it sees they will move on |
I'd argue that a well built site with a bunch of quality content pages has the potential to outrank Wikipedia IF that site is exclusively about the subject of the query. That is to say, stay focused like a laser beam if you want a shot at ranking well at Google.
For example, a site utterly devoted to music phenom Willy Widget can go to #1 if it has a thorough bio, discography, tour schedule, photo gallery, discussion forum, includes links to other quality sites that relate to Willy, a URL such as AboutWillyWidget.com, AND got out early on all that.
So far, only the established mega sites seem to operate successfully outside the laser zone, and until this point, Google has accepted that, often reacting favorably. But things change and we're left to figure out where the new boundaries are.
In 2010 we saw Google attack SEO with some measure of success. Taking MC at his word, it appears in early 2011 that they now have their sights set on "content farms" ~ to be honest, if I was in that position, I'd be looking over my shoulder.
Elsewhen, righto! Well said.
|"Quality" aside, where does their economic incentive lie? |
Jonathan, welcome to WW. Good point, but I think this move by Google goes to the heart of their economics. The better the results, the more people will depend on their service, and thus it snowballs. And, once people get burned more than twice, they'll start looking elsewhere.
Tonight we had a bit of a medical crisis here. One member of the family had to go to the emergency room with a problem with their right eye. Very upsetting.
Naturally, we turned to the web for some information. Finally found what we needed to know at the National Institutes of Health (a US federal agency). But, it took some very, very aggressive searching, with some key terms from our specialist's website, to find out what we needed.
And, it shouldn't have been difficult. It was a straightforward question and a common-enough medical event. But what Google and Bing offered with the common terms was results from content farms offering very brief one or two sentences and "see your doctor." eHow was among the worst.
I have been among the voices who have stressed getting sponsorships outside of Google and building relationships where you don't live or die by search engine traffic. That said, Google's impact cannot be ignored. I have one client where the search engine rank is vital to the enterprise (and it ranks high) so I'm going to be watching this. They must do better and those of us who are offering quality information should wish them well. But, please Google, be careful.
Hmmm... Interesting. I think the way I search keeps me from seeing these issues as much as other people. I usually know of the site I want the information from and include the domain name in the search. Using the search above as an example I would search for 'condition nih.gov' rather than just the condition...
I guess the fact I do search the way I do does confirm there is a definite issue with not just Google, but many search engines.
Anyway, I didn't really think about it before but the more I think about it, the more I think the way I search says there's a definite problem, because wheel is correct IMO, the search should be easy to return a quality, correct result for...
I think one of the biggest issues is determining the 'correctness' rather than just 'implied quality' of the result... A page may be well written and present 'quality signals' to a search engine, but IMO what they are really going to have to do is find 'correctness signals' to present the type of the results they would like, and that's a definite challenge, IMO... Even more so than determining 'quality' from on and off site factors.
|I'd argue that a well built site with a bunch of quality content pages has the potential to outrank Wikipedia IF that site is exclusively about the subject of the query. That is to say, stay focused like a laser beam if you want a shot at ranking well at Google. |
unfortunately this is getting harder and harder to do Reno. I've been trying for about 6 months now just for fun. I targeted a niche, 3 word keyword which wikipedia ranks #1 for. bought the exact match, aged domain (registered since 1997, .com). wrote thousands of pages (over 6 months) of unique content, 6 months of link building creating 5 times as many page links as WP has for their page and 4 times as much specific keyword anchor text as WP has pointing at their page. WP's entire page on the subject is 270 words. We have a 1000+ page, exact match domain, laser beeeeeeam content website dedicated to the subject and nothing but the subject, no adsense or any other advertising or outbound links and we have not yet been able to shake WP out of the tree.
now from a business standpoint I'm not that worried about it, WP sucks so bad on this topic that we are not losing much traffic by being in the second spot. I guess it's just personal!
In my opinion, this is nothing but smoke and mirrors.
Considering Google has a very poor reputation among many in the webmaster community (even the owner of WebmasterWorld does not use Google for his searching), they have to try and do something.
Unfortunately it’s just a bit too late.
I predict we are going to see massive fallout just like Florida and good quality sites that followed all the rules will once again get hammered by their newest effort to attack spam.
The sooner Google loses its majority grip on the searches and falls into the 50 or better yet 40 percent share, the better off every single webmaster on this forum will be.
Surely bounce rates hold the key to determining low quality sites. How they are measured and evaluated is down to Google but at the end of the day visitors aren't going to hang around a crap site. That's the way I'd do it anyway.
That would be best for everyone IMO - searchers, Google itself and webmasters. Quality site owners wouldn't have to worry about getting caught in the net then.
@Simsi You really can't tell much from a bounce rate...
Price comparison shopping is a good example.
Click result 1, check price.
Click result 2, check price.
Result 3 price is in the description, so no click.
Click result 4, check price.
Wait to buy later...
Which was the right site?
There are a huge number of ways bounce rate doesn't work... I have one page with an average monthly bounce rate of 85% to 95%, which probably seems extremely high and like a non-quality result, until you know the average time on the page is over 6 minutes...
Sometimes a high bounce rate means Google and the site owner both did their jobs right, in other cases it means the wrong result was shown... The thing you can't do reliably is tell which it was from a bounce rate, because there are too many reasons for a person to only view one page; some single page views indicate a good result, and some a not-so-good result.
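To make the point above concrete, here is a minimal sketch in Python. The `Visit` record, the two pages, and all the numbers are made up for illustration; nothing here reflects how Google or any analytics package actually measures sessions. It only shows that two pages can share the same bounce rate while time on page tells a very different story.

```python
from dataclasses import dataclass

@dataclass
class Visit:
    pages_viewed: int       # pages seen in one session (hypothetical field)
    seconds_on_site: float  # total time spent in that session (hypothetical field)

def bounce_rate(visits):
    """Share of sessions that viewed exactly one page."""
    if not visits:
        return 0.0
    bounces = sum(1 for v in visits if v.pages_viewed == 1)
    return bounces / len(visits)

def mean_time_on_site(visits):
    """Average session length in seconds."""
    if not visits:
        return 0.0
    return sum(v.seconds_on_site for v in visits) / len(visits)

# Hypothetical page A: most visitors read for several minutes, then leave.
page_a = [Visit(1, 400.0)] * 9 + [Visit(3, 300.0)]
# Hypothetical page B: most visitors leave within seconds.
page_b = [Visit(1, 5.0)] * 9 + [Visit(3, 300.0)]

for name, visits in (("A", page_a), ("B", page_b)):
    print(name, bounce_rate(visits), mean_time_on_site(visits))
```

Both pages bounce 9 sessions out of 10, so the bounce rate alone cannot say which one answered the query. Even adding time on page is only a second hint, not a verdict.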
Great - one of the threads that make WebmasterWorld the best!
There is little to add to the responses here, just a remark from where I sit:
IMHO there is a big difference in "determine the quality level of a page" and "separating SPAM from legit".
I know that most of you disregard my opinion that: the technology behind a large scale search engine is not rocket science, just pretty complicated.
Yet following my own opinion I still believe that Google is not performing miracles, but a series of programmatic steps that ultimately lead to a SERP for a keyword.
"... new classifier ... on-page content..." reads in my world as a new pattern matching algo that will be performed during ranking calculations.
That is IMHO primarily a scraper fighting approach - to filter and identify something as JUNK from a page, you need a pattern to look for.
So my 2 cents on that:
Clearly identified JUNK will be used to filter out NEW JUNK! That sounds OK to me!
|internetheaven: So now I HAVE TO WRITE a junk piece of SEO nonsense on my front pages to stay in the index |
Good one - but for photo-driven sites it was ALWAYS a good idea to have photos with an extensive info part in their header (EXIF) and is as important as it ever was. The source of these photos will rank for the EXIF info - on the photo search in any case.
Beyond that: ranking for a search term with a page that contains only a few words and a bunch of photos without EXIF infos - is that even possible ;-)
I would venture to say that any algorithm, as yet, cannot distinguish between generated spam and a poorly-written or translated, original English-language piece of content.
I see this every day in my industry, great global sites with good stuff, their local language markets understand them very well yet for .com they could easily be discarded as crap only because of their on-page text and, only in "our" SEO eyes, badly SEOd pages.
Do you really trust Google to be able to understand nuances, dialects, garbage and spam? Insofar as I am aware they do not have anywhere near the ability and understandably so.
I do not have any trust when they differentiate in English between basic keyword1keyword2 and keyword1-keyword2.
|Do you really trust Google to be able to understand nuances, dialects, garbage and spam? |
no. their search algorithm outsources all of this nuance-understanding to the web community by crawling for links. the underlying assumption is that web users will not link to spammy pages and will tend to link to high quality and useful pages.
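The "links as votes" assumption described above can be sketched as a toy PageRank-style iteration. This is only an illustration of the general idea from the published PageRank concept, not Google's production algorithm; the page names, the `link_scores` function, and the damping value of 0.85 are assumptions chosen for the example.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Toy PageRank-style scoring.

    links maps each page name to the list of pages it links to.
    Returns a dict of page name -> score; scores sum to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score among the pages it links to.
                share = damping * score[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outbound links spreads its score evenly.
                share = damping * score[page] / n
                for t in pages:
                    new[t] += share
        score = new
    return score

# Hypothetical mini web: three pages "vote" for the guide, nobody links to spam.
web = {
    "blog": ["guide"],
    "forum": ["guide"],
    "news": ["guide", "blog"],
    "guide": [],
    "spam": [],
}
scores = link_scores(web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph the page that others link to ends up on top and the unlinked page at the bottom, which is exactly why the scheme breaks down when the links themselves are bought or self-placed.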
We may all be critical of Google but how many of us have had the opportunity to earn good money from appearing free on their site? Quite a few of us I guess.
I make sites with only the end user in mind. One I am currently working on will probably be hit with a duplicate content penalty, but I have no choice because the same content needs to be displayed in many ways on the site in order to satisfy my visitors.
And satisfy my visitors is all I care about. The income will take care of itself.
To be honest, because the site is local, I am more focused on facebook than google and have every confidence of the site doing well due to good old fashioned word of mouth.
I actually couldn't care less about it appearing in Google (never thought I'd hear myself say that!).
The Web has always rewarded quantity more than quality, but over 2010 this truism became even more pronounced with the growth of Content Farms. These are companies which create thousands of pieces of content per day. Much of it is in the form of how-to articles and is often referred to as "evergreen" informational content, because it's relevant for much longer than news.
I agree that it is just smoke and mirrors.
Does anyone in this forum still believe that Google's algo is coded to return the most relevant results?
It simply does not, and inherently can not due to its responsibility to the corp and its shareholders to make more money. If they generate results that cause their ad revenue to plummet they will be told to try again. It is a corporation after all. There is only one concern =$
Some valid points on bounce rate TMS, however I would add it's fairly clear that Google treats different verticals, er, differently. In the case of Comparison sites, bounce rates may not be required as a measure as there are other similarities which can be used as comparison tools.
And undoubtedly in some other verticals bounce rates may not provide an accurate measure but I also reckon there are many verticals where it could at least be factored in even if it isn't the only measuring stick.
|Does anyone in this forum still believe that Google's algo is coded to return the most relevant results? |
I really can't remember anybody at Google using the term "most relevant results," and I do try to listen fairly closely. Going back some years I remember being a bit surprised that Matt only used "relevant," never with "most," never uttered the word "best."
Here's a POV that is interesting, albeit somewhat conspiratorial & with a pinch of paranoia, as regards the impact of the "content farms" algo change on alternative media:
|At it’s core, this new Google algorithm seems to punish information sharing in favor of protectionist conglomerates with large writing staffs. We in the alternative media would do well to recognize that these actions being taken by the elite of the media world are just another sign of their weakened state. |
Read the... Full story [infowars.com]
Syndicated content appears doomed. News and articles.
Original sources could remain validated if they have respected editorial standards. But you won't be able to get free content from article or news sites any more.
The algo changes have me wondering the best page length. It's very easy to repeat the same words even naturally in a long article, but Google could now see this as spam.
Solution: shorter pages and/or pseudonyms?
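For anyone who wants to check how often a long article really repeats its main words, a rough sketch in Python follows. The `term_density` helper is hypothetical, and nothing here implies that Google uses word density or any particular threshold; it just measures what share of the words each term takes up.

```python
import re
from collections import Counter

def term_density(text, top=3):
    """Return the `top` most frequent words with their share of all words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return []
    counts = Counter(words)
    total = len(words)
    return [(word, count / total) for word, count in counts.most_common(top)]

# Hypothetical snippet of page copy.
sample = "Cheap widgets. Buy cheap widgets here, the best cheap widgets online."
for word, share in term_density(sample):
    print(f"{word}: {share:.0%}")
```

Running it over a long page and a short page on the same topic would show whether splitting the article actually changes the ratios, or only the raw counts.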
even if we webmasters know that material is obviously scraped and copied from somewhere else, so what? it's still useful to someone. google isn't going to risk alienating millions of people who visit these sites daily just because they are regurgitating material.
half the stuff on TV is repeats. "100 best comedy clips", "100 worst comedy clips", it's all scraped from somewhere else. but as long as people keep watching it they will keep putting it on. because it's all about getting and keeping the audience. it's the same with the web. google needs to keep its audience.
the normal everyday person on the street doesn't know that these sites get their stuff from somewhere else. all they know is that it's still working okay, but no longer in google, so they blame google.
this move might be popular with webmasters, but if google wants to keep its audience then it needs to serve up the sites that people actually use, regardless of where they get their stuff.
after all, how many ways are there to describe a news story in 2 lines? or give the info for an upcoming event? if google wants every site to use different words then they are deluding themselves.
google don't even do that... just look at google news, or google places... the text is taken from other places. it's an automated mash-up. if anyone else created google places then it would be called a content farm.
i reckon google might be fiddling with the algo a bit, but it's really just another thing they are chucking out there because they know that webmasters hang on every word they say. they want us to do our bit in helping them to improve their serps.
we've seen it before, loads of times...
...we are having trouble telling the difference between normal links and paid links... i know... let's tell webmasters that from now on paid links give them a penalty. then they will all stop doing it. CHECK!
...our spidering costs, and storage costs are spiralling... i know... let's tell webmasters that page speed is now a factor in the algo. then they will all reduce their page weight. CHECK!
...we are having trouble filtering spam from the index... i know... let's tell webmasters that syndicated text and scraping text and using RSS feeds is now a bigger factor too. then they will all stop doing it. CHECK!
isn't google just one big content farm swallowing up everyone else's content and monetizing it? :)
I was looking at the sites beating me in the serps today. They all got there with a dozen pages of original content and thousands of really crappy quality paid links.
I heard this first stated many years ago, now it's horribly obvious - google cares about the content they serve. More specifically, they DON'T care how you got there. (i.e. they're making improvements to quality of content while continuing to let bought links dominate).
"Sites that pay upfront: Much of what Demand Media does goes here. They have editors that review all submissions; they reject poor submissions and fire poor contributors."
Really? Then the editors should be fired!
Absolute #*$!..fixya for example is such a lousy site yet it grows and grows. It's littered with ads and stolen content.
I am looking at the latest of many ehow articles to cite my site as a source.
It is an extremely weird piece of writing, very much like auto-generated content: it is rambling and incoherent, veering off-topic. Even the title does not make sense and the question is not answered in the article.
And social bookmarking websites that rank higher than the original with copied title and excerpt from the original article. If there are comments on the bookmarked page I can understand, but ranking higher than the original with less content? I guess G can do better.
|I am looking at the latest of many ehow articles to cite my site as a source. |
I think I understand what you mean since I've seen similar things. First they copy your content and modify it, often resulting in an article with plenty of errors and wrong information. Then they cite your site as a source. Once in a while they even copy graphics that have been designed by someone exclusively for my website.
When you contact the eHow moderators and complain about the copyright violations they treat you like a criminal ...
Ehow deserves a serious Google penalty. It's a perfect example of a content farm with content stolen from niche sites, (slightly) re-written and published to make money (for both the writer and the owners of eHow).
|When you contact the eHow moderators and complain about the copyright violations they treat you like a criminal ... |
How can they do that? Share experience with details? That sounds like the mafia.
I have to agree w/ whoever suggested manual intervention on some level. I know engineers like to automate & leave it to the technology, but the 'engine' behind the technology has loads of biases & decisions built into it.
Those ad funds are pretty hard to turn off though. . . but if any co. can afford to, it's G. In the long run, what's right for the user will be what's right for G & her investors.