This 102 message thread spans 4 pages.
|Google's Matt Cutts Talks of New Focus On Low Quality Content|
|... attention has shifted instead to “content farms,” which are sites with shallow or low-quality content. In 2010, we launched two major algorithmic changes focused on low-quality sites. Nonetheless, we hear the feedback from the web loud and clear: people are asking for even stronger action on content farms and sites that consist primarily of spammy or low-quality content. We take pride in Google search and strive to make each and every search perfect. The fact is that we’re not perfect, and combined with users’ skyrocketing expectations of Google, these imperfections get magnified in perception. However, we can and should do better. |
One misconception that we’ve seen in the last few weeks is the idea that Google doesn’t take as strong action on spammy content in our index if those sites are serving Google ads. To be crystal clear:
Google absolutely takes action on sites that violate our quality guidelines regardless of whether they have ads powered by Google;
Displaying Google ads does not help a site’s rankings in Google; and
Buying Google ads does not increase a site’s rankings in Google’s search results. [googleblog.blogspot.com...]
[edited by: Brett_Tabke at 7:12 pm (utc) on Jan 21, 2011]
[edit reason] added quote [/edit]
IMO DemandMedia and Associated Content should both be afraid. Wikipedia is cliquey but it's non profit. War is waged over articles there sure, but by people who genuinely want to get their view over - and it's not about the money, it's about being right - even when you're not :)
They shouldn't be lumped in with the others.
|Google's Matt Cutts Talks of New Focus On Low Quality Content |
I keep reading the title and thinking, 'They've already got that one nailed... They don't need to change much of anything to present low quality content!' lol
|There have been articles written suggesting this might be a threat to DemandMedia but I'm not sure about that. Outside of hand penalizing content farm domains, how can Google automate the process for identifying a well researched article published on a content farm? |
I think the problem is that Google determines a site is an authority and thus views it as an authority on every single topic out there. That's why some of these sites can rank for everything while not being an expert in any of it. Authority needs to be more compartmentalized by topic.
As for how it's done, that could be difficult. But I think there has to be a way to figure out that if the Mayo Clinic gets a lot of links from medical websites and doctors, it should be the go-to place for medical searches.
On the other hand, my problem isn't with bad content. Sure I'd love to see better results for what I'm searching for from better authorities. I just don't like some of these large sites that have no unique content at all. They scrape 5-6 other sites and still rank high. I can deal with seeing a bad answers site ranking #1 if it's not followed with 5 more sites scraping that same garbage answer. There are a lot of these VC funded "user content" sites that do have some user content, but also flood most of their site with scraped data.
Matt Cutts made quite a few comments on this issue today over at Hacker News [news.ycombinator.com...]
In particular, he telegraphed another algo change coming in about a week.
|eps commented: Once I stop seeing StackOverflow clones listed above StackOverflow's original pages I will gladly believe that Google's search quality is "better than ever before." |
Matt_Cutts replied: I've been tracking how often this happens over the last month. It's gotten much, much better, and one additional algorithmic change coming soon should help even more.
I'm not saying that a clone will never be listed above SO, but it definitely happens less often compared to several weeks ago.
Matt says a lot more in that thread - particularly strong comments about the issues of scale, which those of us on the outside of Google can only imagine.
I'm not feeling too good about this either, Google *doesn't have to* hit check boxes beside the biggest culprits to make them rank higher or "keep them safe" within any algo changes, these guys have the $$, resources and connections to trump any of Google's "quality signals". Some examples:
--DMOZ listings? They get dozens and dozens of them and if you submit the page from your site that these guys "copied and re-fiddled", it's no go. Who's sliding all those pages into DMOZ?
--Site speed? They can afford the best servers, tech and design support.
--Inbound links? They're littered all over the net, not just from the average user (who first found them via Google) but from high-profile sites that they are connected with in some form or another (some business, some personal).
The money these sites haul in can't be competed with. Is Google giving YOU the same % it's giving these guys? They can outspend any of us any day of the week, and not only because of their traffic volume.
I feel the take-over of the net (google serps) by the elite-erati could be flirting dangerously close, and all under the "guise" of users demanding better.
|Maybe G should look for sites that are heavy on AdSense ads as described above. If the site has multiple AdSense ads on every page then G might consider that a red flag... |
|...they could cull AdSense publishers based on quality of content... |
AdSense does cull sites continually. Do a search on "my adsense account was deactivated" and you will see that lots of sites had their AdSense accounts deactivated in 2009 and 2010.
From what I read, the official policy from google on why the accounts were deactivated was that they felt that the pages served as an intermediary (read, extra step) to get to the main content.
But I don't think the sites that had their AdSense accounts deactivated were ipso facto reported to the Google spam team. I don't remember people saying anything like, "my adsense account was deactivated AND I lost my rankings..."
It seems like it would be intuitive to Google that if the pages in question did not have enough quality content to serve up AdSense, they probably didn't have enough quality content to rank highly in the SERPs.
|It seems like it would be intuitive to Google that if the pages in question did not have enough quality content to serve up AdSense, they probably didn't have enough quality content to rank highly in the SERPs. |
Don't go making sense like that, alright?
Personally, I think you make a great point and it really makes me wonder if the '...but we don't want to do it that way.' mentality they have regarding algorithmically determined SERPs holds them back in some ways.
IMO they would be better off adjusting the '... but we want to do it all algorithmically ...' mentality to '... we want to do whatever it takes to produce the best quality SERPs and if that means people need to be more involved, then people will be more involved ...'
Well, we could not write off any of the above-cited sites (or others) as completely bad. Yes, there are low-quality pages, and there are good ones too. It is a mixed bag. They have the money to acquire everything, but good content defies money.
But then, none of them is free from bad content either. I was smiling on reading the comment suggesting StackOverflow to be the ultimate one on the web. Doesn't everyone claim his or her own site is the best?
I can show you tons and tons of bad-quality pages on that site as well. I have also read some excellent pages there.
It is easy to criticize Google, but one should also appreciate the difficulties that they face in dealing with this.
Like a few others have expressed, what will be worrying is the false positives/negatives of any algo change that they make. Their testers have a significant role to play here, and if anyone needs to work hard, it is them.
If I were Google, I would also place significant emphasis on increasing and strengthening the support team (those who handle reconsideration requests), as they too will have a significant role to play. Google should try to be more human here than automated, which is often the case. It would also be great if they could have a call center to handle this. The call center (staffed by people who are good at public relations) could be a bridge between the public (webmasters) and the Google engineers who investigate a complaint. What do you say?
I'm currently working on a site that is going to have about 50,000 pages of unique, quality content. When I launch it, I don't expect to spend much time link building.
What are the chances that Google's going to view the site as 'low quality'? Pretty good, I bet. The site won't stand a chance. And it won't be because the content isn't top quality.
Ok, here's what I don't understand, and I don't mean to be critical here, just confused:
|I'm currently working on a site that is going to have about 50,000 pages of unique, quality content. When I launch it... |
If an individual can create 20 quality content pages a day (which is really cranking), it will take nearly 7 years to complete a 50k website. So either this example is a huge team effort; or a lot of content is being delivered pre-packaged by merchants; or there's a software automation aspect involved; or I'm missing something (which happens a lot).
If it's a large team effort, and everyone on the team can write unique quality content, then perhaps Google WILL see it as top quality. But if it's any of the other approaches to site construction, then I think we can see Google's challenge. And I'd bet there are thousands of huge 50K+ sites that present the same predicament to their algo ... all of which supports indyank's comment:
|one should also appreciate the difficulties that they face in dealing with this |
So yes, I appreciate the difficulty, but still worry that there's going to be a LOT of collateral damage, and wonder if that damage will spill over into sites with 50 pages, not 50 thousand.
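The arithmetic above is easy to check. A quick back-of-the-envelope sketch, assuming a single writer producing 20 pages every single day with no days off:

```python
# Back-of-envelope: how long would one writer take to hand-build a 50,000-page site?
PAGES_TOTAL = 50_000
PAGES_PER_DAY = 20          # one prolific writer, "really cranking"

days = PAGES_TOTAL / PAGES_PER_DAY      # 2,500 writing days
years = days / 365                      # no weekends, no holidays

print(f"{days:.0f} days, or about {years:.1f} years")
# → 2500 days, or about 6.8 years
```

So the "nearly 7 years" figure holds up, and that is the best case: a team of ten writers still needs most of a year, which is exactly why so many 50K+ sites lean on feeds or automation.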
|I'm currently working on a site that is going to have about 50,000 pages of unique, quality content. When I launch it, I don't expect to spend much time link building. What are the chances that Google's going to view the site as 'low quality'? Pretty good, I bet. The site won't stand a chance. And it won't be because the content isn't top quality. |
If it makes you feel any better, one of the content farms will eventually scrape it and it will rank for them.
|The idea of Google trying to determine that through an algo, and the guaranteed huge fallout from that (on non-spammy sites) should be enough to make anyone scared. |
Ahhhhhh! It all makes sense now. I wondered why my image-driven front pages had been de-indexed! Nice one, Google. So now I HAVE TO WRITE a junk piece of SEO nonsense on my front pages to stay in the index ...
Do I have to get the backlink equivalent of Facebook for my "thin content" pages to rank now?
Hopefully Google will treat eHow as a content farm too. I'm sick and tired of seeing how these guys and their "contributors" steal content from my websites.
|- Answers.com, About.com? - BT |
Both are atrociously bad sites. I hope Google knock these two out of their index.
Personally I disagree about Wikipedia, though, I feel it's fast becoming the only useful resource on the net, other than WW of course!
Google can’t tackle low quality content sites without wiping out their AdSense revenue. Dream on Google.
|If an individual can create 20 quality content pages a day (which is really cranking), it will take nearly 7 years to complete a 50k website. |
And hence, the monster raises its head: how does a content website get big enough to position well and profit in the SERPs, and afford to do it?
In my opinion, answering this is the root of the problem.
If one sincerely wished to deliver quality content to the masses of the internet and monetize it – good luck. This is particularly true for those whose revenue source is AdSense. Those nickels and dimes will not fund the cost of writers, an editor, a manager, a building, and the other overhead required to run a quality content-publishing business in this day and age.
What we see in many of these content farms is the business model being stuffed into the available revenue stream. Until the revenue improves and the right motivators are present there will be little reason for content to improve.
I’m not going to spend 60 hours researching, writing, editing and publishing a QUALITY article only to get paid a few dollars…
Demand Media is coming out with its huge IPO next week, and it has attracted a lot of attention in the media due to its accounting practices, among other things. This IPO is said to be the biggest by an internet company since Google's own. Some conspiracy theorists are suggesting on blogs that this move by Google is timed perfectly to damage Demand Media's IPO.
As of today, Google SERPs are littered with websites having no content worth reading at all. One of my sites, with about 30K pages of 7 lines of auto-generated text on each page, is still doing as well as it did in 2004. The so-called content farms like eHow at least have a few lines of text written by a genuine human being on each page. I can't understand how a content farm like eHow can be worse than my auto-generated site.
Exactly, iThink. I already see a lot of auto-generated pages with nicely designed "more"-styled divs moving up ahead of pages with genuine, human-edited content!
I was discussing earlier how certain sites with auto-generated search pages were trying to rank for every keyword combination, and now I see them moving to the top from their earlier 4th, 5th or 6th positions!
|If it's a large team effort, and everyone on the team can write unique quality content, then perhaps Google WILL see it as top quality. |
If I have access to the last 10 years of the Wall Street Journal or Time magazine and can put it all online with their permission, I'd say that's an example of unique quality content. There's content out there, some folks have developed access to it.
And you think Google's going to be able to tell that kind of 'good' content from a content farm's? I doubt it.
There's going to be huge fallout on this - and all sorts of false positives. The thing about content farms is the large volume of content - that's basically the definition. They get one false positive (like, say, my sites) and they've effectively denied users access to that quality information. And there won't be one false positive; there'll be lots.
Worse, there's a very real likelihood that people won't be motivated to put large volumes of quality content online. I just removed 5K pages of this type of content from my main site and it hasn't hurt my rankings. They may actually have improved. And I'll continue my current project. But beyond that, if I'm not rewarded for my efforts with traffic, why would I or others bother putting up large volumes of unique content?
|If I have access to the last 10 years of the Wall Street Journal or Time magazine and can put it all online with their permission, I'd say that's an example of unique quality content |
So here's my followup: Is it still "unique" content if they created it originally AND it may be available to numerous other third parties? If it's only you and them, that's a great position to be in, but if it's you and them and 50 other websites, then is it still "unique"? Plus let's not forget that a bunch of people will be putting it online without their permission, further diluting the "uniqueness" aspect. So the point is, it's going to be difficult for an algo to tell where the originator stops and everyone else starts. You'd be in a better legal position than the scrapers since you do have permission, but to Google, everyone (permission or not) after the WSJ and Time may end up in the same boat. Yes that may be unfair, but don't be surprised if it goes that way.
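The "where does the originator stop and everyone else start" problem is real: near-duplicate detection can flag that two pages carry the same text, but not which copy is licensed. A toy sketch of the general technique (word shingles plus Jaccard similarity; the example texts are invented, and this is an illustration of the idea, not Google's actual method):

```python
# Toy near-duplicate detector using word shingles (w-grams).
# Illustrates the general technique only; not Google's algorithm.

def shingles(text: str, w: int = 4) -> set:
    """Return the set of w-word shingles in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}

def resemblance(a: str, b: str, w: int = 4) -> float:
    """Jaccard similarity of the two texts' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

original = "how to replace a bicycle chain in five easy steps at home"
scraped  = "how to replace a bicycle chain in five easy steps today"
fresh    = "choosing the right cassette ratio for steep climbing routes"

print(resemblance(original, scraped))  # high: mostly the same shingles
print(resemblance(original, fresh))    # 0.0: no shingles in common
```

Note what the score can't tell you: the licensed republisher and the scraper both come out "high" against the WSJ original, which is exactly the boat-lumping problem described above.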
I don't know what kind of searches some people do in order to say that Google is full of spam. I've been using Google Search a lot, from the beginning, but I can't say that I was ever dissatisfied with the results. Of course, there are content farms, but that's how life is. I don't think that users and website owners were really affected by these spammy sites.
Reno, what you've just said is that putting large volumes of content online isn't worthwhile because the scrapers and dirtbags are going to steal it and Google is contributing to the problem. Which, BTW, is exactly what I said.
Saying 'it's not fair' doesn't brush off concerns about the damage Google is likely to inadvertently do with this change.
As for my specific content, I've already disclosed on this forum where it comes from (not Time etc). It is unique, and it is high quality, better than most content on the web. And Google has just put itself in the position that unless I market the heck out of it, there's a good chance they're going to devalue it. So why am I making that content available again? I very well may not, not if I'm getting penalties because I'm perceived as a content farm, which I am not.
Web spam is in the eye of the beholder. I know what my tolerance level is; I'm quite sure it's different from others'.
>Wikipedia is cliquey but it's non profit.
Irrelevant. It is totally about money. The articles make the people they talk about money or cost them money. Just because the playground is neutral, doesn't give them a pass.
There are Wiki editors and authors for hire.
WP is much worse because:
>Wikipedia is cliquey but it's non profit.
They raised almost $10 million last year. I'd love to see Jimmy Wales' 1040 and his corporate expense account from WP.
The problem with WP is it has an air of authority and Google endorses and reinforces that air.
95 links: [en.wikipedia.org...]
How many of those links were bought and paid for? Why should Google endorse that paid linking program when they go after competitors (text ad links) on the web?
That is but 1 page of MILLIONS. You want to talk content farm - at least the large how-to and self-help content producers have editorial control.
wheel.... I'm not brushing it off; in fact, I'm agreeing with you that the algo changes may hurt you, and other legit sites as well. My point is this ~ even though the content you present is valuable, Google may not be able to determine the difference between your position and lots of others. If that happens ~ and it will not surprise me if that's the case ~ then the majority of the content you are getting (by permission) will not help you in the SERPs. What's important, I think, is to not let it hurt you, and to do that it's necessary to hand-build a LOT of pages that totally originate with you. Everything you get from the other sources would simply be supporting your own "unique" pages, but you would understand from the get-go that they would not necessarily have significant ranking advantages. IMO, that's the only way forward given what MC has recently said.
I wonder how this will impact ecommerce sites, which may have some content related to each product; if a "brand" has several products, a lot of the page content could be virtually the same with the exception of maybe color, size, product number, UPC number, etc.
I noticed a big shift in our traffic beginning Tuesday the 18th. Just prior to this significant drop was a HUGE spike in traffic and sales on the 17th, which is always odd for a Monday.
It is the smaller niche sites, which have the interesting content I value, that have been squeezed out of the index by all the big content scrapers with their big money; they are hard to find nowadays.
Wiki, ehow and numerous answer sites, all low end junk written without passion or knowledge.
As usual google is focused in the wrong direction by their desire to overly promote the biggest volume of content.
We're all trying to figure out where Google is going. In my own mind, I'm trying to get it down to basics ~ what DO they want? As a business, they want/need to make a profit ~ the bigger the better. As a search engine, they need to deliver top quality and accurate SERPs. Without good SERPs, the profit will fall, so I'm convinced that is going to remain their top priority, not because it's "the right thing to do", but rather because it's the smart thing to do.
So then we ask ourselves, "how do they determine the best results?" That of course is the $64 million question. We think we understand some of the things that appear to be very positive in their eyes:
1. Original, well-written text content;
2. Depth, which is to say, more of #1 is better than less;
3. Backlinks from other #1, #2 & #3 sites;
4. All the other stuff, such as domain name relevance, fast page download, valid code, navigation hierarchy, etc.
So what don't they want?
1. Duplicate content;
2. MFA sites that mostly just deliver merchant products via database downloads;
3. Useless/irrelevant backlinks;
4. Keyword-stuffed pages, or other cutesy tricks (hidden links, hidden keywords, etc);
5. All the other stuff, such as outdated/broken image links, using hosting services that also handle notorious spam sites, using cheap hosts so the site is offline more than 1-2% of the time, malware/spyware, etc.
I honestly believe that their goal is more of the first list and less of the second, but the devil is in the details, and what drives me nutz is how many websites do well that have hardly ANY of the first list's criteria but DO have much of the second. So while it may very well be their goal to continually improve so that the sites we see above us genuinely deserve to be there, in the meantime many of us are suffering as Google tries to get it right. And thus we find ourselves dropping down (or out) while other less worthy sites take our place. That bugs the h#ll out of me, but there's nothing I can do about it except try my best to focus on the before-mentioned criteria, and HOPE that sooner rather than later, justice & fairness will prevail. But then again, I've always had a naive streak in me, so I'd better have a plan B in place...
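The two checklists above amount to a crude additive score. A toy sketch of that way of thinking, where every signal name and weight is invented for illustration (Google's real ranking uses hundreds of signals and none of these numbers):

```python
# Toy illustration of the "wants" and "don't-wants" lists as a quality score.
# All signal names and weights are made up for this sketch.

POSITIVE = {
    "original_content": 3.0,
    "depth": 2.0,
    "quality_backlinks": 2.5,
    "technical_hygiene": 1.0,   # fast pages, valid code, sane navigation
}

NEGATIVE = {
    "duplicate_content": -3.0,
    "mfa_feed_pages": -2.5,     # merchant feeds republished wholesale
    "junk_backlinks": -1.5,
    "keyword_stuffing": -2.0,
    "poor_hosting": -1.0,       # frequent downtime, bad neighborhoods
}

def score(site_signals: set) -> float:
    """Sum the weights of whichever known signals a site exhibits."""
    weights = {**POSITIVE, **NEGATIVE}
    return sum(weights[s] for s in site_signals if s in weights)

craft_site = {"original_content", "depth", "quality_backlinks"}
farm_site  = {"duplicate_content", "mfa_feed_pages", "keyword_stuffing"}

print(score(craft_site))  # positive total
print(score(farm_site))   # negative total
```

The complaint in the post is precisely that, in practice, sites resembling `farm_site` sometimes still outrank sites resembling `craft_site`; a linear toy like this shows the intent, not the reality.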
I never believed that Google did not take spam seriously; who would think that? Of course they do. What I would like to see is fewer of these domain sites; they are everywhere.
|there's nothing I can do about it except to try my best to focus on the before mentioned criteria, and HOPE that sooner rather than later, justice & fairness will prevail |
Couldn't agree more...
"Content Farms" are just another way to say MFA, whether the site is well known or not, large or small. G should START with the large ones, which probably account for more than 90% out there. People who operate and continue to launch these types of sites should have their AdSense accounts disabled and be removed from G for good. No money from G might start to discourage their very existence and force these "webmasters" to create something useful.
|No money from G might start to discourage their very existence and force these "webmasters" to create something useful. |
What, now you're the internet police?
One person's trash is another person's content. Don't get so high on yourself as to think any one person or entity should be deciding what's trash and what isn't.
If someone's feeding their family off of $10 a day in AdSense, who are we to take the moral high ground on how they make their living?
If you don't like the site, move on.
Everyone seems to be latching on to Matt Cutts' use of "content farm" and then defining it in all sorts of different ways. I think it would be valuable to unpack the phrase, starting with Matt's own, albeit vague, definition:
|As “pure webspam” has decreased over time, attention has shifted instead to “content farms,” which are sites with shallow or low-quality content |
In this case, the use of the word "farm" is intended as a derogatory description of content that is produced at massive scale. The term seems to rely on the notion that the highest quality things are created by craftspeople in small batches, vs. a factory approach of mass production.
Let's look at different types of textual content on the web that are all created at scale, starting with the shallowest and lowest quality and moving up the quality ladder:
1) Scraper sites. These are sites that automatically create pages about a particular topic by aggregating content found around the web.
2) Sites with algorithmically-generated content that does add some value. There are many sites that aggregate info in useful ways.... The about-us type pages that aggregate everything possible about a domain can be marginally useful. A site that aggregates all social-media discussion about a particular topic/brand can be convenient too.
3) Sites with thin human-created content. wiki.answers has many pages with one-word answers. Most content by sites in this category is not checked by an editor. Some pages are valuable, many are worthless. Often, pages with questions but without answers pollute the SERPs (although Google seems to have reduced this problem recently).
4) Sites that rewrite existing web content. There is a wide variety of repackaging - simplifying a complex Wikipedia article into layman's terms could be quite valuable, but just rewriting it with synonyms adds nothing.
5) Rev-share content sites. These are the sites that allow users to contribute and market content and they share in the revenue that the pages create. Many of the links to these pages are given by the content producers themselves because they have something to gain, so the incoming link equity is not necessarily a true "editorial vote" that vouches for the quality of the content.
6) Rev-share content sites _with_ editorial oversight. Some rev-share sites have editorial oversight and reject submissions that do not meet their guidelines. Since there is virtually no cost (other than brand devaluation) for them to publish something of questionable quality, the tendency of these types of sites is to reject only the most egregious submissions.
7) Sites that pay upfront: Much of what Demand Media does goes here. They have editors that review all submissions; they reject poor submissions and fire poor contributors.
8) Sites that pay upfront where writers work with dedicated editors. These sites address the potential problem of quality at scale by breaking everything down into small teams - so you can think of this as a bunch of small craft brewers making as much beer as Budweiser.
9) Wikis. There are good arguments that suggest that wikis are much lower on the totem pole, but since they will never be labeled a "content farm" by google, it doesn't really matter where we put them.
10) Traditional media. NYTimes is a content farm if you use the term "content farm" literally - content produced at scale... Take a look at pictures of their massive newsroom - it looks just like what you would think of as a "content factory" or "content mill."
The key question, then: is everything done at scale necessarily bad? You may disagree with the order of the list above, but there are scores of sites in much more danger than eHow.
One last note: sites have more than one type of content. Some may have many original articles, but then many algorithmically generate pages that also pollute the SERPs. eHow has topic pages that add little or no value - those could be at risk. Many people have accused eHow of mirroring the exact flow of other web content (see #4 above) - those pages are probably at risk as well.
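The ten-rung ladder above is essentially an ordered taxonomy, which can be written down directly. A sketch, with the tier order taken from the post and the identifier names my own shorthand:

```python
# The ten content tiers from the post, ordered shallowest to deepest.
QUALITY_LADDER = [
    "scraper",                  # 1) pure aggregation of others' pages
    "algorithmic_aggregator",   # 2) machine-built but adds some value
    "thin_user_content",        # 3) unedited one-line answers
    "rewriter",                 # 4) existing content, repackaged
    "rev_share",                # 5) users paid a cut, links self-made
    "rev_share_edited",         # 6) rev-share plus editorial review
    "paid_upfront",             # 7) e.g. Demand Media's model
    "paid_with_editors",        # 8) small writer/editor teams
    "wiki",                     # 9) placement debatable, per the post
    "traditional_media",        # 10) NYTimes-scale newsrooms
]

def riskier_than(a: str, b: str) -> bool:
    """True if tier `a` sits lower on the ladder than tier `b`."""
    return QUALITY_LADDER.index(a) < QUALITY_LADDER.index(b)

print(riskier_than("scraper", "paid_upfront"))  # True
```

The ordering also captures the post's prediction: if Google sweeps upward from the bottom of this list, the lowest rungs feel it first and may never reach tier 3.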
I am a little surprised about the eruption that this story has caused. I think it is important to note that the two parties most discussing this issue have their own axes to grind: webmasters and traditional journalists.
Webmasters are upset when other sites rank above theirs. If eHow outranks you in the SERPs, then you are likely to join the army of webmasters that are similarly outranked and hope that eHow somehow gets axed from the SERPs. Traditional journalists also hate eHow, but for a different reason... Demand Media's compensation model undercuts the high salaries that journalists used to make when newspapers were in their heyday.
I am not defending eHow or Demand - I just don't think any of their original articles are at the level of risk that many people have suggested. If a big portion of their traffic goes to topic pages or to content that is rewritten from other web sources, then they should probably be a little worried. There are still sites at the bottom of the quality ladder that appear in the SERPs - Google will likely start there and then move up. I am not even convinced that they will ever get to #3, although the web would probably be better if they did.