
Google SEO News and Discussion Forum

This 41-message thread spans 2 pages (this is page 2).
Content Farm Detection - What Might Google Do?

 5:51 pm on Jan 22, 2011 (gmt 0)

Can we speculate on what attributes Google might look for when seeking out content farms?

1) A large volume of pages, maybe 10K+?
2) Regular addition of pages?
3) A directory-like nav structure? That is, something to distinguish these sites from busy blogs, which I would expect to have a more linear structure.
4) In-content links on most pages.
5) Some measure of inbound links? (This one concerns me personally.) I don't have any idea, though, what attribute of inbound links might point to a content farm.
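Just to make the list concrete, here's a toy checklist that counts how many of attributes 1 to 4 a site matches. This is a sketch under loud assumptions: the `SiteStats` fields and `farm_signal_count` are hypothetical names, the 10K page threshold is the OP's guess, and the other thresholds are invented. Nothing here is known to be something Google actually measures. Attribute 5 is left out because nobody has said what to measure.

```python
from dataclasses import dataclass


@dataclass
class SiteStats:
    """Hypothetical per-site measurements; none of these are known Google signals."""
    page_count: int                          # 1) total pages on the site
    pages_added_per_week: float              # 2) rate of new pages
    is_directory_nav: bool                   # 3) directory-like (vs. linear, blog-style) nav
    share_pages_with_incontent_links: float  # 4) fraction of pages with in-content links (0..1)


def farm_signal_count(s: SiteStats) -> int:
    """Count how many of the speculative attributes 1-4 a site matches.

    Thresholds are arbitrary illustrations: 10K pages is the OP's guess,
    the rest are invented for this sketch.
    """
    signals = [
        s.page_count >= 10_000,
        s.pages_added_per_week >= 50,
        s.is_directory_nav,
        s.share_pages_with_incontent_links >= 0.8,
    ]
    return sum(signals)  # True counts as 1, False as 0


farm = SiteStats(250_000, 2_000, True, 0.95)
blog = SiteStats(1_200, 3, False, 0.9)
print(farm_signal_count(farm))  # 4
print(farm_signal_count(blog))  # 1 (only the in-content links)
```

Even in this toy version, the busy blog still trips the in-content links check, which hints at the false-positive problem raised later in the thread.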



 2:18 pm on Feb 4, 2011 (gmt 0)

Good point. Goodbye Wikipedia!


 2:32 pm on Feb 4, 2011 (gmt 0)

My first thought is that traffic data, user data, should offer some good clues

I've been noticing a few sites in my niche starting to frame their outbound links. Would this count as increasing visitor time on site and pageviews per user? Competitive analysis tools show their pageview numbers have skyrocketed, but how does Google interpret that? This wouldn't be worthwhile for sites that aren't hubs or don't link out much.

I know nothing about framing external content (in terms of SEO). Is it a good idea or not?


 2:38 pm on Feb 4, 2011 (gmt 0)

@Wheel, lol. There couldn't be a better indicator of a content farm, and if Google really wanted to go after such farms, it would have to bid adieu to Wikipedia, or to any site whose pages can't stand on their own without those massive internal links.


 2:44 pm on Feb 4, 2011 (gmt 0)

Those framing external content are worse than copycats. AFAIK, framers got hit a few months ago. Not sure whether everyone was affected.


 2:55 pm on Feb 4, 2011 (gmt 0)

Those framing external content are worse than copycats

ITA, it's shiesty IMO, and I'm starting to notice some websites include "no framing of our content allowed" in their site terms. The problem, though, is that if your competitor is going to trounce you in the search engines because he has 7 pageviews per user vs. your 2 or 3, and 5 minutes on site vs. your 2.5, framing content suddenly becomes an issue. Maybe. I don't know how Google handles it (whether it's kosher with them or not), so that's why I'm asking.
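For anyone unsure how framing could move these numbers, here's a minimal sketch of the two metrics being compared. It assumes a generic analytics log where each visit records a user, pageviews, and seconds on site; `Visit` and `engagement` are hypothetical names, and the sample numbers are invented to mirror the 7 vs. 2 or 3 pageviews and 5 vs. 2.5 minutes above. Whether Google uses either metric for ranking is exactly the open question.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """One hypothetical visit record, as a generic analytics tool might log it."""
    user_id: str
    pageviews: int          # pages counted as "on site", including framed external pages
    seconds_on_site: float


def engagement(visits: list[Visit]) -> tuple[float, float]:
    """Return (pageviews per user, average minutes on site per user)."""
    if not visits:
        return 0.0, 0.0
    users = {v.user_id for v in visits}
    total_views = sum(v.pageviews for v in visits)
    total_minutes = sum(v.seconds_on_site for v in visits) / 60
    return total_views / len(users), total_minutes / len(users)


# Invented data: on the plain site, outbound clicks leave the site.
plain = [Visit("a", 2, 120), Visit("b", 3, 180)]
# Same visitors, but outbound pages open inside a frame and get counted as on-site.
framed = [Visit("a", 7, 300), Visit("b", 7, 300)]

print(engagement(plain))   # (2.5, 2.5)
print(engagement(framed))  # (7.0, 5.0)
```

The visitors do the same reading in both cases; only the counting changes, which is why these numbers alone say little about quality.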


 3:00 pm on Feb 4, 2011 (gmt 0)

What bothers me about some of my web properties is that my event sites have long been scraped by news websites, mostly newspaper and television/radio sites. Then we go through the "so sorry, an intern did it without our knowledge" charade for a week before they take the copies down. While I do pretty well in Google, I don't have the "authority" some of these massive sites do. Even having the content first (by months, in some cases) isn't enough to overcome that. I can't imagine any kind of content farm algo being able to deal with that.


 3:31 pm on Feb 4, 2011 (gmt 0)

I doubt Google's comments about content farms will lead to a significant algo change within the next few months, because it is very hard to come up with a way to combat content farms without too many false positives.

This thread mentions using bounce rate or internal links. Both of these have flaws.

Bounce rate can't be used by itself. I have worked on legitimate sites that have extremely high bounce rates. Think about a page with a free calculator or similar tool: people come to that page, get their answer, and leave. A high bounce rate does not mean bad quality.
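To show why the calculator example breaks a bounce-rate filter, here's a small sketch using the common definition of a bounce as a single-page session. The session format (a list of page paths per visit) and the `/loan-calculator` page are hypothetical.

```python
def bounce_rate(sessions: list[list[str]]) -> float:
    """Fraction of sessions that viewed exactly one page.

    A "bounce" is commonly defined as a single-page session; each session
    here is just the list of page paths the visitor viewed, in order.
    """
    if not sessions:
        return 0.0
    bounces = sum(1 for pages in sessions if len(pages) == 1)
    return bounces / len(sessions)


# Hypothetical calculator page: 9 of 10 visitors get their answer and leave happy.
calculator_sessions = [["/loan-calculator"]] * 9 + [["/loan-calculator", "/about"]]
print(bounce_rate(calculator_sessions))  # 0.9
```

A 90% bounce rate here reflects satisfied users, so the number alone can't separate a good tool page from a thin one.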

Link structure also has problems. Content farms have many links, but so do blogs, e-commerce sites, and many other legitimate sites.
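Here's a quick sketch of the false-positive problem with a naive link-density rule. The link counts, site labels, and the threshold of 80 are all invented for illustration; `flagged_by_link_rule` is a hypothetical helper, not a known Google check.

```python
# Hypothetical average internal links per page; invented numbers, not measurements.
AVG_INTERNAL_LINKS = {
    "content_farm": 120,
    "busy_blog": 95,
    "ecommerce_store": 180,
}
KNOWN_FARMS = {"content_farm"}  # ground truth for this toy example only


def flagged_by_link_rule(avg_links: dict[str, int], threshold: int = 80) -> list[str]:
    """Naive rule: flag every site whose pages average more than `threshold` internal links."""
    return sorted(name for name, links in avg_links.items() if links > threshold)


flagged = flagged_by_link_rule(AVG_INTERNAL_LINKS)
false_positives = [site for site in flagged if site not in KNOWN_FARMS]
print(flagged)          # ['busy_blog', 'content_farm', 'ecommerce_store']
print(false_positives)  # ['busy_blog', 'ecommerce_store']
```

Raising the threshold to 150 doesn't help either: it flags only the store and lets the farm through. Tuning a single link signal just trades one kind of error for the other.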

We may see some tweaks that target simpler sites with minimal content, but sites with larger amounts of content on each page that also incorporate some form of UGC are very hard to target without high collateral damage.

Several people have commented on Wikipedia. Let's step back and just look at the numbers. Wikipedia has over 200 million backlinks, many of them very high-quality links. It has over 100 million pages, most with significant and unique content. Wikipedia also gets high type-in traffic from students looking to copy its articles :). It is hard to algorithmically punish a site with so much content, usage, and link power without impacting other sites.


 3:43 pm on Feb 4, 2011 (gmt 0)

No matter the frustrations I have with their accuracy, I don't see Wikipedia as a content farm - and I doubt Google would either. I also wouldn't see framing or scraping as a sign of a content farm.

My idea of a content farm is original sentences written only to rank - with little to no real information. The model is not to serve the visitor - it's only to exploit them.

Maybe Google would use algorithmic detection to create a list, but that list would have to pass human inspection before action is taken. Content farms are a kind of big business, even if they do use inexpensive writers to generate their content. The business model needs scale.

What I really hope not to see is just an episode of saber rattling - hoping to somehow scare these companies out of doing what they do. Now that they've opened up the topic, being ineffective would be a very bad thing.


 4:32 pm on Feb 4, 2011 (gmt 0)

Content farms are a kind of big business, even if they do use inexpensive writers to generate their content

The real content farms don't just use human writers; they also use programs to generate their content.

I was watching the video Ted referred to in this post, in which Harry Shum suggests that the root of the problem is AdSense and indirectly says that Google should stop that program. But is Microsoft right to offer incentives to various third-party software vendors and others for using their search bar?

He probably wants a standard under which a search company can be an OS vendor or a browser vendor but not an ad company (because "we aren't one"). Funny, isn't it?

He also goes on to say that Google should lay down standards for the search industry. I would prefer that they first lay down standards for their own browsers and toolbars: what these ought to do and ought not to do. They should inform users clearly, and in bold, of everything they do with their click data.

So what is clear from this discussion is that Microsoft uses click data, not just links or sitemaps, to discover pages. They don't care whether a page is garbage or nonsensical; if someone clicks through to it via their browser, they discover it and show it in the SERPs. They probably won't display the confidential pages of your site in their SERPs, but they are almost certain to discover them. All the corporations that religiously stick to Windows and IE in their offices should now be wary and seriously decide whether to keep using IE.

One fallout of this controversy is that the world will now see more and more "search clickers," like the "ad clickers" of 2002-2004.

[edited by: indyank at 5:23 pm (utc) on Feb 4, 2011]


 5:17 pm on Feb 4, 2011 (gmt 0)

The OP asks about "finding" content farms. The larger the farm, the easier it is to find. The content published by the industrial-scale content farmers should be easy to spot, since it sprawls across so many millions of pages, located on a relatively small number of domains that are owned and operated by the same handful of firms.

Now, a much more difficult problem is figuring out which specific documents are of high quality (assuming some are) and which ones aren't.

Ignoring all of these documents (or pushing them down the SERPs) isn't a viable strategy for a search engine, since there are some topics where the alternative documents it would otherwise display are of even lower quality.

The core problem is judging quality (including "correctness," which is one aspect of quality). Perhaps that's why the SEs normally speak about attempting to rank documents based on "relevance." That's a lot easier than figuring out whether the document is a piece of highly relevant junk or a gem of beautifully written prose, providing in-depth, up-to-date information that can be safely relied upon.
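A toy illustration of the relevance vs. quality point: rank two documents by relevance alone, then by relevance multiplied by a quality score. The multiplicative combination, the `Doc` and `rank` names, and every score are assumptions made up for this sketch. The hard part, actually estimating `quality`, is simply assumed to be given here.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    name: str
    relevance: float  # how well the text matches the query (0..1), assumed given
    quality: float    # correctness/trustworthiness (0..1), the hard-to-estimate part


def rank(docs: list[Doc], use_quality: bool) -> list[str]:
    """Order docs by relevance alone, or by relevance multiplied by quality."""
    def score(d: Doc) -> float:
        return d.relevance * d.quality if use_quality else d.relevance
    return [d.name for d in sorted(docs, key=score, reverse=True)]


docs = [
    Doc("relevant_junk", relevance=0.95, quality=0.2),
    Doc("in_depth_guide", relevance=0.85, quality=0.9),
]
print(rank(docs, use_quality=False))  # ['relevant_junk', 'in_depth_guide']
print(rank(docs, use_quality=True))   # ['in_depth_guide', 'relevant_junk']
```

The ranking logic is trivial either way; everything hinges on where a trustworthy `quality` number would come from.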


 5:20 pm on Feb 4, 2011 (gmt 0)

Perhaps that's why the SE's normally speak about attempting to rank documents based on "relevance." That's a lot easier than figuring out whether the document is a piece of highly relevant junk

Exactly - very well said. It's a huge challenge if Google takes it on for real. It changes something essential about the whole search engine paradigm. And that's why it's scary!

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved