| 8:07 pm on Feb 28, 2011 (gmt 0)|
@Reno Completely agree on the "low quality" issue. Google isn't going to put a large number of people on judging quality, despite the fact that they have "quality raters". They need to do it algorithmically. Favoring the originator of content is one step, especially to eliminate the scrapers. I'm not sure there are many signals they can use to judge quality of content outside of user behavior. "Low quality" essentially has to be equated with "content people don't want to consume". Even if that happens there will still be manipulation to encourage users to stay on a page, or click through to another page on the site, that won't speak to the actual value of the content on the page. I think that eventually Google can get to the point where they serve "good" content, but never the "best" content and it's still going to be up to the user to search and find the content that is best for them. I see the BBB as a very rough analogy. They try to provide a list of companies that are "good", but they are never 100% right, nor do they assure that your particular experience with the company will suit your needs. I think that's the best that Google will ever be able to do.
| 8:11 pm on Feb 28, 2011 (gmt 0)|
It does seem like QDF (query deserves freshness) can conflict with original attribution, especially when the content is re-spun just a bit.
| 8:12 pm on Feb 28, 2011 (gmt 0)|
|What happens if the offender site gets spidered before yours? |
Like I said, they're the originator and it would be up to the owner to file a DMCA complaint ... There's really not much else you can do that's better than discovery date that I can think of.
|However, wouldn't factors like authority, link profile, domain age matter? It just seems so contrarian to human behavior that a white hat site would suddenly scrape other people's work? |
You would think, but I think to be totally unbiased as an independent 3rd party just trying to show your visitors the answer(s) they're looking for you sort of have to go with discovery date to be fair about it.
And what if the discovery date of your content on other sites is earlier there than on a number of your pages? Then your site looks like the copier.
It's really a tough question to answer when you get in to all the 'what ifs', but personally, I think I would go with: discovery = origination.
And how do you deal with syndication?
It gets really 'interesting' there imo.
| 8:13 pm on Feb 28, 2011 (gmt 0)|
Hmmmm.. here's a thought. Lots of big blogs are hit. Mine was hit. I noticed something. Some of my commenters do leave spammy names that are keywords. Some unrelated to my niche. That's one more area I can clean out! Maybe the bot is reading this, measuring areas of "spam" and giving you a rate because of that.
Could some improved comment moderation help?
| 8:15 pm on Feb 28, 2011 (gmt 0)|
|It does seem like QDF (query deserves freshness) can conflict with original attribution, especially when the content is re-spun just a bit. |
Yeah tedster, and the syndication question is tough to answer, because if making copies is allowed, then don't you as a search engine want to send the visitor to the 'most popular' or 'freshest' version or try to determine 'the best destination' for the visitor?
How do you know if the copy is:
By the owner on multiple sites.
Except the 3rd, don't you sort of want to send them to the 'best' destination you can, not necessarily the originator's site?
I still think I'd go with origination of discovery to be 'fair and safe' but it makes the question extremely 'interesting' imo when you start really thinking through possibilities.
|Could some improved comment moderation help? |
You might be on to something there, and I doubt seriously if it would hurt.
[edited by: TheMadScientist at 8:20 pm (utc) on Feb 28, 2011]
| 8:20 pm on Feb 28, 2011 (gmt 0)|
Interesting list of those most affected by the update:
- article submission directories
- press release submission sites
- user generated question and answer sites
- how-to sites
- social publishing sites (e.g. Slideshare, Docstoc)
I'm surprised to see Entrepreneur.com in the list, as their content, both on their website and in the print magazine, is always of good quality. AsktheBuilder also has loads of original content that I think is on target in terms of quality.
So looking at Entrepreneur.com, if I write an article on how to start a party planning business, for example, should I now make sure that the piece is 5,000 words long and not just 750 words?
Should every article be a white paper or research paper-type? Because if the article is short, Google may not deem it quality enough? Really, what does quality mean?
| 8:21 pm on Feb 28, 2011 (gmt 0)|
|Could some improved comment moderation help? |
I'd say if you don't have the resources to moderate your comments, turn them off. For most small blogs moderation is pretty easy, not so much for larger ones. Comments are so ripe for spam that I'd almost never turn them on unless you have some level of moderation. A large number of spammy looking outlinks could certainly be having an impact.
| 8:25 pm on Feb 28, 2011 (gmt 0)|
|that's not as easy as it sounds |
Your point is well taken TMS, however I continue to believe that there should be specific solutions to specific challenges. When I throw out that sort of idea it's with the admission that I've never built a search engine, so with that said, here's a follow-up:
* I create new content and want to be spidered first so I go into GWT and copy the text only (no code) into a "storage bin", where I am assigned a special meta tag (an extension of my verification meta tag); I then upload my webpage to my hosting account with that special tag on the page; then return to GWT and request "Spider Page Now".
Is that a lot of extra work for me? Yes, but I can prove that I was first; Is it extra storage for Google? Yes, but surely only a tiny fraction of what they give for free YT videos. Is it extra work for Google? Yes, but it may help solve a problem that we are told is "intense" (ie, "who was first?")
I'm just throwing out ideas here off the top of my head, and no doubt there are much better ways to accomplish the same. The point is, a solution that deals with a specific problem has it all over a shotgun approach that kills a bunch of bystanders.
| 8:29 pm on Feb 28, 2011 (gmt 0)|
How about I cut it down for you so you can start right now, which would make any DMCA complaint you need to file easier.
1.) Create an archives directory on your site and on your computer.
2.) Copy the original to both when you publish it so the file modified time is the publication time. (Then if you make an adjustment to the original in a week you still have proof of the original publication date.)
3.) Upload your page.
4.) Tweet a link to it ... (GBot will be by in seconds.)*
* Usually. And usually in the tests I've done Gbot hits the page before I can refresh my stats.
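For anyone who would rather script steps 1 and 2, here's a minimal sketch in Python. The function name, the timestamped file naming and the `manifest.jsonl` log with a SHA-256 hash are my own assumptions, not part of the steps above, and none of it is legal proof of anything; it just keeps a dated copy around.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def archive_original(page: Path, archive_dir: Path) -> dict:
    """Copy `page` into `archive_dir` and log when that happened (hypothetical helper)."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    published = datetime.now(timezone.utc)
    stamp = published.strftime("%Y%m%dT%H%M%SZ")
    target = archive_dir / f"{stamp}-{page.name}"
    # copyfile() copies contents only, so the copy's modified time is "now",
    # i.e. the publication time rather than the draft's last-edit time.
    shutil.copyfile(page, target)
    record = {
        "source": page.name,
        "archived_as": target.name,
        "published_utc": published.isoformat(),
        "sha256": hashlib.sha256(target.read_bytes()).hexdigest(),
    }
    # Append one JSON line per archived page (assumed log format).
    with (archive_dir / "manifest.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

Run it once against the local archives directory at publication time, then upload the archived copy to the archives directory on the site. If the original gets edited a week later, the archived copy and its log line stay untouched.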
[edited by: TheMadScientist at 8:34 pm (utc) on Feb 28, 2011]
| 8:32 pm on Feb 28, 2011 (gmt 0)|
There are basically 2 types of content farms - those that need to pass editorial review and those where anything goes. Those that allow submission with little or no supervision have more crap. eHow tends to be better because of this - but they still have a lot of content that is useless or tends to be duplicated.
A real-world sample of eHow articles (found under cattle):
'Highland Cow Facts'
'Highland Cattle Facts'
'About Highland Cattle'
'Information on Highland Cattle'
'How to Identify Highland Cattle'
A search for 'Highland Cow Facts' on Google will put two of those articles in the top 5 - there's absolutely no consolidation anywhere on this site.
I think eHow may have gotten a break because of the recent IPO - the backlash from small webmasters is bad enough - how bad would it be if you had 100 thousand shareholders and a ton of Wall Street lawyers thrown into the mix?
| 8:34 pm on Feb 28, 2011 (gmt 0)|
@alika I'll back you up on askthebuilder. I used to work in that industry and I know his content is of pretty decent quality. On Askthebuilder, a large proportion of the links out from the pages are AdSense or unmasked affiliate program links. Possibly another signal Google may have changed the weight of in this update? If so it's another good reason to run your affiliate links through a redirect.
I'm not sure the length of the article matters. One site I used to work for has the majority of its content in the 500-1000 word range and saw no traffic drops in the update. Their content is well written and original in copy if not in actual information. They do, however, have few advertising links on the site.
Does anyone have good examples of sites with little advertising getting hit hard in the update? Could the ratio of advertising-based out links to non-advertising out links be one of the components of the update?
| 8:35 pm on Feb 28, 2011 (gmt 0)|
Was any site that doesn't use adsense hit?
| 8:41 pm on Feb 28, 2011 (gmt 0)|
There are plenty of sites in the 300 list that don't use AdSense at all. Just take a look.
| 8:47 pm on Feb 28, 2011 (gmt 0)|
Thanks TMS ~ as I said, there are no doubt many ways to implement a workable solution to this problem. Google just hasn't found it yet, or having found it, has rejected the more elegant solution in favor of the more complex one (which is typical).
Here's my larger point: If Google can fix the specific issue, then the larger issues may take care of themselves.
It's like this: If my garden is getting hit by beetles, then it's almost certainly better to set a beetle trap than it is to hit it with a firebomb.
To my eyes, the "Content Farm Update" is being dealt with by firebombing, when a sniper might do the job considerably better.
| 8:55 pm on Feb 28, 2011 (gmt 0)|
I think article farms tend to have huge numbers of pages, not low numbers of pages. Low numbers of pages (dozens to hundreds) are normally mom-n-pop sites.
| 8:57 pm on Feb 28, 2011 (gmt 0)|
I see two issues being bundled together in some of our discussions that are actually separate.
The quality of a piece of content, good or bad, is a separate issue from the justice issue that content thieves should not be given higher rankings than the creators for their own content.
Copied / spun content often goes hand in hand with low quality, so there's definitely some overlap between the concerns, but they are not synonymous.
| 8:57 pm on Feb 28, 2011 (gmt 0)|
I'm thinking of more than just AdSense: rather, the number of advertising links as a whole, primarily PPC and affiliate, especially as compared to do-followed, external, non-advertising links. Mahalo doesn't have a huge number of ad links per page, but they have a huge number of internal links, and few or no followed outgoing links. eHow probably has a similar number of ad links but far fewer internal links. Suite 101 has a similar number of internal links per page to eHow, but a larger number of ad links. Could the relative numbers of ad, internal, and non-ad external links be part of what's triggering the effects of the algo changes? I'm not sure I have an ideal site to experiment with right now, but taking a site that dropped in the SERPs and reducing the ad links and internal site links and increasing the authoritative out links might be an interesting experiment.
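If anyone wants to run that comparison on their own pages, here's a rough Python sketch that counts links per page by type. It's only a way to gather numbers for the experiment; nobody outside Google knows whether these ratios are a signal. The `AD_HOSTS` set is a made-up placeholder you'd replace with the ad and affiliate hosts you actually link to, and the four bucket names are my own.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical ad/affiliate hosts, for illustration only.
AD_HOSTS = {"ads.example.net", "affiliate.example.org"}


class LinkCounter(HTMLParser):
    """Tally <a href> links as ad, internal, followed or nofollowed external."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.site_host = urlparse(page_url).netloc.lower()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # Resolve relative links against the page URL before comparing hosts.
        host = urlparse(urljoin(self.page_url, href)).netloc.lower()
        if not host:
            return  # mailto:, javascript: and similar have no host
        rel = (attr.get("rel") or "").lower().split()
        if host in AD_HOSTS:
            self.counts["ad"] += 1
        elif host == self.site_host:
            self.counts["internal"] += 1
        elif "nofollow" in rel:
            self.counts["external_nofollow"] += 1
        else:
            self.counts["external_followed"] += 1


def link_profile(html: str, page_url: str) -> Counter:
    parser = LinkCounter(page_url)
    parser.feed(html)
    return parser.counts
```

Feed it the saved HTML of a page that dropped and one that didn't, and compare the counts side by side before changing anything on the site.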
| 8:59 pm on Feb 28, 2011 (gmt 0)|
|To my eyes, the "Content Farm Update" is being dealt with by firebombing, when a sniper might do the job considerably better. |
Yes, I can't disagree with your view too much for a fix today, but these guys think 5, 10, 15, or more years out and my guess is they think they have a way that will work long-term even if there's an 'adjustment period' to get things right. (Don't get me wrong I'm not trying to say it's perfect or things aren't absolutely s***ty for some people, but the 'big picture' looks a bit different, imo.)
One thing I think we miss only seeing today's picture is all the older threads like this where something was released and didn't quite roll as expected (or maybe as 'refined' as we would like is more to the point), but over time the results started to look better.
I think Caffeine's rollout was the last major thread like this where people thought they took three steps back, and maybe they did, but how does the change to Caffeine look when you think about instant and 5 years from now, from only a visitor and Google perspective?
There were some 'obstacles' to the initial rollout if I remember correctly, but I think moving forward and long-term it was something they had to do.
I didn't read too much of the Mayday thread (meaning not almost every post like I usually do), and from what I read I remember there being quite a few drops, but I don't remember the heated complaints about the overall results that have gone along with some of the past updates...
Anyway, the point I'm making is, yeah it's bad (or 'unrefined') today, but to be successful they really need to look way ahead of today and accepting an 'adjustment period' is something I think they have to do in most cases.
[edited by: TheMadScientist at 9:04 pm (utc) on Feb 28, 2011]
| 8:59 pm on Feb 28, 2011 (gmt 0)|
I don't seem to have any "up" pages, and hadn't read of people who did, but I take your word for it. That said, it doesn't change the penalty scenario, since it's rolled out to everybody. In some cases, where Google is penalizing sites above you more than they are penalizing you, the sort order for a particular phrase would reverse.
I'm not looking at my pages on a phrase by phrase basis, I have no doubt that some phrases would rank better than previously. But the overall traffic to the pages is down around 35% on average, and will get worse when Google rolls this out overseas.
| 9:02 pm on Feb 28, 2011 (gmt 0)|
|Copied / spun content often goes hand in hand with low quality, so there's definitely some overlap between the concerns, but they are not synonymous. |
Definitely, and I think that's a good thing to remember and a great distinction to draw, especially with so much going on at one time.
| 9:03 pm on Feb 28, 2011 (gmt 0)|
[ in the irony of ironies, today Google has been email bombing my client and personal supersecret Google accounts with solicitations to join AdSense. Have gotten eight so far in the past hour. Making up for lost income? ]
| 9:34 pm on Feb 28, 2011 (gmt 0)|
|Does anyone have good examples of sites with little advertising getting hit hard in the update? |
|Was any site that doesn't use adsense hit? |
bahamas.com is on the Sistrix list of 300 major sites that were hit the hardest.
By one measure it dropped 61%; by the other measure it dropped 67%.
It is a perfect counterexample to some of the notions that have been bandied about for the past few days.
a. It isn't a huge site with hundreds of thousands of pages covering a vast array of topics.
b. It doesn't fit most people's notion of a content farm.
c. It doesn't run Adsense.
d. It isn't using UGC, and thus I assume most of its text consists of complete sentences, etc.
What's intriguing about this example is that it is a government-sponsored site which presumably did well in the SERPs because of its quasi-official "authority" status and associated backlink profile.
This example is consistent with the hypothesis that there are multiple factors involved in Google's attempt to measure "low quality."
My guess is that they ran into trouble because of the small amount of text on the average page, and/or the extent to which the same topic is discussed on numerous other sites.
Taking a quick look around, the site looks very nice, but many of the pages have only a small amount of text. Quite likely, there are many pages with similar information on sites like ezinearticles -- not to mention all the scrapers that probably used the site as a source.
Not sure why this particular site made the list -- perhaps the problem was too little unique information spread over too many (15,000+) pages?
| 10:00 pm on Feb 28, 2011 (gmt 0)|
Bahamas.com. Just cut and paste their copy and it's found everywhere. Dupes everywhere. Back to "who originated the content".
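If you want a rough number instead of eyeballing the search results, here's a small Python sketch that compares two blocks of text by overlapping five-word runs (word shingles) and reports a Jaccard score. To be clear, this is a generic near-duplicate check I'm swapping in for illustration, not anything Google has said it uses, and the function names and the shingle size of 5 are my own assumptions.

```python
import re


def shingles(text: str, size: int = 5) -> set:
    """Return the set of overlapping `size`-word runs in `text`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Short text: treat the whole thing as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 5) -> float:
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

A score near 1.0 means a straight copy, lightly re-spun copy lands somewhere in between, and unrelated text scores 0.0. It says nothing about who published first.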
| 10:03 pm on Feb 28, 2011 (gmt 0)|
|Just cut and paste their copy and it's found everywhere. Dupes everywhere. |
So you are saying that there is no value in Wikipedia? They copy/remix data from several sources they are still on the top 1.
| 10:07 pm on Feb 28, 2011 (gmt 0)|
I think the point is that Google still has problems with attributing content to the original author. And it does look like this big update leans too much on the Scraper Update from a few weeks ago.
| 10:10 pm on Feb 28, 2011 (gmt 0)|
|They copy/remix data from several sources they are still on the top 1. |
I think there's a '12 hands touched it since it was copied' threshold or something like that to make it unique and not copied / plagiarized, isn't there? (lol)
| 10:36 pm on Feb 28, 2011 (gmt 0)|
It is so sad that Google is ranking one-liner pages as authority sites on the first page. Try this in Google < sorry, no specific searches >. The first 2 results are from the same website, with no content, just a header. Is this really the quality Google is talking about?
[edited by: tedster at 10:38 pm (utc) on Feb 28, 2011]
| 10:44 pm on Feb 28, 2011 (gmt 0)|
I absolutely agree that Google is having trouble attributing content to the original author. My wording it as a "duplicate content" penalty may not be technically correct - I'm not an SEO guy.
I should point out that I have one site that wasn't affected at all. The structure is identical to my two sites that got creamed, the percentage of pages that draw over 100 visitors a day from search is pretty similar, the only difference I'm aware of is it's been ripped off far less.
Two reasons for that. First, the subject matter would be viewed as less commercial by most people, though I doubt there's any difference in fact. Second, the site is only a few years old, so there aren't any long standing copies.
| 12:24 am on Mar 1, 2011 (gmt 0)|
|I absolutely agree that Google is having trouble attributing content to the original author. My wording it as a "duplicate content" penalty may not be technically correct - I'm not an SEO guy. |
Having scrapers outrank your site for copied content is often a symptom of other issues with your site and not necessarily the reason your site is ranking lower.
| 1:19 am on Mar 1, 2011 (gmt 0)|
Scrapers don't outrank my site for copied content; my site is simply lower overall. The majority of my traffic now is coming from long tail phrases. What has vanished is the traffic for the shorter phrases, which Google used to send because the sites were basically the most trusted in their specific areas.
Nor do I think that scrapers cause duplicate content penalties. It's legitimate websites that copy content wholesale, often users in forums, and I gave up filing DMCA takedowns for these over a year ago. That may have been a mistake.
| 1:37 am on Mar 1, 2011 (gmt 0)|
|My guess is that they ran into trouble because of the small amount of text on the average page, and/or the extent to which the same topic is discussed on numerous other sites. |
I think you're onto something. Bahamas.com has a vendor section which is short on info. The same day this hit I removed my 'tags' section, which had more thin pages than the rest of the site. It was removed through Webmaster Central on 2/25, so I am hoping.