|New Since Penguin: Are Brand Cheaters/Copiers Actually Helping Me?|
Basic info about my site: informational, started in 2006, about 1,000 pages. Hit by Panda 2.0 (April 2011), losing approximately 1/3 of its Google traffic. Not fully recovered, not noticeably affected by later Panda updates, not damaged by Penguin. Completely hands-off for inbound links.
Since Penguin: My site has seen a gradual increase in Google traffic, with incremental improvements over time. For example, comparing year over year (2012 vs 2011) my improvements are as follows:
July so far: 120%
Why I think I'm not fully recovered from Panda: For a few reasons, but basically I have many pages that aren't ranking at all, or aren't ranking as well as they used to (pre-Panda). I have many pages that rock in the SERPs though, and for some pages on topics with lots of competition I'm enjoying traffic counts that I never realized were possible for that particular topic.
How I think Penguin affected me: Didn't ding my site but didn't juice it up either. The increase in traffic is because I've bumped up into spots that competitors above me were knocked out of. The site didn't generally leapfrog over anyone; it isn't ranking higher on its own merits, IMHO.
Brand Cheater Definition: Copying a site's overall concept and article ideas.
A general overview of my site: First of its kind in its niche. Within a year it had a very aggressive "brand cheater"; now there are a handful, plus dozens more that target just one or two categories. My site is a heavy, mainly unlinked source for some content farms but also for some fairly mainstream and popular sites. You're probably a fan of at least one of them and would probably be surprised at what they're up to. Most don't link to my site as the source. Many have teams of writers that are just re-writing and re-wrapping what I, and others, have produced. Of the blogs operated by sole individuals, most are heavily involved in social media such as Twitter, Facebook, Pinterest and StumbleUpon and achieve great reach by re-wrapping my content and others'. There are underground social networks and personal connections involved here too, but that's another story and unrelated to this thread.
What I'm seeing since Penguin:
I happened to notice one day a spike in Google traffic to an article that had been rewritten and published on a brand cheater site a couple of days previously. Cool beans. But then it happened again to another article that was rewritten on another brand cheater site. And then again. And then again. None of them linked to my site as the source. I finally took the time to go through a few different brand cheater sites, picked out 18 different articles that they had rewritten between them and published since May, and reviewed my stats. Each and every one saw a significant jump in traffic from Google within 2 or 3 days of being published on the brand cheater site. The initial spikes never stuck, but many did settle at higher traffic levels than before.
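For anyone who wants to run the same check on their own stats, here is a rough Python sketch of the comparison described above: average daily Google visits before the rewrite appeared, the peak within the following 3 days, and the level traffic settles at later. The function name, the window lengths (14-day baseline, settling measured from two weeks out) and the input format are my own assumptions for illustration, not anything taken from an analytics package.

```python
from datetime import date, timedelta
from statistics import mean


def rewrite_effect(daily_visits, rewrite_date, baseline_days=14,
                   spike_days=3, settle_after=14, settle_days=14):
    """Compare Google visits to one article before and after a rewrite appeared.

    daily_visits: dict mapping datetime.date -> visit count (hypothetical input,
    e.g. exported by hand from your stats package).
    rewrite_date: the day the rewritten version was published elsewhere.
    All window lengths are illustrative defaults, not established values.
    """
    def window(start, length):
        days = (start + timedelta(days=i) for i in range(length))
        return [daily_visits[d] for d in days if d in daily_visits]

    # Days immediately before the rewrite was published.
    baseline = window(rewrite_date - timedelta(days=baseline_days), baseline_days)
    # Publish day plus the following spike_days days.
    spike = window(rewrite_date, spike_days + 1)
    # A later stretch, to see where traffic settles once the spike fades.
    settled = window(rewrite_date + timedelta(days=settle_after), settle_days)

    if not baseline or not spike:
        return None  # not enough data to say anything

    base = mean(baseline)
    peak = max(spike)
    return {
        "baseline": base,
        "spike_peak": peak,
        "spike_ratio": peak / base if base else None,
        "settled": mean(settled) if settled else None,
    }
```

A spike ratio well above 1 with a settled level above the baseline would match the pattern I'm describing. It says nothing about cause, of course; a page cleanup or a seasonal bump in the same window would look identical.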
What seems to play a part: There's a sliding scale of juice. My assumptions and observations:
--How close in theme a brand cheater is to my site. Those that are very close, the traffic bump is higher. Those that are just focused on one particular category, still an increase but not as high.
--How authoritative or popular the brand cheater is. The stronger the brand cheater is, the more juice my article gets.
--How popular the brand cheater's article was. If it made the rounds on Twitter, Pinterest, Facebook, yadayada, the boost is higher.
It's like the content is being validated in some way by others re-writing it. Validated in the sense that yes, this topic belongs on this site. And yes, the information is worthwhile. And if it's information that's well received, it's validated again.
The traffic bump I see is either an increase in ranking for the key term or expanded long tail or both.
Another thing that seems to play a part, say I wrote an article on how to paint a rock, a nonsense topic for the sake of discussion here. The gist of my article states that the best way to paint a rock is to use white latex paint. If a brand cheater writes:
--Paint the rock with melted black crayons, my article gets a boost of x factor.
If the brand cheater writes:
--Paint the rock with white latex paint (what I wrote), my article gets a boost factor of 2x.
To be clear: the re-written articles would all be considered "unique" or "original". They aren't copy and paste jobs.
Since it's all sh & giggles rewriting content for some of these bloggers, one will re-write then a buddy will re-write it again on their blog, linking to the first buddy as the source. I STILL SEE THE TRAFFIC BUMP.
What else I've noticed:
Not all spikes stick...the article on my site has to be in good shape. For my articles that Google hasn't found all that tasty up till now (traffic is very low), the spike hits and returns to low levels, maybe a little higher but still low. However, I have tested cleaning up the page, adding more information or whatever I think might help, and then it does jump up in traffic within a short period, sometimes a week, sometimes a few weeks. Is it from the brand cheater boost? Or the cleanup I did? Or a combo of both? For the articles in good shape, the initial spike doesn't stick, but traffic grows steadier and stronger over time.
I did some research on my biggest competitors who are individual bloggers. Most don't seem to have been touched by Panda but they all appear to be hit by Penguin. I can't be sure about that, but the minimal competitive analysis I do for about a dozen keyword terms each show that they have been hit. Their backlink analysis doesn't look fishy but again, I can't be sure since what I can see is limited. What they all have in common: Maybe less than 20% of their entire site content could be considered truly fresh or truly unique. It's mainly a regurgitation of what's already on the web.
Could this be something new with Penguin? A new classification of duplicate content maybe? Or a new way of counting authority without links? Or maybe this isn't Penguin and something new?
Or has this always been but my site just now is allowed to benefit from it?
Where things fall apart: Google can't seem to determine very well that a scraper site is full of duplicate content. How could it possibly grasp that the gist of an article is "Paint the rock white" and then grasp that my site was first to write about it? This couldn't possibly apply to shopping/retail for obvious reasons.
Where it makes sense: Scrapers never, never, never have a single byte of original thought, concept or idea. It's simply a reorganizing of information that's already published. Could this be some brilliant way that Google has figured out to parse out regurgitation?
To put things in perspective, one article I wrote about 5 years ago had not a single article published on a website about it. It may have been briefly mentioned or touched on in forums or chat groups, but not an article. Today there are nearly 1,000,000 results for that topic. Another one where I was first has nearly 75,000 results. And yet another now has nearly 3.5 million results. With the barrier to entry so low now to publish on the web, and with growth on any given topic moving so fast, and so many sites bringing nothing new to the table, how on earth is Google going to pick out a top 10 for any given term?
What could be confirmation of all this: Anyone else notice the increased presence in the top 10 that eHow and About.com seem to have been enjoying for the past couple of months? Say what you will about them, and I've said a lot, they do have a significant number of topics that are being rewritten by bloggers.
Caveat: I'm not a professional SEO. I don't have charts and graphs and teams parsing code. This is just something exciting that I'm witnessing. My interpretation of what it all is could be completely off and coincidental. Each week my site is surging forward more and more (overall). It could be that I'm coming out of Panda. Or that Google's picking off my competition with one of their various updates. Or the surge is an indication of something truly ugly brewing beneath the surface (I saw some surge before I was hit with Panda 2.0). But that doesn't explain the near immediate spikes I'm seeing for individual articles that are being re-written.
Anyone else seeing this? Keep in mind this won't be obvious if your article topic wasn't fresh at the time of publishing, it would only be seen on truly fresh content topics.
It's good to see that you're ranking above most of the pages that are just regurgitations of your original articles. Unfortunately, Google doesn't always get it right and reward the original article. One obvious example is Wikipedia, which nearly always ranks above the original article that provided the information for the Wikipedia page.
Thanks for describing your situation so completely. It really helps us define what you are seeing - and it could be the sign of a rather subtle point.
|To be clear: the re-written articles would all be considered "unique" or "original". They aren't copy and paste jobs. |
This puts me in mind of a couple of things. First, as far back as 3-4 years ago, there have been comments from various Google people that an exact match is not necessary for content to be considered "duplicate", at least to the degree that there could be a rankings impact. Google has invested a lot of resources in developing sophisticated semantics processing - to the degree that even a few years back, content that was only 80% "duplicate" could still be tagged as such. Why they still get fooled by outright scraping is a mystery, but in each case it does seem to be relatively short term.
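To make "not an exact match but still duplicate" concrete, here is a minimal sketch of one textbook approach to near-duplicate scoring: break each text into overlapping word triples ("shingles") and compare the two sets with Jaccard similarity. This is only an illustration under my own assumptions. Nobody outside Google knows what they actually run, and the function names and the shingle size of 3 are hypothetical choices.

```python
import re


def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in text (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Text shorter than one shingle: treat the whole thing as a single unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(text_a, text_b, k=3):
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    a, b = shingles(text_a, k), shingles(text_b, k)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


original = "Paint the rock with white latex paint"
rewrite = "Paint the rock with melted black crayons"
score = jaccard_similarity(original, rewrite)  # 2 shared shingles out of 8
```

A score near 1.0 means a near copy, and a cutoff like the 80% figure mentioned above would simply be a threshold on a number of this kind. Note that word overlap like this catches lightly edited copies only. A true rewrite that keeps the idea but changes every phrase would score near zero, which is why the kind of concept-level matching Tallon describes would need something more sophisticated.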
Second, in a recent interview with Eric Enge, Matt opened with a discussion of content that was relatively similar or derivative:
|While they're not duplicates they bring nothing new to the table. It's not that there's anything wrong with what these people have done, but they should not expect this type of content to rank. |
Google would seek to detect that there is no real differentiation between these results and show only one of them so we could offer users different types of sites in the other search results...
...if Jane is just churning out 500 words about a topic where she doesn't have any background, experience or expertise, a searcher might not be as interested in her opinion.
Matt seems to be pointing to an algorithm component that we don't have a name for - something they are doing that might well explain the results Tallon is describing above.
|Unfortunately, Google doesn't always get it right and reward the original article |
This has been going on for years, not just post-Penguin.
Google will rank the article higher depending on the PageRank.
- You write an article on your site, wait for Google to index it, then you publish it on another site. If the destination site has higher PageRank, it will outrank you regardless of whether you are the author.
- Someone steals your content and publishes it, and they show up above you because Google sees it as "fresh".
These scenarios don't always happen but I've seen it often enough.
The Knowledge Graph shows us that Google is collecting semantic data. We should not be surprised if they're trying to figure out who is authoritative based on the concepts presented on a site, rather than mere text. Google might be figuring out that your site is presenting the ideas first, and the others are echoing them with different words. Or maybe your site is authoritative due to the large number of concepts related to its field, while the copiers are picking away at smaller subsets.
Your traffic increases could be due to Google using the other site's activity as an indicator of interest in your topic, so the ranking of your article might be pulled upward along with the rewriter's material.
This is really fascinating, and would be great news for all of us who have had content rewritten this way.
This may offer some confirmation of your theory: it's the first explanation I've read that explains to my satisfaction why one of my sites got hit by Penguin. I have some unique content (and it's been rewritten by others, just like yours), but I also write about some very common topics in my niche. Even though I always try to add *something* unique when writing about a common topic, I can totally see (in hindsight) an algo thinking my site overall wasn't unique enough.
Thanks for posting this. Interesting read but I do have a few questions.
1) Is your blog focused on one particular niche, say, for example, painting?
2) Are you the only author of the blog?
3) Borrowing "writing about a common topic" from diberry, don't you ever write about such common topics in your blog?
4) Are all articles on your site completely unique, as in your example? I.e., is each article on your blog the first to say something unique on the topic, like "the best way to paint a rock is to use white latex paint"?
5) Do you provide any added value on your blog, other than the articles, which such brand cheaters don't have or can't reproduce?