|Analyze Panda Losers That Don't Fit The Mold|
So we've had two iterations of Panda now, and with each iteration has come a published list of the biggest losers. We all know, if we're honest, that a lot of the losers on those lists deserved to lose and lost for obvious reasons.
The point of this thread is to pick out the sites from those lists which DO NOT fit that mold, sites for which it's not obvious why they lost, and to figure out why they were hit.
In doing so, maybe we'll understand why Panda has hit so many here who don't seem to deserve it either. Here's the list of sites to discuss. I suggest we simply go down the list one at a time, each listing reasons we think a given site might have been Pandalized. Once we think we've come up with an explanation for that site, we check it off and move on to the next one:
>>>If you provide an RSS feed the sites republishing it aren't scrapers, they're aggregators.
Please point to even one site on that list that provides a full RSS feed of their content.
Again, stop with these broad, sweeping statements which have no basis in reality.
>>Who's the original author, who syndicates, who scrapes, who cares?
The original author cares. And you should care, because if the original author isn't profiting from his work, he will no longer be able to publish, and there will no longer be any actual content left to scrape.
What a strange statement. Clearly you're a scraper.
>>The only way to have unique content is to defend unique content and not allowing it to be syndicated or scraped for that matter, both pretty easily accomplished for the most part.
Really? How? You act as though sites want their content to be stolen. Please.
Now let's actually stick to the topic.
EVERYONE PLEASE STICK TO THE TOPIC OF ACTUALLY DISCUSSING THESE SITES! Enough with the broad, useless, unsupported statements.
The biggest problem with scrapers is that they not only reproduce content without permission or payment, they reproduce it without attribution either.
It's plagiarism, pure and simple, and they profit from that plagiarism.
Almost every scraped article implicitly deceives readers, because they are led to believe that it is an original work when it is not.
|What a strange statement. Clearly you're a scraper. |
Can you keep to the topic without calling names?
Discuss the topic, not the people discussing the topic, keep it civil please.
I was discussing it from the point of view of an algorithm that is obviously flawed and can't figure out who owns the content. Nor does anyone seem to care, considering Google could easily provide a simple mechanism to identify ownership of content.
I suggested years ago that they use simple sitemap pings to their advantage, the first site to publish would obviously (hopefully) be the first site to identify the page to Google in a sitemap ping.
Oh well, easy idea, ignored, we got Panda instead.
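To make the sitemap-ping idea concrete, here's a minimal sketch. It assumes Google's sitemap ping endpoint (which did exist, though it has since been deprecated); the `build_sitemap_ping_url` helper and the example URLs are hypothetical illustrations of the suggestion above, not anything Google actually implemented for authorship.

```python
from urllib.parse import urlencode

def build_sitemap_ping_url(sitemap_url):
    """Build a Google sitemap ping URL announcing an updated sitemap.

    The idea in the post: if Google logged which site pinged a new
    URL first, that timestamp could serve as evidence of original
    authorship when the same content later shows up elsewhere.
    """
    return "http://www.google.com/ping?" + urlencode({"sitemap": sitemap_url})

# A publisher would fetch this URL (e.g. with urllib.request.urlopen)
# every time the sitemap gains a newly published page:
print(build_sitemap_ping_url("http://example.com/sitemap.xml"))
```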
|EVERYONE PLEASE STICK TO THE TOPIC OF ACTUALLY DISCUSSING THESE SITES! Enough with the broad, useless, unsupported statements. |
Talk about broad useless statements. It's not just the sites, it's the SERPs you should be discussing. I looked at the sites, looked at how they appear in Google, and quickly came to some obvious observations. Obviously those observations don't interest some, but it's the list of sites showing up in the SERPs that tells the story, not just the biggest winner or loser.
|Really? How? You act as though sites want their content to be stolen. Please. |
It's not hard to protect content. I've been teaching it for years, even presented it at PubCon, but I won't repeat the whole thing here. People that do many of the same things I do don't have all the scraped results you see all over the place, and many survived Panda.
Those that don't implement stringent site security get scraped, so they either obviously do want their content stolen (until now) or are just clueless about how to go about stopping it. Some people even used to claim scrapers provided useful links to sites, until now.
My one site that was Pandalized wasn't scraped, I have UGC content and a lot of competitors where the users just keep posting the same UGC, so we all got hit.
I've made many changes, seen some minor fluctuations, but what's the point of sharing exactly what I think the big problem is based on actual research if it's "broad, useless, unsupported statements" which I've been documenting with changes for a couple of months?
And people wonder why the old school SEOs making tons of money don't bother sharing tidbits on forums anymore.
I'll just lurk and watch 'em spin.
Don't blame you a bit, Bill.
(Personally, I'd be looking at the sites that didn't get hit, rather than the ones that did, but maybe that's just me)
It would be a loss to all of us if you really go into lurk mode, Bill.
Don't let one person's gripes shut you up, please.
Bill if you want to discuss those things go for it. In another thread. It's not what this thread is about.
Throwing around broad accusations and claiming special knowledge which you can't or won't back up doesn't help anyone.
|Throwing around broad accusations and claiming special knowledge which you can't or won't back up doesn't help anyone. |
Sorry, did neither.
If you consider RESEARCH, actually approaching a topic scientifically, documenting all observations, experiments, and changes in the test subject (in this case Panda), SPECIAL KNOWLEDGE, then YES! I have a ton of it logged since Panda rolled out.
When I sit and do a couple of hours of research into a topic you post, I'm sure as heck not going to sit here and post every last query just to prove my point. I don't do other people's homework no matter how hard they bully, but I'll share a little.
Here's another quick example to show you what I'm talking about:
Popcrunch is #19 for their own original post.
Ranks about 80 for this one.
Popcrunch content appeared to be served by moreover.com in syndication, I'd post that link but I'd rather make broad accusations.
So on and so forth. Need more proof? Sheesh.
Possibly a simple case of syndicator often being outperformed by those syndicating the content.
However, they rank #2 for this new piece, and a few others, so do most Panda sites, but probably not for long:
What signals make some Pandalized sites' content remain top 10 and some not? That's where I'm looking.
What's going on with PopCrunch could also be somewhat basic. When those articles first show up linked to the front of the site or 2nd level pages they seem to do well leeching from higher ranking pages. However, as the articles slide further down into the site, linked from lower PR to NO PR pages, they simply vanish from the index as those articles have no ranking, they slip too far and go POOF!
The only difference pre-Panda is many of the articles on PopCrunch probably would've held up in the index longer having initially been posted on high ranking high quality pages than in the post Panda environment.
Now ask yourself why doesn't the article itself get ranked and stay ranked for PopCrunch?
I think the original broad accusation, "Analyze Panda Losers That Don't Fit The Mold", is flawed, as they appear to fit the mold all too well, assuming you have a good working definition of the mold.
>>I'm sure as heck not going to sit here and post every last query just to prove my point
Then don't post in this thread, because that's exactly what this thread is for. I mean, no one is making you. Apologies if you misunderstood the purpose.
>>I'd post that link but I'd rather make broad accusations than do your homework.
Again, then don't post in this thread. I'm not sure why you feel people are forcing you to do anything. If you don't want to post proof, then don't make the accusation. Anyone can say anything.
For my part it seems like your premise is that every site on this list is a syndicator and thus deserved to get destroyed by Panda.
If that's incorrect please let me know.
That premise is not supported by this list of sites.
We know, for instance, that daniweb.com does not syndicate its content.
I don't think cinemablend.com or techradar.com or reghardware.com do either.
So that makes your premise flawed.
If your premise is that these sites are being penalized because they are scraped, that's a topic that's already been discussed, which raises the question: why aren't the thousands of other sites which are also being scraped being Pandalized?
If your premise is that these sites are being scraped because they want to be scraped and thus deserve to be Pandalized, that's like saying a girl deserved to be raped because she wore a short skirt. That doesn't really make any sense at all and is a topic for a completely different thread anyway.
I'll shutup now. Tried to keep this thread on track for as long as I could. Someone else can have a go at it now. :) I still think there's a lot to be learned if we can stay away from these broad, unsupported, accusations and generalizations.
>>>I think the original broad accusation is flawed which is "Analyze Panda Losers That Don't Fit The Mold" as they appear to fit the mold all too well assuming you have a good working definition of the mold.
Again, prove it.
The mold, as promoted by the media, is that all the sites being Pandalized are being Pandalized because they have inferior content.
That's not the case with these sites. We've all pretty much agreed on that.
Are you saying you believe that they DO have inferior content, or are you taking issue with what is meant by "fit the mold"? If so, again, that's a topic for a completely different thread.
BILL, you make a good point. Some sites such as digitaltrends got hit and they have quality content, but all/most of their content is syndicated; heck, they don't even rank for their article titles anymore. Also, I am not seeing any of the sites syndicating that content giving credit to digitaltrends.
I fully agree with what you're saying BILL, it's the sites that are not syndicating and getting hit that we should be discussing.
|The mold, as promoted by the media, is that all the sites being Pandalized are being Pandalized because they have inferior content. That's not the case with these sites. We've all pretty much agreed on that. |
Then maybe you should start a new thread because continuing a thread based on a flawed premise is just a big honking waste of time.
Even if "we've pretty much agreed on that", that agreement has absolutely nothing to do with reality and what the algo is doing, and if you can't get past that fact then this dog of a thread simply won't hunt.
Regardless of the media definitions, in Panda it appears duplicate content is inferior content.
Why do visitors want to see the same stuff on 10, 20, 100 different sites?
We mentioned Sears earlier, they're duplicating and syndicating their content all over the place: [google.com...]
When you post the same stuff on sears.com, mysears.com, kenmore.com, k-mart.com, ad nauseam, and export it to all the price comparison and review sites, which site now has the quality content?
Are they all now quality content?
Suddenly it's all run amok syndicated crap.
My point many posts ago was the algo doesn't appear to know quality, it appears to see quantity, and quantity is apparently deemed to be low quality since most people, in theory, would protect their copyright if it were quality content.
It all fits the mold, the mold just isn't that clever to sort good from bad, it appears to make some wide sweeping assumptions and everyone gets hit.
|its the sites that are not syndicating and getting hit that we should be discussing |
From my observations, many of them were unwillingly syndicated via scrapers, aggregators, etc., and we're not talking a handful of copies; it's usually a bunch that seems to make the difference.
I'm not saying there aren't other factors at work, as I believe I'm seeing a few other things going on, but I'm thinking unique content is a major issue.
It can be relabeled thin content, duplicate content, syndicated content; we're then drawing conclusions that the algo doesn't make, as it's probably just comparing what it sees on one site vs. 100 others, and if it's not unique, whammy.
[edited by: incrediBILL at 12:24 am (utc) on Apr 19, 2011]
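To make the "comparing what it sees on one site vs. 100 others" point concrete, here's a minimal sketch of one standard duplicate-detection technique: word shingling plus Jaccard similarity. This illustrates the general approach, not what Panda actually does; the sample texts and threshold are invented for the example.

```python
def shingles(text, n=4):
    """Split text into a set of overlapping n-word shingles."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets: 0 = fully unique, 1 = identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

original = "the quick brown fox jumps over the lazy dog near the river"
scraped = "the quick brown fox jumps over the lazy dog near the bank"

# One changed word still leaves most shingles shared:
print(round(jaccard(shingles(original), shingles(scraped)), 2))  # 0.8 — near-duplicate
```

A real engine would do this at scale with hashing (e.g. MinHash), but the core signal is the same: pages sharing most of their shingles with pages elsewhere look anything but unique.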
This is a serious issue that Google will need to address. Sites like Yahoo News use content from 3rd party sources like us, Reuters, AP and others. If Yahoo is showing up ahead of the sources in Google, then eventually we will stop syndicating our content. I don't think Google or others want that to happen.
I expect Google will fix things in regards to syndication very very soon.
Bill, that post makes a LOT more sense than your previous ones. I think I just haven't been understanding what you're saying at all.
Let me try to sum up, and you confirm whether I have your position right.
Your position is that you believe duplication is a major factor (whether it's willing or unwilling). Correct?
If so then I agree. That is something which is supported by these examples.
How much of a factor remains a big question for me though, because when we've looked at similar unaffected or even winner sites, many of them are duplicated (again willingly or unwillingly) just as much.
Actually, maybe there's some evidence that duplication counting this heavily is a short term flaw in the algorithm, since Digital Trends was able to contact Matt Cutts and explain their syndication to him, and as a result he rolled back the Panda changes for them.
You'd think that would be doubly true for sites which are being scraped against their will.
I have a media site that was hit. It doesn't rely on textual content, but the site is high quality. Many of the competitors have much lower quality and no unique content at all and were not affected.
The site doesn't syndicate content; some sites have leeched its content, but that shouldn't be a factor.
The one thing I did notice was that the user profile pages were very thin and had ads on them, so basically a lot of them were blank pages with just ads. I removed the ads and blocked the users directory in robots.txt, so hopefully that makes a difference.
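The robots.txt change described above can be checked with Python's standard-library parser. The `/users/` directory name here is hypothetical, standing in for wherever the thin profile pages actually live:

```python
from urllib.robotparser import RobotFileParser

# A minimal robots.txt blocking a thin-content profile directory:
rules = """User-agent: *
Disallow: /users/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Profile pages are blocked, regular content is still crawlable:
print(parser.can_fetch("Googlebot", "http://example.com/users/some-profile"))  # False
print(parser.can_fetch("Googlebot", "http://example.com/article/some-post"))   # True
```

Worth noting as a design caveat: robots.txt stops future crawling but doesn't by itself remove pages already in the index, so a `noindex` meta tag on the profile pages is another option people combine with this.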
@Shatner "Since Digital Trends was able to contact Matt Cutts, explained their syndication to him, and as a result he rolled back the Panda changes for them."
What do you mean? DigitalTrend has recovered?
Yahoo news, and 6 other sites, still outranking them for their own content.
@Nano see the post by ianbell330, owner of DigitalTrends on Page 6. They emailed Matt Cutts, told him about their syndication, and completely recovered after Panda 1.0.
They were then hit all over again after Panda 2.0 and haven't recovered from that yet.
@nano, your example shows DigitalTrends as the #1 result for me.
Uncharted 3 wants you to love its multiplayer modes
Digitaltrends.com - 84 related articles
Uncharted 3 wants you to love its multiplayer modes - Yahoo! News
Uncharted 3 wants you to love its multiplayer modes
@incrediBill They are buried on the second page to me.
Maybe they managed to get hold of Matt Cutts and are in the midst of recovering again.
Good for them, but he doesn't seem to be responding to anyone that doesn't own a tech site, which is a shame.
Do you see at the top "News for Uncharted 3 wants you to love it's ..." ?
That's where I see it, #1 spot, some people gloss over that "News" listing.
incrediBILL, wow. Good point. A lot of people gloss over that. I'd rather be at #1 where Yahoo is for that search. So it's clear what Panda is doing: ignoring everything and giving big sites the visitors, all while claiming to be fair and 'supporting the ecosystem' (by screwing the original author). Way to go, Google.
I never see any google news
You are seeing us listed under Google News, not organic results for "Uncharted 3 wants you to love its multiplayer modes" that's why it says 84 related articles under it.
Right. Google news is completely separate and different from Google Search. One has nothing to do with the other, even though their results do display on the same page.
% Of Articles Written At A Basic Reading Level (the lower the number, the higher the reading level their writing is at):
Now let's compare those same sites with other sites in their same field which were either winners or were not affected:
Not sure how much this means, but the results are pretty dramatic when you compare these loser sites with others in their field. Most of them seem to be written at a higher reading level.
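For anyone wanting to reproduce this kind of comparison, here's a rough sketch of one standard readability metric, the Flesch-Kincaid grade level, with a crude vowel-group syllable counter. This is an assumption about methodology; the post doesn't say which reading-level tool was actually used, and real tools count syllables more carefully.

```python
import re

def count_syllables(word):
    """Crude syllable estimate: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level: higher means harder reading."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

simple = "The cat sat on the mat. It was fat."
dense = "Notwithstanding considerable algorithmic sophistication, empirical verification remains indispensable."
print(flesch_kincaid_grade(simple), flesch_kincaid_grade(dense))
```

Running a sample of articles from each site through something like this would give the per-site reading-level numbers being compared above.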
[edited by: Shatner at 8:38 pm (utc) on Apr 19, 2011]
There have been some good Panda posts in WW, but it does seem that there is no general consensus on what is behind it; "no rhyme or reason" seems to be the common verdict.
Possibly that is exactly what Google intended with this update. But why would Google do such a thing, with no observable reason behind it?
I have two theories, if, and that's a huge if, the above is the case. The first is that Bing is doing great things in the US, even if the rest of the world is totally unimpressed by Bing. Maybe Google is inordinately worried about Bing. Maybe Google believes that somehow Bing is generating their SERP results by using Google's techniques without that being provable? What better way of stopping Bing doing that than by confusing the situation, maybe temporarily, with an algorithm that is very confused.
The other possibility is that Google is sick and tired of sites spamming their SERPs. If I were Google, I would be. Their response might be Panda: put out a SERP algorithm which is virtually impossible to understand. The spammers go round and round in ever decreasing circles trying to work out what the rules are, but bad news for them, the rules are not detectable. At the same time, your average non-spammers continue on as normal, ignoring the situation.
Hi nomie, that's not really what this thread is for. This thread is only specifically discussing sites which don't fit the mold, outlined in the first post.
>> My point many posts ago was the algo doesn't appear to know quality, it appears to see quantity, and quantity is apparently deemed to be low quality since most people, in theory, would protect their copyright if it were quality content.
One article we ran recently was scraped >350 times and our site isn't even #1 in the SERPS anymore for that article. The article first appeared on G News. How on earth can a small operation afford to pay lawyers to go after that many scrapers probably not even located in the US? That's just for one article.
@ianbell330: I feel your pain. A ton of our original articles, which were first crawled by the news bot, are being outranked by sleazy-looking scraper/MFA sites.
I want to point out that Gizmodo (a major competitor to Engadget) has been hit very hard by the Google algorithm too.
@ianbell Really? Hadn't heard that; they didn't make any of the lists.
I wonder how many sites that we just don't know about have been completely demolished by Panda.
The one thing really nobody is talking about is how none of the sites on the losers list that "don't fit the mold" are big corporately owned sites. They're all independent small businesses.
That is not a coincidence.
He mentions it in the video here too (along with how traffic dropped after the redesign).
A lot of legit sites were hit and just not coming forward.