I tend to agree that possibly there is some inability for Google to accurately attribute the original source, and thus it mis-applies the Panda penalty.
I have two sites, different content, same topic. I personally wrote all of the content (so it is bound to be similar in style, etc.).
The older site has always ranked well (for about a decade) and, because of its rankings, has been scraped to the extreme.
The younger site has never ranked all that well, and I assume it hasn't been scraped as much.
The older site was hit hard by Panda II, whereas the younger site has not been hurt by Panda (it has even benefited slightly).
Just like everyone else, I'm cleaning up the damaged website with regard to Google's quality guidelines. But really, the site just isn't all that dirty.
I fear that duplicated content is my problem. But how could I possibly afford to repair this? Without some enlightenment from Google, how could I justify the time and energy needed to rewrite a few hundred pages (without knowing that this was really the problem) rather than put the same effort into creating new content for the undamaged site?
I fear that, at least from a cost-effective standpoint, my ten-year-old site is toast.
Yeah, tedster, I'm hearing this so often that I wonder if they 'dropped' their discovery date or something and had to 'go with the first crawled after'... Maybe they had a 'glitch' and the original 'discovery date' of page(s) or site(s) got lost (or overwritten?), and they had to use 'first crawled' to replace the lost information, or 'first crawled since' inadvertently replaced the original information?
IDK, but I keep hearing this so much that I wonder, because it really doesn't sound like something they would do on purpose. I don't see why they would... Think about how many people they would shut up and how much negative sentiment they would stop if this were not an issue. The content in the results doesn't change, so they don't really lose anything there, IMO, and they would put an end to a HUGE amount of negativity, so it would seem prudent and advantageous for them to do it if they could... And maybe they think they are doing it right? If it's a duplicate and you don't know the original source, you don't know when you're getting it wrong...
|I tend to agree that possibly there is some inability for Google to accurately attribute the original source, and thus it mis-applies the Panda penalty. |
This is my opinion too. The pages that have been copied often were badly hit by Panda. One page that had been copied more than 10 times disappeared from the Google SERPs. It used to be on page one and was nowhere to be found after Panda.
Side note: Bing can reliably detect the original source, and Google is no longer able to do this. Which one is the better search engine now?
There's a paradox here. Google's index is significantly deeper than Bing's. For many sites Bing has only 10%-15% of the URLs that Google has. It looks like more documents = more chaos.
tedster, if you have 3 blog links Google should not index your 150 million listing directory. It's that simple.
But most importantly: when in doubt, do not penalize people for having their pages stolen (so they could be monetized with AdSense). That's criminal, if not legally, then at least morally. Add the fact that this is at least a three-month penalty and you have a grave injustice. Does Google care? Maybe, on paper.
My 5-year-old site dropped... and sites that copied my content are on top :( :(
In the UK I got hit badly on 6th May... On 4/5 May traffic was at its max... It feels like the site went offline.
>>>tedster, if you have 3 blog links Google should not index your 150 million listing directory. It's that simple.
That's one confusing thing about Google not being able to tell who the source of something is. I have dozens upon dozens of examples of pages that were heavily linked to by other top-tier sites. You'd think that would be a signal, but somehow Google still ranks the scraper, with his AdSense farm, my stolen content, and no links to his page, above mine. It just doesn't make sense.
My question is: are any sites that have NOT been Pandalized seeing this problem? I tend to think it's just a symptom of having a Panda penalty applied to your site. The Panda penalty (which isn't exactly a penalty but still acts like one) simply says, globally, that this site is worth less than almost anything, and it's so severe that the site is worth even less than the sites scraping it. It's not that Google can't tell who originated the content; in the case of Panda sites, it wants to penalize them so badly that it doesn't care.
Maybe it has something to do with freshness. The original site is always going to be the oldest, so it is at a disadvantage straight away.
I don't believe that Google cares who wrote what first anymore. Why would they? They are not obliged to rank the writer's version first, and it must take a lot of processing to work out which one came first; it's not even an exact science anyway. The user certainly doesn't care whether the writer comes first in the SERPs. Users would much rather have the better site ranking first, regardless of where the info came from. (Although I realise that lots of the scrapers coming out on top are rubbish, which confuses things.)
I was reading another thread on here about DMCAs. It seems that they are starting to brush those under the carpet too. It looks like the whole thing is going in the same direction: towards not worrying about who wrote what first.
|I wonder if they 'dropped' their discovery date or something and had to 'go with the first crawled after' |
I don't believe so. For my widgets I am seeing a lot of forum postings from 4, 5, or 6 years ago ranking #1 with nonsensical answers, and dormant portal sites, not updated in years, with just one image and one keyword, ruling now. Period.
Believe it or not, I have the only regularly updated portal site in my industry, and I have lost 50% of my traffic; the former #2 has completely disappeared, and cr@p is now everywhere. Considering that my specialised construction products data contains extremely important information, woe betide Google if something falls on someone's head owing to false or out-of-date information.
I personally have sites with trademark-plus-keyword domain names that are not ranking in G, yet all the other SEs have them at #1. Well, except Blekko :-)
With Panda 2.1, pages whose Google cache dates are 4/5 May seem to be my worst affected.
I have never used Bing and DDG as much as this last week. G may say their search volume is the same. Why? Because it's very difficult to find anything in some widget sectors, so more pages are being searched; ergo, no change in volume, or maybe even an increase.
I don't play silly games and this is just downright stupid.
|I don't believe that Google cares who wrote what first anymore. Why would they? |
...isn't benefiting financially from this a federal offense, punishable by law?
Folks, we're losing the topic of the thread here - this is not an editorial opinions thread.
Is anyone seeing new changes to the SERPs? Google made 500 algo changes or so in the past year, so something must be going on.
A few things I'm seeing in my areas with 2.1. First, some "drop dead" gorgeous sites, though with basically the same old weak, re-churned web content. It's almost as if Google has a "pretty factor" now.
Secondly, I am seeing a noticeable reduction in AdSense usage on sites past page one. In fact, I wonder how quite a few of them are generating cash flow, unless they're trying to secure their rankings first or the strategy is to rank the site before adding AdSense. I've noticed a lot of that coming out of India in the past three months. Page one sites seem to be going untouched by anything, regardless of looks, quality, or AdSense.
Thirdly, even though many article sites declined in the rankings, the amount of content copied from them seems to have increased. More and more of the sites using this type of content seem to be stripping out the links and obscuring its origins, which seemingly improves its rankings in Google.
I am noticing that some Docstoc/Scribd pages are back up, at least on some searches. I searched for "cityname city" and its blog, but a brochure from one of those sites came first. Probably temporary; everything is in flux.
I'm also seeing more one-page sites than ever ranking well because they link to an already well-ranked mother site.
Overall, once you get past the top two pages, Panda is no real improvement. All it's doing is serving up two pages of a "steered" search from sites Google has deemed worthy, with the suggestion that this is what you were really looking for to begin with.
|It looks like the whole thing is going in the same direction: towards not worrying about who wrote what first |
It is so true and so discouraging. It looks like they don't even know where the original content is.
One of my sites was Pandalized and disappeared from the first pages of Google search results.
Now, when I google some parts of my content, the scraper site appears on the first page!
So f***ing discouraging!
I hear you, but only a small sliver of traffic comes from page three onward, so having higher quality on the first two pages of the SERPs would have to be classed as a win, wouldn't it?
I've got to say, I'm finding fewer frustrating results this week, but I haven't been bothering to check for scrapers in every case. I do know that, as a search user, I'm finding the information I'm looking for more often. I'm particularly happy not to find supposedly technical forums that have all my query terms but no real information. Those used to be a plague (lots of scraping and mash-ups just to get a text match for the query terms).
|Those used to be a plague |
Those were probably heavily blocked by computer-literate Chrome users, and Google used those blocks for its naughty list.
It really comes down to what you think the role of a search engine should be compared to their goals, especially when any engine has gained such dominance. Plus, we're training people to be lazier. In the past year, more and more Google employees seem to be espousing that old Technocrat babble. I'm still reminded of the 90-year-old woman who told a crowd she certainly was familiar with Google. She said, "They're the people who want to rule the world." You'll find their concepts are similar to those of Alexander the Great.
I'm seeing more and more irrelevant pages from large corporate networks at the top of searches. They aren't spam sites, but they're sites owned by big brands, and even though they often have nothing to do with the keyword phrase, they take up the entire front page, and I have to go to the second page to find actual, relevant content from non-brand sites.
Have you guys checked the last cached version of your homepage today? I'm seeing a lot of blogs, Pandalized and not Pandalized, whose last cached homepage is from May 9 / May 10. In fact, every single blog I've checked so far today is in this situation. Some of them are huge (owned by AOL, like Engadget).
Yesterday, every one of them had its homepage cached on May 12.
IMO there's going to be another update this weekend, and very possibly a huge one.
I'm seeing something else: for the last 4 or 5 days, the crawl rate on my website (in France) has been steadily decreasing. Yesterday, Googlebot crawled nearly a third of what it usually crawls.
Is anyone else seeing this?
zerillos, my Pandalized site, which is cached daily, has a May 9 cache; Drudge has May 10, HuffPost May 9, TechCrunch May 10. Probably Google moving data around, there's no way Google pandalizes TechCrunch or Drudge.
As for an update, the SERPs aren't 'normal' as far as I can tell, so maybe.
On edit: [url]https://webcache.googleusercontent.com/search?sourceid=navclient-ff&ie=UTF-8&q=cache%3Ahttp%3A%2F%2Fwww.google.com%2F[/url] shows May 10th, and so does MSNBC.
I'm going to speculate that all websites in Google's index are like this, since I can't find one that isn't.
|Probably Google moving data around, there's no way Google pandalizes TechCrunch or Drudge. |
I'm not saying they're going to Pandalize them. I'm just saying they could be preparing for a new update.
And something else strange: since yesterday, the "links to your site" section in Google Webmaster Tools is empty. The day before, it was still populated with links...
Is anyone else seeing this? Could all these changes signal a coming update?
I agree. Big changes on the site: operator today versus yesterday. Mine dropped by over 10,000 (low-rent) pages that I have opted to remove from the index.
I've noticed the exact same changes on the site: operator since yesterday, but they showed up sporadically in searches (probably due to different datacenters). Today every "site:" search shows the same thing.
Yes, site: shows the same results on all datacenters for me too. Previously, my site's index page was missing from first place in site: results only when I was logged into my Google account; now it's missing whether I search logged in or out.
OK, so there's been a big dump out of the index? That should shake things up a bit!
It struck me the other day that with pro webmasters dumping so many pages (I dropped 20,000, and I'm an insignificant dot on the search landscape), the index must be shrinking and the link graph must be a mess!
Has anyone else had problems with HTTPS versions of pages suddenly showing up, along with pages from banned folders (they always have been banned), like shopping cart pages?
I wonder if Google is clearing out some of the index; cache dates have changed for a lot of my websites and for some authority sites I track.
Worst part: none of our properties have recovered any rankings.
On the cache issue: I checked my cache on the 7th and it was dated the 6th; I checked it again on the 11th and the cache date was a day earlier, the 5th. I checked yesterday, the 11th, and the cache was the 11th.
Just checked again: cache date the 9th...
[edited by: Yippee2 at 4:20 pm (utc) on May 13, 2011]