This 238 message thread spans 8 pages.
Why Haven't Sites Come Back from Panda? Matt Cutts Tries to Explain
This is a rush(?) transcript from Danny Sullivan's blog, so probably not everything is 100% correct. The italics and bolding are mine.
DS: Talking about Panda, says that he's getting a ton of emails from people who say that scraper sites are now outranking them after Panda.
MC: A guy on my team is working on that issue. A change has been approved that should help with that issue. We're continuing to iterate on Panda. The algorithm change originated in search quality, not the web spam team.
DS: Has it changed enough that some people have recovered? Or is it too soon?
MC: The general rule is to push stuff out and then find additional signals to help differentiate on the spectrum. We haven't done any pushes that would directly pull things back. We have recomputed data that might have impacted some sites. There's one change that might affect sites and pull things back.
DS: You guys made this post with 22 questions, but it sounds like you're saying even if you've done that, it wouldn't have helped yet?
MC: It could help as we recompute data. Matt goes on to say that Panda 2.2 has been approved but hasn't rolled out yet.
DS: Reads an audience question – is site usability being considered as more of a factor?
MC: Panda isn't directly targeted at usability, but it's a key part of making a site that people like. Pay attention to it because it's a good practice, not because Google says so.
Matt mentions 'pull back', but that's nonsense and very disingenuous of him. 'Pull back' to me means letting previously labeled bad content rank again. We're talking about improved sites and content; there's no need to pull back, just reanalyze it.
So it's clear to me that this is a penalty. Maybe if you got links from every newspaper in the Northern Hemisphere you might escape, but for the rest it looks like it depends on Google engineers. It took them 3+ months to admit it.
danny: if it's always worked in the past then it's tempting to leave it the way it is. but maybe that is not a very good way of looking at it. all sites can be improved.
there are some things that you could easily add to "pretty" it up a bit. at the very least, why not add images of the book covers? that would not be a sop to panda -- because that would actually be useful to the readers.
you can pull the images straight from amazon using their developer tools. that way you wouldn't even have to host the images.
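For what it's worth, at the time of this thread people commonly hotlinked Amazon cover images by constructing the URL from the book's ASIN/ISBN, rather than downloading and hosting anything. A minimal sketch of that approach — the URL pattern is an assumption from that era and may no longer be supported; the Product Advertising API is the sanctioned route:

```python
def amazon_cover_url(asin: str, size: str = "L") -> str:
    """Build the classic hotlinkable Amazon cover-image URL from an ASIN.
    This pattern was widely used circa 2011; it is an assumption here and
    Amazon may have changed or retired it since."""
    size_code = {"S": "THUMBZZZ", "M": "MZZZZZZZ", "L": "LZZZZZZZ"}[size]
    return f"http://images.amazon.com/images/P/{asin}.01.{size_code}.jpg"

# ASIN below is just an example ISBN-10, not tied to any site in the thread.
print(amazon_cover_url("0596007124"))
# http://images.amazon.com/images/P/0596007124.01.LZZZZZZZ.jpg
```

The upside the poster mentions holds either way: the image bytes never touch your server.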
Danny, I think you might have been affected by the text that you quote from the books. Though you do add blockquotes around them, Panda has this problem of classifying documents as low quality or of low value when it finds even a few lines of text quoted from somewhere else. That has been my experience so far and it does look silly.
At the same time, I have also come across sites that continue to do well even if all they do is aggregation of content posted somewhere else. This might be because the algos are applied differently for different sites based on how the google engine understands and classifies them.
This algo is fine with certain things on certain sites but not on others.
|It does seem highly unlikely to me that Google's algorithm cares much about style |
I truly think it does now. I have had a fascinating view, through my niche, of how Panda works. My niche is an interesting one in regards to Panda because of the mix of sites present. You had everything from scrapers to content farms to true enthusiasts to .edu and .gov sites.
For a long time before Panda, the space was pretty much dominated by content farms and a few lucky (read: fumbled into SEO) enthusiasts. The content farms were not so bad, as they really did fill a very needed space in my niche, until there got to be SO MANY regurgitating content farms. Right before Panda, the top 10 listings would consist of:
- the top 3 would be from the same site, rewriting their own info for 3 different keywords
- 4- 5 others would be other content farms rewriting the info from the top 3 for their own sites
- 1-3 would be lucky enthusiasts
- 1-2 (maybe) (normally low ranking) complex and high reading level documents from .edu's and .gov's – which are difficult for the average consumer to understand
After Panda, the content farms were pretty much gone – but so were the lucky enthusiasts. The lucky enthusiasts really did have great info, but they also used Comic Sans font and animated or low-resolution GIFs. Ugly, ugly sites but really good info. The enthusiasts truly did it because they loved it. The info was solid and easy to read and understand, but because they were focused on the content rather than learning HTML, Photoshop and PHP, their sites were very 1995. Still, Google cleaned them out. One site like this that I know of saw a 50% drop in traffic. All original content and not scraped (because let's face it, scraping is harder on a site that is built by hand, page by page, rather than generated from a database).
I have no doubt because of this that looks do play a role in Panda.
|hannamyluv wrote: |
(because let's face it, scraping is harder on a site that is built by hand manually page by page rather than by a database)
Let's consider that. What is it about manually coding each page on a site that would make it more difficult to scrape? I obviously don't know the specific sites you're referring to, but I'd say it could likely be a lack of consistent structure across the site.
That may be one way Google's determining overall "quality" of a site. That's something that would be independent of the actual content, though, so... *shrug*
Unstructured content can also be a sign of scraped content... which is often pieced together from various sources around the web with no set structure, particularly when they do a bad job of scraping and end up taking some of the formatting with it.
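If "consistent structure across the site" really is a proxy being measured, one crude way to approximate it is to compare each page's HTML tag sequence against the others. This is pure speculation made concrete — the helper names and sample pages below are invented, and nothing here is a known Google signal:

```python
from difflib import SequenceMatcher

def tag_sequence(html: str) -> list[str]:
    """Extract tag names in document order with a crude scan (no parser)."""
    tags, i = [], 0
    while True:
        i = html.find("<", i)
        if i == -1:
            break
        j = html.find(">", i)
        if j == -1:
            break
        seg = html[i + 1 : j].split()
        if seg:
            name = seg[0].strip("/")
            if name and not name.startswith("!"):
                tags.append(name.lower())
        i = j + 1
    return tags

def template_consistency(pages: list[str]) -> float:
    """Mean pairwise similarity of tag sequences: near 1.0 for a templated
    (database-driven) site, lower for hand-built pages or scraped mashups."""
    seqs = [tag_sequence(p) for p in pages]
    scores = []
    for a in range(len(seqs)):
        for b in range(a + 1, len(seqs)):
            scores.append(SequenceMatcher(None, seqs[a], seqs[b]).ratio())
    return sum(scores) / len(scores) if scores else 1.0

templated = ["<html><body><div><h1>A</h1><p>x</p></div></body></html>",
             "<html><body><div><h1>B</h1><p>y</p></div></body></html>"]
handmade = ["<html><body><center><font>A</font></center></body></html>",
            "<html><table><tr><td>B</td></tr></table></html>"]
print(template_consistency(templated))  # 1.0
print(template_consistency(handmade))   # noticeably lower
```

Note the score is content-independent, which matches the point above: it would separate templated sites from hand-coded ones without ever judging the writing.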
|I have no doubt because of this that looks do play a role in Panda. |
It isn't the looks in the direct sense. But the thing about this algo is it doesn't judge the quality of the content based on how good or bad it is, but based on other signals.
There are some superficial things that they look at. For example, if the images are really good, the content around it might be considered to be good. Add to this the social signals, mention of a few things that sends signals of value being added like user reviews, "pros and cons" and so on.
|I thought I was immune to Google problems - 1200+ book reviews, been around forever (either 11 years or 17 depending how one counts it), simple clean design with Amazon links but not much else, unrequested backlinks from all over, nothing even slightly dodgy - but I've been hit now. Half my reviews have been pushed out of the index by random duplicates with no standing, the other half rank nowhere, and Google traffic is down by maybe 70% or more. |
Heads up: you may not get much sympathy from the many who assume that your site is 'bad' simply because Google's algo said so.
You have a choice to make: add a new template after all these years and wait maybe 6-12 months to find out, or do nothing and hope that Google backtracks. Matt Cutts has said that websites should be like Apple products, so you may want to look at an iPhone or iPad before changing the template. I would change the design, since a 70% traffic drop is devastating.
Deleting some of the index pages was a smart move IMO, but who knows if that will be enough. Not even Google engineers have a clue in the Google support forums; they might just give the general guidelines already available on Google.com.
For what it's worth, many have made changes and seen no real and sustainable increase in traffic.
Edit: Just did a [google.com...] search and some of your reviews have hundreds and hundreds of citations from scholar.google.com. Damn!
[edited by: walkman at 5:51 pm (utc) on Jun 13, 2011]
|Not even Google engineers have a clue in Google support forums |
I think that is a very important observation, one which we should keep in mind as we spend endless hours trying to fix a problem which we cannot even define.
The current version of Google is apparently the most complex ever, with many hundreds of variables that react depending on how the variables around them react. The possibilities could be close to infinite. Perhaps it does not equal the nuclear launch codes, but it may be getting close. With such complexity comes uncertainty, and therefore:
Even the chief algo engineers may in fact not know for sure how Panda will unfold; if you suffered a severe fall, they may not know specifically why that happened, or what you should do to restore your previous rankings; and they may not know when or even if a site can come back.
IF that's the case, then I wish they'd just come out and say "we don't know". But of course, because they thoroughly embrace FUD, they will not say anything so honest & clearcut. And so, as walkman said, "many have made changes and seen no real and sustainable increase in traffic." With such extraordinary complexity in the ranking calculations, IMO that will almost certainly be the status quo from now on.
Well, at the risk of a public stoning (and jinxing myself), I believe I have "fixed the problem". I believe I know precisely why Google support cannot be specific about what's wrong, which is also precisely why Matt Cutts' core advice couldn't be more apt.
My main search terms are on the rise... up page 1 of the SERPS, having been banished at the start of April (I'm in UK and seem to remember it was April 11th I lost 70% of my 200k visitors/ month).
Improvement started in mid May and has been consistent (not flipping backward and forward, like some report). Getting back in with a shot has entailed some major decisions (decimations!) and a lot of hard work; getting back close to the money spot, even more.
From my experiences (across multiple sites, Pandalized and not) I can only conclude this is not a penalty and nobody is being held down (except possibly by their own denial). What it is, of course, is a new string to the algo that's of the simplest and also cleverest origin. Quite cunning really and the guys at the Googleplex must be wetting themselves watching webmasters discuss every possible variable, without seeing the bigger picture ;-)
|Not even Google engineers have a clue in Google support forums |
I totally agree, and with Reno's observations too; I wrote something very similar in the middle of the night a couple of weeks ago after some thought-provoking beer.
For those sites with basically nothing wrong with them, it's very difficult not to do anything, but trying things for the sake of it could make them even worse.
|Not even Google engineers have a clue in Google support forums |
Given an algorithm that complex, it's not impossible that there is no one at all at Google who has a good understanding of the whole thing. I doubt most people at Google would know much about it at all, any more than most NASA engineers would understand space shuttle avionics.
|Just did a [google.com...] search and some of your reviews have hundreds and hundreds of citations from scholar.google.com. Damn! |
Google Scholar doesn't seem to have implemented Panda-style negative screening (presumably because it's a giant whitelist anyway so it doesn't have spam problems). I'm currently getting more traffic from scholar.google.com than from google.com.
suggy - It's important to note that, in my experience with Panda so far, Google.co.uk is completely different to Google.com and several other tlds.
I have pages climbing in .co.uk and I'm pretty sure I know why, .com is a Peperami - "A bit of an animal":-)
Your mileage may vary.
In the original post Walkman wrote: it's clear to me that this is a penalty
I call it "negative screening" instead of "a penalty", but yes I think this is clear.
I have just blogged about this at length, but it seems to me that what Google has done is to build, using human feedback and employee appraisal, a large corpus of spam - that is, of junk or near-junk pages that rank highly on searches. They have then fed this spam corpus to a machine learning system, and applied the resulting filter across their entire index.
The problem is that the machine learning system can't evaluate lack of quality or junkness directly, so it's using measurable features of web sites and pages that correlate with those. And this is where the false positives come in.
Possibly I have just been unlucky, but quite possibly sites like mine have been actively used as models by spammers, who have copied all the features that can be easily copied, just replacing the content with auto-generated gumph.
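The "spam corpus fed to a machine learning system" mechanism described here can be sketched with a tiny hand-rolled logistic regression. Everything below is illustrative guesswork: the three proxy features (ad density, text uniqueness, quoted fraction) and the training rows are invented, since nobody outside Google knows the real signals — but it shows exactly how a classifier that never reads content, only proxies, produces the false positives being discussed:

```python
import math

# Hypothetical proxy features per page: [ad_density, text_uniqueness, quoted_fraction]
# Labels: 1 = human-rated junk, 0 = human-rated good (the "spam corpus").
corpus = [
    ([0.8, 0.2, 0.1], 1), ([0.7, 0.3, 0.2], 1), ([0.9, 0.1, 0.5], 1),
    ([0.1, 0.9, 0.0], 0), ([0.2, 0.8, 0.3], 0), ([0.1, 0.7, 0.1], 0),
]

def predict(w, x):
    """Sigmoid of bias + weighted features: probability the page is junk."""
    z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def train(corpus, epochs=2000, lr=0.5):
    """Plain stochastic gradient descent on log-loss."""
    w = [0.0] * 4
    for _ in range(epochs):
        for x, y in corpus:
            err = predict(w, x) - y
            w[0] -= lr * err
            for i, xi in enumerate(x):
                w[i + 1] -= lr * err * xi
    return w

w = train(corpus)
# The filter only sees proxy features, never the actual prose -- so a
# legitimate review site with heavy quoting can land on the wrong side.
print(predict(w, [0.8, 0.2, 0.4]))  # close to 1 (flagged)
print(predict(w, [0.1, 0.9, 0.1]))  # close to 0 (passed)
```

The "applied across the entire index" step is then just running `predict` over every page, which is why a single retraining (a "data refresh") would move many sites at once.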
If you don't mind my asking, what wrongs/ issues did you target?
Nope - still don't get it.
I'm only interested in google.co.uk (pandalized site is a UK consumer e-commerce), so can't comment on US SERPS. Never really watch them. We cancel most orders originating from America because they are too often fraudulent.
Although I can't say I have been through the data / done the research for the US, I can't see any reason why Panda itself would be vastly different. After all, wasn't it described as a roll-out to the UK?!
You saying you have a different, rarer (and more fickle) breed of Panda over there?!
FWIW, I suspect not; it's the sample and the population that's different. All anomalies are a function thereof.
Suggy, we've been down this path before ;)
I have way more than decimated my site starting on February 25th, every page is extremely well linked Home > category > product (two types of categories) and then related within each product. I have a decent PR, in fact now about 70% of all my pages are indexed daily by Google and a lot of work has gone into adding new content within the remaining pages. I can safely rule content out, especially when compared to those ranking way up.
I have another theory and I changed it but it will take a few days to see any changes. Matches my non-Pandalized sites.
For all we know Panda in non-US is a different breed. They said that the content farm problem in US is very different from UK and others so they look at different signals.
|In the original post Walkman wrote: it's clear to me that this is a penalty |
I call it "negative screening" instead of "a penalty", but yes I think this is clear.
I call it a penalty because, IMO, Google has not run the full algo since 2/24, so even if you fixed your content, you are still in Panda land. Yes, some disagree. Matt Cutts said that even if you fixed the site exactly like Google asked you to, you could come back in one of the supposed Panda updates. "Could", to me, means "probably not, but you might."
Walkman -- trouble is, I just can't stop myself trying to help you.
Do you search in the UK / use google.co.uk, out of interest? Can't say I use the US Google. But I can tell you that eHow was everywhere (and I know why they still are to some extent), and ezinearticles, plus about, etc... Besides, looking at the Sistrix winners and losers, I don't see anything to dissuade me from my current theory.
Ask yourself this then, why wasn't it released at the same time if they are identical? February 24 was Panda US and April 11th was Panda UK+.
And those caught on 2/24 are with the worst of the worst, sites that Google wants to hurt and make sure they don't escape easily.
I don't think Google wants to hurt anyone. Google just wants to return the best search results (in the opinion of the searcher, not the webmaster/owner). That's essential to their long-term survival (forget short-term profitability).
Think about it: since most 'content farms' were emblazoned with AdSense, what financial benefit would Google have from attacking them? They were great for Google; they converted natural non-paying (for Google) listings into ones where Google often earned on the next click!
|You saying you have a different, rarer (and more fickle) breed of Panda over there?! |
I'm in the UK. I've used proxy servers many times to see what's happening with several search terms in the USA.
Frankly the USA SERPs make our UK ones look great in comparison, IMO. And ours are completely biased in favour of boring, thin 'brand' pages which are way ahead of quality and relevant sites. Just my opinion, of course but then I'm not a PhD.
|I don't think Google wants to hurt anyone |
Google are hurting a lot of people. Badly. I remember the bile that was directed towards MS in the past but it is nothing to the hatred that so many people who have lost so much, so quickly, are feeling for Google because of what this company have done. If they were a UK company they would have compensation claims stacked up to the ceiling because we all have a duty of care over here not to do anything which can damage other people's well being or livelihoods.
The standard answer of course is ...."Google can do what they want, you shouldn't build your business plan on free traffic" etc etc etc. Well European judges may disagree with that interpretation. Like the Mills of God they grind extremely slowly, but extremely fine.
|but quite possibly sites like mine have been actively used as models by spammers |
I have a Feb 24 pandalized site that has been very actively used by spammers as a model to create 1000's of spam sites.
I really don't know what's going on, I'm just saying that this did happen and I do have good information that is linked to and used by experts in my niche.
I wrote: I'm currently getting more traffic from scholar.google.com than from google.com
That's an exaggeration. But the numbers are in a similar range now, when once it would have made no sense at all to compare them.
|because we all have a duty of care over here not to do anything which can damage other people's well being or livelihoods |
Er, what? Since when? Wouldn't that rule out any form of competition in business or the job market? "Hey, you got my job... you damaged my livelihood!" "Hey, your advertising campaign poached our customers. You can't do that because you damaged my livelihood"...?
As Tedster said previously, Google is an organisation... they're running a business... they have business goals... they don't set targets about "how many people's livelihoods they can ruin". Besides, it's a zero sum game. Didn't someone else take your place in the SERPS?!
|it's a zero sum game. Didn't someone else take your place in the SERPS?! |
not any more it's not. i've seen sites occupying 5 different places on the first page.
that's quite a common occurrence now that "brands" seem to be getting a hefty boost
|Er, what? Since when? Wouldn't that rule out any form of competition in business or the job market. |
Hmmm... I would say superclown2 is referring to European anti-competitive practices. Anti-competitive practices are best defined as strategies designed deliberately to limit the degree of competition inside a market. Such actions can be taken by one firm in isolation or a number of firms engaged in explicit or implicit collusion. Since 1998 there have been numerous investigations in industries such as chemicals, banking, pharmaceuticals, airlines, beer, paper, plasterboard, food preservatives and computer games!
The Main Aims of Competition Policy
The aim of competition policy is to promote competition, make markets work better, and contribute towards increased efficiency and competitiveness of the UK economy within the European Union single market. Competition policy aims to ensure:
* Wider consumer choice in markets for goods and services.
* Technological innovation which promotes gains in dynamic efficiency.
* Effective price competition between suppliers.
* Investigating allegations of anti-competitive behaviour within markets which might have a negative effect on consumer welfare.
There are four pillars of competition policy in the UK and in the European Union:
* Antitrust & cartels: This involves the elimination of agreements which seek to restrict competition (e.g. price-fixing agreements, or cartels) and of abuses by firms who hold a dominant position in a market.
* Market liberalisation: Liberalisation involves introducing fresh competition in previously monopolistic sectors e.g. energy supply, telecommunications, air transport and postal services together with new arrangements for car retailers inside the single market.
* State aid control: Competition policy analyses examples of state aid measures by Member State governments to ensure that such measures do not artificially distort competition in the Single Market (e.g. the prohibition of a state grant designed to keep a loss-making firm in business even though it has no prospect of long-term recovery).
* Merger control: This involves the investigation of mergers and take-overs between firms (e.g. a merger between two large groups which would result in their dominating the market).
Guess what? I didn't write all that but knew where to find it:-)
And how does this relate to Google changing the way it ranks its search results?
Google's not a monopoly, just the searcher's favourite/biggest brand.
Google's not a cartel; we're not accusing it of colluding with other search engines to keep new search engines out, are we?
Google's not being propped up by the US state, is it?
We're not accusing Google of trying to merge with someone to dominate the market?
There's competition in the market (one I heard of called Bing is backed by a pretty big company itself), so the EU isn't going to stride in and say "Google: you can't be everyone's favourite! Now, go turn some folks away so they have to choose someone else."
Time for some folks to come back down to earth...
Let's focus on the things we do know, now that Panda has been around for several months and Google has released some statements regarding it. I know a lot of people don't want to, but let's start taking what Matt Cutts says as fact.
- Panda is aimed at reducing the rankings of low-quality websites. Basically, the reason behind Panda is to push down websites whose only purpose is to put out a bunch of useless content and profit off it, otherwise known as "content farms". Why do people put out content farms? To make money off them.
- Matt Cutts has just said media sites with no text, such as Flickr, have nothing to do with Panda. This is a very big clue. A lot of people have been assuming that Panda was targeting lack of content, but it appears the problem may be the actual content itself. What does this tell me? It tells me that Panda is focused on hitting sites which only put text on pages for the sole purpose of having it rank.
- Panda has some bugs. Google is aware of the issue on scraper sites outranking the original source and they are in the process of correcting it. No big surprise here but it is good to receive some confirmation that they are aware of this issue and are going to correct it.
- The reason why most Panda-affected sites have not recovered is because Google has not re-run its data yet. According to MC, there is a new update coming out, so hopefully we will start hearing about some sites recovering.
- Ads play a role in Panda. I am going to put this down as a fact regardless of who disagrees with me. The ads themselves will not get you into Panda; it is all about the intention of your content. If you have ads on poor content pages, this sends a signal to Google that you are trying to profit off your low-quality content. I have seen 2 thin affiliate sites by the same owner come back to the top 10 after Panda was released, and both sites had all their affiliate links removed. This is a strong indication to me that Google is now ranking these sites because they no longer have ads on them. You can feel free to call this a theory, but I have seen enough data and done enough research to label this as a fact in my mind.
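The ads-on-thin-content theory reduces to something measurable: ad units relative to the amount of content. A purely hypothetical sketch of what such a signal could look like (the formula and the idea that Google computes anything like it are guesses, not confirmed by anything in this thread):

```python
def ad_content_signal(word_count: int, ad_blocks: int) -> float:
    """Hypothetical 'monetized thin content' score: ad units per 100 words.
    Higher means more ads relative to content. This is a guess at the kind
    of proxy signal discussed above, not Google's actual formula."""
    if word_count == 0:
        return float("inf")  # all ads, no content: worst possible score
    return ad_blocks / (word_count / 100)

# A 2000-word article with 2 ad units vs. a 150-word page with 3 ad units.
print(ad_content_signal(2000, 2))  # 0.1 -- low: ads on substantial content
print(ad_content_signal(150, 3))   # 2.0 -- high: profiting off thin content
```

Under this reading, removing affiliate links (as the two recovered sites did) drives the score toward zero without touching the content at all, which would fit the observation above.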
-- Couldn't agree more.
|Basically the reason behind panda is to push down the websites whose only purpose is to put out a bunch of useless content and profit off it otherwise known as "content farms". |
-- Couldn't agree more
|but it appears the problem may be the actual content itself. |
-- Hmmm, or would it be better to say... "panda is focused on hitting sites who only put text on pages for the sole purpose of having it rank and that users don't like or find useful." -- That last bit's important. Google are not going after a business model; they're trying to increase searcher satisfaction.
|panda is focused on hitting sites who only put text on pages for the sole purpose of having it rank. |
-- I would call them unforeseen consequences. It's not that Panda did its job badly, but Google didn't foresee what the short-term outcome would be.
|Panda has some bugs. Google is aware of the issue on scraper sites outranking the original source and they are in the process of correcting it. |
Here I disagree. I think new data has been folded in several times.
|The reason why most Panda-affected sites have not recovered is because Google has not re-run its data yet. |
-- Google's not trying to stop people profiting off low quality content. C'mon...think about it. Think back to Google's objective. You're almost there. Just one step removed....
|If you have ads on poor content pages, this throws a signal to google that you are trying to profit off your low quality content. |
I like that summary, brinked. It reminds me of someone I spoke with soon after the Adsense program was launched. Their big insight was "now I just build content that is "slippery" instead of "sticky". The lower the value on my page, the more likely I can get an ad click."
There have been many in that boat - for years. And getting paid by the impression makes it even worse. At least in print you needed to get people to buy/subscribe to your magazine full of ads.
Echoes of Amit Singhal's opening remarks about Panda:
|Singhal: So we did Caffeine [a major update that improved Google's indexing process] in late 2009. Our index grew so quickly, and we were just crawling at a much faster speed. When that happened, we basically got a lot of good fresh content, and some not so good. The problem had shifted from random gibberish, which the spam team had nicely taken care of, into somewhat more like written prose. But the content was shallow. |
I think that is absolutely straight talk.