|Google's High Quality Site Guidelines - Retrospective|
Would like to do a little retrospective on last year's article "More guidance on building high quality sites" by Amit Singhal [googlewebmastercentral.blogspot.com] in which he answered the Panda questions "What counts as a high quality site?" and "What can you do?".
Now that a year's gone by since publication of that article, would you agree that the points are an accurate reflection of what Panda is all about? Or do you believe this was just a smokescreen for something much different?
And for those that recovered, did you recover by using Amit's guidance?
I do think this list is what Google wanted to target with Panda - and the essence of how they formed the "seed set" for the machine learning that built the algorithm.
In addition, it seems clear that some sites got caught in a false positive situation - especially where Google has trouble assigning original author credit in duplicate/scraper site situations. This area is a perennial problem for Google and also honest websites.
It seems like Panda really came down hard on (cross domain) duplicate content in many cases, even though the total score certainly was much more complex than just a couple of factors.
The way that Google has integrated Wikipedia and mooted some kind of "Google merchant" thing showed that the post was close to some of Google's internal thinking. However, the post was, and is, very much at odds with the state of the web. Most websites do not even have full metadata (page title, description, keywords etc). Textually, some pages are very sparse in terms of data. Google's link-based algorithm now seems rather fortuitous, and the gradual shift away from it (or perhaps more precisely a gradual addition of many other algorithms and 'signals') has caused problems for Google. It may be that Google is suffering from the old GIGO (Garbage In - Garbage Out) effect on a massive scale.
If I had to guess, and it is only a guess, at what helped some sites recover, then it might be more concise raw textual data (more on-topic text) and a lower reading level that helped. This is of course if it really is using machine learning. Some of the suggested or related queries are similar to Faceted Search ( [en.wikipedia.org...] ).
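For anyone who wants to test the reading-level guess on their own pages, here is a minimal sketch using the standard Flesch Reading Ease formula (higher score = easier text). Nothing here is known to be what Google measures; the function names and the vowel-group syllable counter are my own rough assumptions, good enough only for before/after comparisons of the same article.

```python
import re

def count_syllables(word):
    # Crude assumption: each run of consecutive vowels is one syllable.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text):
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    # Flesch Reading Ease: penalizes long sentences and long words.
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

simple = "The cat sat. The dog ran."
dense = "Algorithmic classification necessitates comprehensive institutional evaluation."
print(flesch_reading_ease(simple) > flesch_reading_ease(dense))
```

Run it on an article before and after an edit; if the score goes up, the text got easier to read by this measure, whatever Google makes of it.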
Duplicate sites are actually a tougher problem for search engine operators than scrapers because there still is a body of webmasters who use the same content on multiple domain names without 301 redirects so that these sites will appear as clones. The hard part (speaking from experience of running a country level search engine) is determining which is the main site. Unfortunately these webmasters rarely read Google's blogs or Webmasterworld.
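To illustrate the clone problem, here is a minimal sketch of one common way to spot near-identical pages: break each page into overlapping word triples ("shingles") and compare the sets with Jaccard similarity. This is a textbook technique, not a description of what Google or any particular search engine runs; the function names and the 0.9 threshold are hypothetical. Note that it only tells you two pages are clones. It says nothing about which domain is the main site, which is the hard part.

```python
import re

def shingles(text, k=3):
    # Lowercase, strip punctuation, and build overlapping k-word tuples.
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    # Shared shingles divided by all distinct shingles across both pages.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def looks_like_clone(text_a, text_b, threshold=0.9):
    # The threshold is an arbitrary illustrative value.
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold

page_a = "Blue widgets for sale, shipped worldwide from our warehouse."
page_b = "blue widgets for sale shipped worldwide from our warehouse"
page_c = "A completely unrelated page about growing tomatoes in containers."
print(looks_like_clone(page_a, page_b), looks_like_clone(page_a, page_c))
```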
I still think that the "Wikipedia with a shopping cart and safe shopping cert" post was more indicative of Google's thinking than the reality of the web. It was portraying the web as how Google needed it to be for its new algorithm approach. It wasn't so much a smokescreen as a warning to webmasters to conform. Those who did conform may have weathered the recent storms better than those who did not and relied purely on link structures.
|Does the site have duplicate, overlapping, or redundant articles on the same or similar topics with slightly different keyword variations? |
I believe many sites were hit hard by the effects of this one passage alone. Top SERP results invariably mean passing a human sniff test and that's the most important quality metric you need to worry about, imo.
I recovered by making my content a little more bland but descriptive.
(Is there a language issue where someone rating an English site doesn't have English as a first language? It felt like it at times, judging by what passed me.)
I can believe the list is what was intended, but here's what always bothered me about Panda: it seems only directed toward information sites. I mean, how do most of the things on the list apply to:
--Product ordering pages, where you have all the info your shoppers would want, but this might not be obvious to a quality rater or algo that doesn't know your field?
--Opinion/commentary pages, where people are analyzing/discussing events? How does the algo know whether these pages are satisfying an audience or not? What if the site is pandering to a small group, like a very specific type of Libertarian? Can Google tell the difference between a top quality site that's loved by its tiny target audience and a mediocre site that's failing to please much of its broader target audience?
--Cross-posting, also popular with topics like politics or activism, where people are trying to get the word out by posting an article all over the web? It actually seems like in some cases Google ranks lots of cross-posts highly, as if they get what's going on, but this has always worried me. It's a legitimate practice, not an SEO tactic. Most of these people have never heard of SEO.
--Recipe sites and other non-copyrightable content. Google doesn't seem to mind when database recipe sites scrape each other - it lets them all rank well. Meanwhile, someone who's taken that recipe, tweaked it slightly, added commentary and added pictures of how it looked after they made it will generally not rank as high. I don't get how that fits with these guidelines.
The bit that annoys me about what Amit wrote is this
|Is this article written by an expert or enthusiast who knows the topic well, or is it more shallow in nature? |
I find if the articles get too technical or "expert" they actually do worse. I find if an article isn't ranking well I can go back and edit it (DUMB IT DOWN) and the next thing you know Google is loving it much more than before. Power tip for all those bloggers out there: don't write for experts!
Quite frankly a lot of what he says in that article is drivel! It is basically the same old stuff Google has been spouting for years.
|"ninety percent of everything is crap" |
-- Sturgeon's Law
The problem with Amit is that he is talking about B, but the actual subject is A. G needs high quality SERPs, but they are talking about high quality sites. G needs to assert which content is original and which is not, but they can't. While Google plays with the work of masses of webmasters, black hat guys --still-- play the Google Game.
How do you know if a recovery is because of you, or because of another algo change or new "signal"?
No offense. Google is still the best. But imo more honesty would yield better results. What if Sturgeon's Law is true?
|especially where Google has trouble assigning original author credit in duplicate/scraper site situations |
I'm not as technically oriented as most folks, but can't they discover that by some kind of time stamp? They must have some kind of gizmo that shows when a particular page was first crawled?
Maybe that's too obvious and simple...possible?
Timestamps are not reliable. It would depend on Google being notified and crawling the original author first. The other issue is that they can easily be faked.
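A toy sketch of why "first crawled wins" goes wrong. The crawl log, URLs and times below are entirely made up for illustration, and the function is hypothetical rather than anything Google is known to do. If the crawler happens to reach the scraper's copy before the original, the scraper gets the credit.

```python
from datetime import datetime

def credit_by_first_crawl(crawl_log):
    # Naive rule: whoever was crawled first is treated as the original author.
    return min(crawl_log, key=lambda rec: rec["first_crawled"])["url"]

# Hypothetical data: the original was published first, but the crawler
# found the scraper's copy several hours earlier.
crawl_log = [
    {"url": "https://original.example/article",
     "first_crawled": datetime(2012, 3, 5, 14, 0)},
    {"url": "https://scraper.example/copy",
     "first_crawled": datetime(2012, 3, 5, 9, 30)},
]

print(credit_by_first_crawl(crawl_log))
```

Dates embedded in the page itself don't rescue this, since the publisher of the copy controls them.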
|Or do you believe this was just a smokescreen for something much different? |
Yep, they've lost control and haven't a clue what they're doing. Every time they say they're doing "something" or going after "something", it's an admission of "Here we go again trying to sort things out since the last one didn't work".
We've all seen it: quality sites trashed across the entire spectrum of subjects, with scrapers, extremely thin sites and blatant keyword stuffers rising to the top.
I feel they "know" what quality is, and there is a huge amount of it. They simply do not know these days how to separate it from all the garbage, other than forcing us all into Google+ and trying to get a handle on what many of us have created.
Now that would start a few arguments amongst everyone, and just how the hell would one prove it!
As it pertains to the guidance provided by Amit, let's say an E-commerce site sells product categories A, B, C, D and E, and in each of those categories there are 50 items. Products are similar within a category but not between categories, so there will be some common themes among the items in each one.
Is this type of site, provided it's not a major brand, asking for Panda trouble due to not concentrating on one category? Does Panda think you can only be good at one category or "expert" on one topic?
|Is this type of site, provided it's not a major brand, asking for Panda trouble due to not concentrating on one category? Does Panda think you can only be good at one category or "expert" on one topic? |
I don't know that Panda thinks that, but I do wonder if the algo looks for specific niches, and sometimes has trouble classifying sites that span niches (as it defines them). I have one site where I believe this is the case, and the algo's response - long before Panda - was to ignore all but one category of the site. And that was hardly my best quality category, so I don't know why Google liked it.
So yeah, I think sometimes it doesn't do your rankings any favors. But what's the solution? Google seems to want little niche sites that are just ripe for Adsense, and hey, no shock there. But that's not always what users want, so you have to decide which way the bigger money lies: Google, or loyal users.
|I have one site where I believe this is the case, and the algo's response - long before Panda - was to ignore all but one category of the site. |
Was this site affected by Panda and did this hold true after?
Strangely, that site was affected by Penguin rather than Panda. It wasn't penalized (I got the "no manual spam action" response to reinclusion), but on April 25th I saw my rankings plummet on several keyphrases from that category Google used to love.