Forum Moderators: Robert Charlton & goodroi
John Mueller talks Panda and Penguin penalties on hangout-30Dec14
English Google Webmaster Central office-hours hangout
Streamed live on Dec 30, 2014
https://www.youtube.com/watch?v=Ba_qLBFlIe4&t=08m37s [youtube.com]
[edited by: Robert_Charlton at 9:36 am (utc) on Jan 2, 2015]
[edit reason] fixed YouTube url [/edit]
He said (while noting that he hadn't looked into the site specifically) that if there are a lot of low-quality comments compared to the quantity of content, then maybe that is bringing the site down?
...90% are from digging through the discussion forums and bringing out tidbits of information no one would have seen otherwise.
Is Google regarding the brevity of Barry's content as a shortcoming and discounting it, rather than rewarding it as a distillation? What about questions of originality, also discussed in the comments?
Looks like a classic hammer flaw in Google's algo. (If all you've got is a hammer then everything seems like a nail.) The problem with using any kind of textual analysis on a body of text is that you need a decent body of text for the algorithm to be reasonably effective. Short and concise posts may not provide a sufficient body of text and may look spammy to a crudely developed algorithm that isn't modified for exceptions like limited text cases. Might be worth seeing if long posts rank/fare better than short posts on the affected sites as this might highlight or confirm any flaws in the algo.
Thinking like a search engine developer, this is a bit of a mess when the parser cannot discriminate between site content and UGC due to a custom commenting system. The algo might be taking all text as the input.
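For anyone who wants to try the long-versus-short comparison suggested above, here is a minimal sketch. It assumes you can export, for each post, a word count and some performance number of your own choosing (visits, impressions, average position). The function name, the input shape and the 250-word cut-off are all hypothetical, picked only for illustration.

```python
from statistics import mean


def compare_by_length(posts, threshold=250):
    """Average a metric separately for short and long posts.

    posts: iterable of (word_count, metric) pairs (hypothetical export format).
    threshold: word count at which a post counts as "long" (arbitrary assumption).
    Returns a dict with the mean metric per group, or None for an empty group.
    """
    posts = list(posts)  # allow generators, since we iterate twice
    short = [metric for words, metric in posts if words < threshold]
    long_ = [metric for words, metric in posts if words >= threshold]
    return {
        "short": mean(short) if short else None,
        "long": mean(long_) if long_ else None,
    }
```

A gap between the two averages would only be a hint, not proof, since post length is likely to correlate with topic, age and links.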
more try and work out what this video teaches us.
So UGC can now bring down a page/site unless it is moderated and the lower-quality UGC is deleted. Awesome, now we know yet another way to ruin someone else's site.
...simply the ratio of useful content to fluff?
I think that's simply it. The entire page, including comments, is considered the content. So the fluff is overwhelming the signal.
Perhaps a schema.org solution
structured data markup like that doesn't significantly change the quality of a page / site, so if you're looking to improve the quality, I'd start elsewhere.
John Mueller 5 hours ago
I'd just see this like any other kind of UGC you might have on a site. In the end, the webmaster is the one who publishes the content and provides the framework for it to be crawled, indexed, and shown to users. It's not a matter of saying "please ignore this part, I didn't write it myself" (the random visitor of your site wouldn't do that either), it's really more of a matter of making sure that the site overall is of high quality.
For most sites, completely turning off UGC just because there's some low-quality UGC out there seems a bit too much to me, just like you wouldn't remove all comments on a blog just because there's some comment spam getting through. UGC can provide a lot of value, and if there's a passionate community on a site, I'd try to find the right balance or split it in a useful way. Other sites work hard to get that kind of UGC :)
I think that's simply it. The entire page, including comments, is considered the content. So the fluff is overwhelming the signal. The hardass response from Google could be that the ball is in Barry's side of the court to moderate the comments or come up with a technical solution to block the comments from crawlers. Would be nice to be able to no-index/no-follow a portion of a page, heh.
Ironic given that such a problem is really Google's doing in that its parsers cannot differentiate between content and UGC. With ordinary commenting systems, it would be a case of modifying the parsers. However, there would be a scalability issue. The real problem is that custom commenting systems would require custom parsers. This is, I think, why the use of a separate URL for comments was being so strongly "suggested".
Oh! This is very interesting!
I think that the Spanish webmaster talking about a comparison website mentioned making some content in a page non-indexable or similar. The separate comment URL approach is messy, but if a site has been hit, then there's little to lose from a bit of experimentation.
The hardass response from Google could be that the ball is in Barry's side of the court to moderate the comments or come up with a technical solution to block the comments from crawlers. Would be nice to be able to no-index/no-follow a portion of a page, heh.
[edited by: Whitey at 11:59 pm (utc) on Jan 2, 2015]
There is a difference between a good book and a long book.
How does Google identify low-quality? It isn't possible to ignore something you can't see.
That's the big problem with Google's approach. You only have to read what was recommended as being a "good" site to realise that Google has a very restricted and almost completely academically influenced approach to what is a "good" website. It differs completely from the vast majority of real world websites, but that doesn't stop Google trying to evaluate what is essentially a human response. This stunted academic view versus real world view issue is at the heart of the problem and one of the main reasons for Google's near complete failure on Social Media. It might be very good on content-rich issues but it is absolutely pathetic on what are essentially human value judgements, hence it has to have panels of "quality" raters.
There is a difference between a good book and a long book. Everything an algorithm considers must be quantifiable.
There's also a difference between a page that's 100% useful content and one that's 20% useful content.
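If the ratio idea is right, it can at least be measured on your own pages. The sketch below is only an illustration, not a description of anything Google does: it counts how many words of a page sit inside the comment container versus everywhere else. It assumes the comments live in a single element with a known id (the default `comments` is hypothetical) and that the HTML has properly closed tags; sloppy markup with unclosed `<p>` or `<li>` tags would throw the nesting count off.

```python
from html.parser import HTMLParser


class CommentShareParser(HTMLParser):
    """Count words inside and outside one comment container.

    Assumption: all comments sit in a single element whose id is `comments_id`.
    """

    # Tags that never have a closing tag, so they must not affect nesting depth.
    VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}
    # Tags whose text is code or styling, not readable content.
    NON_TEXT_TAGS = {"script", "style"}

    def __init__(self, comments_id="comments"):
        super().__init__()
        self.comments_id = comments_id
        self.depth = 0        # nesting depth inside the comment container
        self.skipping = False # True while inside <script> or <style>
        self.main_words = 0
        self.comment_words = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        if tag in self.NON_TEXT_TAGS:
            self.skipping = True
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") == self.comments_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in self.VOID_TAGS:
            return
        if tag in self.NON_TEXT_TAGS:
            self.skipping = False
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.skipping:
            return
        words = len(data.split())
        if self.depth > 0:
            self.comment_words += words
        else:
            self.main_words += words


def comment_share(html, comments_id="comments"):
    """Return the fraction of a page's words found in the comment container."""
    parser = CommentShareParser(comments_id)
    parser.feed(html)
    parser.close()
    total = parser.main_words + parser.comment_words
    return parser.comment_words / total if total else 0.0
```

A word count says nothing about whether the words are useful, of course. It only shows how much of what a crawler sees on the page was written by visitors rather than by the site.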