TheOptimizationIdiot - 5:28 pm on Mar 7, 2013 (gmt 0)
There are so many silly pages out there only because they link to brands.
What makes me wonder is that they say ( in this other thread: How ysearch works ) they test their new algos before letting them into the wild. Then why are so many nonsense pages on the first results page?
Well, one thing I've heard them say before is that when you're dealing with the numbers they are, there's only a limited amount you can actually test internally. Then you have to "run with it" and let "other factors" influence where things end up.
There are some really interesting things they deal with that most of us don't think about, like language detection and "maybe using grammar and spelling someday". Say they try to detect language and 80% of a page is in English, but 20% is in Greek: when you "run a grammar check" algorithmically for just one of those languages, you get "grammar on that page sucks", even though it might be 100% correct in both languages.
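To make the mixed-language problem concrete, here's a rough sketch of my own (not anything they've described). The word lists, the sample page and the error_rate helper are all made up for illustration, and a real checker would be far more involved than a dictionary lookup.

```python
# Toy word lists: illustrative assumptions, not real dictionaries.
ENGLISH = {"the", "page", "is", "about", "travel", "in", "greece", "and"}
GREEK = {"καλημέρα", "ελλάδα"}

def error_rate(words, dictionaries):
    """Share of words not found in any of the supplied dictionaries."""
    known = set().union(*dictionaries)
    misses = sum(1 for word in words if word.lower() not in known)
    return misses / len(words)

# Hypothetical page: 8 English words and 2 Greek words (80% / 20%).
page = ["the", "page", "is", "about", "travel", "in", "greece", "and",
        "καλημέρα", "ελλάδα"]

english_only = error_rate(page, [ENGLISH])          # Greek words count as errors
both_languages = error_rate(page, [ENGLISH, GREEK])  # nothing counts as an error

print(english_only, both_languages)  # 0.2 0.0
```

Checked against English alone, every Greek word looks like a mistake, so a perfectly written page scores a 20% "error rate". Checked against both languages, it scores zero.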
So there are some "little details" they have to take into account to make sure they don't "throw the baby out" (too often anyway), and to do it they have to "err on the side of caution", then rely on other signals to "do the dirty work" when they're not 100% sure algorithmically.
The page you're talking about seems like an easy call, but I think sometimes, in a "machine learning system", you might have to show it N cases of the "really bad" before it learns "don't show any of those, with a few exceptions".
But even when it looks simple they still have to make sure they get it right. If they "just threw out" pages with limited text, a login/account creation form and nothing else on them, they'd look like fools, because Facebook and Twitter would both disappear. So I think what they have to do initially is leave all those types of pages "in" and then let "behavior and other signals" push the bad ones out, and the fastest place to get them "pushed out by behavior" is higher in the rankings.
I'm not 100% sure that's why garbage shows so high sometimes right after an update, but I think it could be that "when it's questionable" the algo "bumps it up a bit" to get the behavior signals, and it either sticks or drops over a shorter period of time than it would on, say, page 30.
I guess another way of saying it is: By pushing "questionable" or "looks bad but not sure, so let's find out" results up into the "higher interaction areas" of the results, they can "clean the index out" faster than they could by leaving those same results on page 5, where there's not enough interaction to "indicate a good or bad" result for what could be a very long period of time.
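Here's a back-of-the-envelope sketch of why position would matter for that. Every number in it is a made-up assumption (daily searches, clicks per 10,000 searches at each position, how many clicks are "enough"), and days_to_verdict is just a name I picked. It only shows the arithmetic, not how any search engine actually does it.

```python
def days_to_verdict(daily_searches, clicks_per_10k, clicks_needed):
    """Days until a result has collected `clicks_needed` clicks.

    clicks_per_10k is the assumed number of clicks the result gets per
    10,000 searches at its current position. Integer math, rounded up.
    """
    clicks_per_day_x10k = daily_searches * clicks_per_10k
    needed_x10k = clicks_needed * 10_000
    return -(-needed_x10k // clicks_per_day_x10k)  # ceiling division

# Hypothetical query with 1,000 searches a day, 200 clicks needed to judge.
high_on_page_one = days_to_verdict(1000, 500, 200)  # assumed 5% click rate
on_page_five = days_to_verdict(1000, 5, 200)        # assumed 0.05% click rate

print(high_on_page_one, on_page_five)  # 4 400
```

With those invented numbers, the "questionable" result gets judged in about 4 days near the top versus over a year on page 5, which is the whole point of bumping it up.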