|Google Patent Filed on 1st Day of Panda on Ranking Pages Based on Usage|
A good analysis on Google's patent filing, filed on the day of the first Panda update:
The patent itself is at:
It helps to better understand Google by keeping up to date on their patents. Even if they are not using this patent, it helps to uncover their thinking process.
I wonder how Google would deal with manipulation from certain mturk projects.
It is interesting how many people have made blind guesses about Panda without even reading this patent.
Interesting, pretty much as I expected. I'll re-read it later when I have fewer interruptions.
I think we have already discussed usage metrics here, haven't we?
Great article/patent to read.
From the patent
|for each document: a frequency of visit value based on a number of times the document was visited during a time period; and a unique visit value based on a number of unique visitors to the document; determining, at the server, for each document, a usage score from the frequency of visit value and the unique visit value associated with the document; and determining, at the server, an organization for the documents based on the usage scores for the documents. |
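To make the claim language concrete, here is a rough sketch of how a usage score could be built from the two values it names. The claim only says the score is determined "from" the frequency of visit value and the unique visit value, so the weighted sum, the weights, and the names `usage_scores` and `organize` are assumptions for illustration, not anything the patent specifies.

```python
from collections import defaultdict

def usage_scores(visits, freq_weight=0.5, unique_weight=0.5):
    """visits: iterable of (document_url, visitor_id) pairs from one time period."""
    visit_counts = defaultdict(int)     # frequency of visit value per document
    unique_visitors = defaultdict(set)  # distinct visitors per document
    for url, visitor in visits:
        visit_counts[url] += 1
        unique_visitors[url].add(visitor)
    # Assumed combination: a plain weighted sum of the two values.
    return {
        url: freq_weight * visit_counts[url]
             + unique_weight * len(unique_visitors[url])
        for url in visit_counts
    }

def organize(visits):
    """Order documents by usage score, highest first."""
    scores = usage_scores(visits)
    return sorted(scores, key=scores.get, reverse=True)

# a.html: 4 visits, all from one visitor. b.html: 3 visits from 3 visitors.
log = [("a.html", "u1")] * 4 + [("b.html", "u1"), ("b.html", "u2"), ("b.html", "u3")]
scores = usage_scores(log)
order = organize(log)
```

With equal weights, `b.html` outranks `a.html` even though it has fewer raw visits, because the unique visit value counts against a page that one visitor keeps reloading.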
My take is that frequency permeates this patent application and will impact...
- urls causing an immediate traffic spike followed by no further activity over x amount of time.
- urls that don't have much history
- urls that garnered negative reactions such as backpage activity from the serps.
|wherein the plurality of documents include at least one document visited by multiple distinct counted visitors during a time period; |
It's important to get traffic from other sources (including your own site). Google may not rank urls well if it has no traffic data for them. Chicken and the egg scenario.
[edited by: Sgt_Kickaxe at 5:48 pm (utc) on Jul 22, 2011]
|Proven performing pages are going to do well while new pages, or pages with little usage history, will struggle at least for a while until they gain some history. |
i think you are saying the opposite of what Bill had actually explained in the comments to that question. Usage stats of the nature explained in the patent (freq. of visits and unique visitors) will help newer pages on newer topics to compete on a level footing with the established sites.
[edited by: indyank at 5:49 pm (utc) on Jul 22, 2011]
|Usage stats will help newer pages on newer topics to compete on a level footing with the established sites. |
New pages typically have no usage stats, yet.
i had edited my previous comment. Google will gather the info on freq. of visits and unique visitors from the time the page was born. They will be computing it regularly.
They spell it out. It's pretty much all there. It could explain some random puzzles:
Why Google's put so much emphasis on user behavior in their public relations with webmasters since Panda.
Why Google seems happy despite so many reports of the search results being garbage for certain types of longish tail queries (data gathering for previously filtered pages - lots and lots of data gathering; Google would have less confidence in their data on less trafficked pages).
Why "tweaks" to Panda do not result in permanent recovery. They're not tweaks, or at least not just tweaks - they're new user data, gathered in the interim, applied, and then being gathered some more.
Why sites, big or medium-sized, with highly variable content, or content that looks highly variable insofar as the user experience goes, are suffering more. (Just our impression - not sure how accurate this one is.) This is the "content farm" part of Panda, applying the sitewide user data score to the rankings - the more variable, the more penalty-like. As y'all said, fuzzy stuff.
Why Google or Google experts have said confidently there will be no quick fixes.
Why Google reps have said this can be gamed but that it's not easily tweaked by webmasters. Leaving basically "gamed by the bad guys" - which may be why we've been hearing reports of Google's raised defenses against botnets popping up (people are encountering captcha from Google when they return too quickly from a search engine result).
It may very well be that Google's de-emphasized keywords and backlinks and engaged in a big, stepped-up sorting game with user metrics. Incidentally, a few days ago, my husband used this example to explain to me how this kind of sorting works.
Google says, "So what do you want?"
User says, "Give me the eight of spades."
Google says "Hmm, not sure what you mean, but I think I can guess," and shows a bunch of bewilderingly random cards. The user says, "None of those are really the eight of spades, but I'll take the eight of diamonds and the two of spades. They're close enough."
To the next user who asks for the eight of spades, Google, a bit smarter now, but still in testing mode, offers the eight of diamonds, the two of spades, and some random cards. This user says, "I'll take the seven of spades. That looks closer than all of these."
So now Google's weighted the eight of diamonds and the two of spades fairly highly, and the seven of spades even more highly. This goes on until Google's offering users all the spades and all the eights. From these, the choices are narrowed down until Google's confident it's got the 8 of spades.
He said it's a learning AI. It should sort everything and make excellent linkages between searches and search results - linkages that are not reliant on keywords for relevance or backlinks as artificial quality signals, but that boil quality and relevance down to the user. Unfortunately, it will take a long time, possibly years. (In the meantime, he just Yahoo!s it.)
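The card game above could be sketched roughly like this. It is a toy version with assumptions of my own: every pick counts once, the "random cards" exploration step is left out, and the class name `FeedbackRanker` and the card codes are made up for illustration. It is not a claim about how Google actually does it.

```python
from collections import Counter

class FeedbackRanker:
    """Toy ranker that reorders results using what earlier users picked."""

    def __init__(self):
        self.picks = {}  # query -> Counter mapping result -> number of picks

    def record_pick(self, query, result):
        self.picks.setdefault(query, Counter())[result] += 1

    def suggest(self, query, candidates, k=3):
        weights = self.picks.get(query, Counter())
        # sorted() is stable, so results with equal weight keep candidate order.
        return sorted(candidates, key=lambda c: -weights[c])[:k]

deck = ["KH", "8D", "3C", "2S", "7S", "8S"]
ranker = FeedbackRanker()

# First user asks for the eight of spades and settles for 8D and 2S.
ranker.record_pick("eight of spades", "8D")
ranker.record_pick("eight of spades", "2S")
# Second user takes the seven of spades.
ranker.record_pick("eight of spades", "7S")
first_round = ranker.suggest("eight of spades", deck)

# A third user also takes the seven of spades, so it moves to the front.
ranker.record_pick("eight of spades", "7S")
second_round = ranker.suggest("eight of spades", deck)
```

Note that the actual eight of spades never surfaces here unless somebody gets shown it and picks it, which is why the random cards in the story matter.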
It raises lots and lots of questions, and not everything fits, but it's a start.
One other note...it almost seems as though it's not the patent that's a piece of Panda, but Panda that's a piece of this patent. Hmmm...
But what i can say is this information is already used by google. I have experienced this back in 2008.
Take for example the recent Firefox error that was discussed in one of the threads earlier. Being a relatively new version, there wasn't much info on that error in the first few days. There were just one or two sites which provided the answer. But in one week's time the web had several pages offering solutions for that error. Yet, the original two pages continued to rank higher.
There can only be two reasons for that.
1) Page history 2) usage stats.
|i think you are saying the opposite of what Bill had actually explained in the comments to that question. Usage stats of the nature explained in the patent (freq. of visits and unique visitors) will help newer pages on newer topics to compete on a level footing with the established sites. |
Newer sites with a gazillion all-of-a-sudden links/mentions, he said, assuming he read it right. But that's not new; new but popular sites have managed to go on top almost within a day for a long time. And we're seeing BRANDS on top.
About the exact day: it probably depended on the legal team. Doesn't mean panda doesn't use this or part of this but the date itself might be a coincidence. We also know that Google doesn't use all their patents.
Of note is that Google stopped support for the toolbar on Firefox5+, so whatever they were getting from them doesn't matter now. They may also be supporting Chrome but who knows since they may lose some searches from people searching directly from the toolbar.
The "experts" said Panda will not be easy since it didn't happen for a long time. During the first month everyone was full of optimism :). Google's line was: don't try one thing, focus on users... blah blah...
|Why Google or Google experts have said confidently there will be no quick fixes. |
Or links during the time they were the only solution. It's a possibility, that's all I'm saying.
|There can only be two reasons for that. |
1) Page history 2) usage stats.
Lapizuli, I follow what you say, but that isn't dynamic. You classify the page once but it may have changed.
Just re-read this. Doesn't this mean we can buy traffic to rank higher? Even if they mean traffic is a partial boost, as long as it's used you can buy your way out of this. And then you have a death spiral scenario. Does anyone have a link on where to buy traffic from ;)
|Under a usage information-based ranking approach, the pages might be ranked differently. |
Looking just at a raw visit frequency, the pages might be organized into the following order: first page (40 visits), second page (30 visits), and third page (4 visits).
If those raw visit frequency numbers are refined to filter out automated agents and to assign double weight to visits from Germany, the order of the pages might change to: second page (effectively 40 visits, since the 10 from Germany count double), first page (effectively 25 visits after filtering out the 15 visits from automated agents), and the third page (effectively 4 visits).
The usage data might be combined with either or both the IR scores and the link scores.
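The arithmetic in that example can be checked with a short sketch. The visit numbers, the automated-agent filter, and the double weight for Germany come from the quoted text; the record layout, the "DE" and "other" labels, and the function names are assumptions made up for illustration.

```python
def visits(other=0, germany=0, automated=0):
    """Build a list of (country, is_automated) visit records."""
    return ([("other", False)] * other
            + [("DE", False)] * germany
            + [("other", True)] * automated)

pages = {
    "first": visits(other=25, automated=15),  # 40 raw visits, 15 from bots
    "second": visits(other=20, germany=10),   # 30 raw visits, 10 from Germany
    "third": visits(other=4),                 # 4 raw visits
}

def raw_count(records):
    return len(records)

def refined_count(records, country_weight):
    """Drop automated agents, then weight each remaining visit by country."""
    return sum(country_weight.get(country, 1)
               for country, is_automated in records
               if not is_automated)

def rank(pages, score):
    return sorted(pages, key=lambda name: score(pages[name]), reverse=True)

raw_order = rank(pages, raw_count)
refined_order = rank(pages, lambda r: refined_count(r, {"DE": 2}))
```

Raw counts give first, second, third. After filtering and weighting, the second page scores 20 + 10 x 2 = 40 and the first page 40 - 15 = 25, so the order becomes second, first, third, matching the quoted example.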
In theory, your paid ads from Google could help the overall relevancy of your page. Therefore, you could pay Google to increase your natural ranks based on the usage of the pages receiving the paid traffic.
Well, people spend money on TV ads to promote their product, they then go to the website directly or go to Google to look for it. Either way, nothing is free.
It can't be a coincidence that this patent was filed the day Panda launched -- but maybe it's a meta-game. Maybe the idea is to create confusion by filing a patent that actually has nothing to do with Panda (perhaps technology that they've already been using for some time without patenting).
It's a catch-22 if you need traffic to get to page 1, because the only way to get traffic is to get to page 1.
The sites that Google puts at the top will always have an advantage over the ones below them, because everyone will be visiting them.
Surely Google can't rank sites on traffic, when they provide the traffic themselves?
It's like a chef handing out jam sandwiches, and then deciding which guy has the best jam sandwich.
Londrum, yes, I think that's what the random stuff appearing on page 1 is all about...mixing things up. Have you noticed SERPs are a lot less static than they usually are? That's what we're seeing, anyway. Google has done this before (maybe when college students got out of school? So they could randomly test SERPs?) but I think it's going to be the modus operandi for a while if Panda indeed draws from this patent.
Devil's advocate here: Isn't this useless for frequently changing pages? It takes loads of time to get enough traffic to test a page, and by the time you are done it may have changed. Remember, Google has to 'see' the traffic too. How does showing junk scrapers on top quite frequently help Google, since this is apparently an ongoing eval?
|Londrum, yes, I think that's what the random stuff appearing on page 1 is all about...mixing things up. Have you noticed SERPs are a lot less static than they usually are? That's what we're seeing, anyway. Google has done this before (maybe when college students got out of school? So they could randomly test SERPs?) but I think it's going to be the modus operandi for a while if Panda indeed draws from this patent. |
If this is what Google is doing, then sure, it's problematic for certain types of pages. But isn't that what we're seeing? Really big misses for certain classes of pages and sites, high accuracy with some verticals and query types, and raw, untamed results with others?
I don't know if it's absolutely useless, though - just tricky, time-consuming, indirect, and maybe ill-fated, as the task here is to focus on quality.
A single signal for quality is impossible with so many different kinds of web properties. They've tried interpreting multiple signals in their complicated AI way, but have not been able to prevent mediocre content from ranking.
So they're shifting emphasis to across-the-board signals like authorship, reputation (through social networking), and authority, and de-emphasizing the quality-signalling SEO that is so heavily manipulable like links, grammatically correct writing, and keyword relevance. And more than ever, they're looking at user data, the only really objective signal of quality they have.
It's kind of a functional approach - if the page functions like it's useful, then it is.
So how to find the good content, given the problem londrum just pointed out - that their rankings bias what comes to the fore? If mediocre content was ranking before, then it follows that better content was being buried. So what do you do? Tamp down linking and keyword signals, reach to the ends of the web and pull more sites forward, and see what users tell you.
They've been using user data all along, but we're guessing that their confidence in user metrics has increased for some reason. Dunno what.
You know, I really don't think Panda is about crushing content farms. I think it's about becoming better able to hear through the noise in wildly variable websites and network neighborhoods. The website noise becomes more manageable if they turn up the volume on users' signals.
The most powerful set of data Google has is data on its users. They haven't been able to use it very much so far, because it's real noisy. But we're guessing they think they're getting better at it. We won't know for a while if that's true. And if Google loses too much search share in the meantime, we won't ever know.
Or we could be wildly off base. It could be bounce rate. :).
Time will tell. But I do think that if and when Google implemented/implements this patent, it'll mean an end to SEO as we know it, because the whole reason SEO was born was to communicate with a search engine, and the search engine doesn't want to listen to anyone anymore except the users (doesn't trust us for some reason). It would just be nice to get a memo when that happen(s)(ed).
I wonder if google is somehow using these usage stats to regulate the value of outbound links. Seems to me like a great signal to use. So if you're getting links from sites with no traffic, they're not worth anything.
|Seems to me like a great signal to use. So if you're getting links from sites with no traffic, they're not worth anything |
Yes, that seems to be the very best signal to me. Been thinking about that for 6 months now at least. Haven't been able to prove it with testing or anything, but surely they will be using that more and more.
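Purely as a thought experiment, the idea could look like the sketch below: a link's value is scaled by how much traffic the linking site gets. Nothing in the patent describes this. The function name `link_value`, the linear scaling, and the `full_credit_visits` threshold are all made-up assumptions.

```python
def link_value(base_value, source_monthly_visits, full_credit_visits=1000):
    """Scale a link's value by the traffic of the site it comes from.

    Hypothetical rule: no traffic means no value, and a site at or above
    full_credit_visits passes the full base value.
    """
    traffic_factor = min(1.0, source_monthly_visits / full_credit_visits)
    return base_value * traffic_factor

dead_site_link = link_value(1.0, 0)      # link from a site nobody visits
small_site_link = link_value(1.0, 500)   # link from a modestly visited site
busy_site_link = link_value(1.0, 5000)   # link from a well visited site
```

Under this rule a link from a site with no traffic is worth nothing, which is the scenario described above.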
|But I do think that if and when Google implemented/implements this patent, it'll mean an end to SEO as we know it |
Na, just that the rules are changing (as always), and we'll use different tools. Like spending more time pursuing links that send traffic as well as PR/link juice. And making sites stickier. And making the outgoing links more enticing to click, so visitor doesn't hit the "back" button.
Sometimes the most authoritative books /papers are almost never read, relatively speaking.
|I wonder if google is somehow using these usage stats to regulate the value of outbound links. Seems to me like a great signal to use. So if you're getting links from sites with no traffic, they're not worth anything. |
I think we're on phase 16 of guessing :)
Another possibility is that Panda is screwed up, plain and simple, so there's no logic to it when applied across the web. That it was rushed we know, and Goog for obvious reasons would never admit that they failed. Search is their main bread and butter so admitting that they failed is not a small thing.
|Na, just that the rules are changing (as always), and we'll use different tools. Like spending more time pursuing links that send traffic as well as PR/link juice. And making sites stickier. And making the outgoing links more enticing to click, so visitor doesn't hit the "back" button. |
Which is exactly what we would do if there were no search engines. If this is the way Google is going to evaluate the worth of a link then I would welcome it.
How this is achieved I'm not sure but I like the sound of it. Google's long-term aim has to be to remove any element that can be manipulated by webmasters from the ranking equation. This seems like a step in the right direction to me.
Maybe not, Simsi, and it's probably more of an evolution than a step since it's so different.
I know I can get my urls onto page one of the serps (albeit only for my connections) with a simple twitter recommendation now so how is that game-proof?
In fact it means that the FIRST order of business is to get your visitors to join your social groups. Twitter at least, +1 most likely, I don't know what benefit to rankings Facebook gives yet but... you get the idea.
I'm not convinced it is gameproof yet kickaxe. Merely saying that I believe it is a step in that direction as it seems logical to me that Google wants to get to a point (one day) where Webmasters cannot artificially boost their SERPS.
It's why I think many types of links are slowly being devalued, why much META data is now ignored, why KW density is less important than it was, why I think the TITLE and H1 tags will eventually become irrelevant (to ranking anyway) and why I think that user reactions to a page (however that is measured) will become the de facto way of measuring quality content.
But I think they are still some years off fully achieving that. This is just one small step closer IMO.
I'm already convinced that improvements on some sites I manage are down to user behaviour anyway.
|We’re also told that instead of maintaining this kind of user data for individual pages, it might be done on a site-by-site basis, with the site usage information associated with some or all of the pages on that site. |
Very interesting patent. Probably explains why Panda resulted in big sites getting bigger and small sites getting smaller.