I am surprised and amused that some in the community point to a couple of individual metrics as the ones that caused xyz to happen in the most recent update. I think it is time to look at the bigger picture of the data Google has available for analysis and interpretation.
Eric Enge gave a presentation at PubCon Austin in which he said he felt Panda pumped as much as 20% user engagement metrics into the algo. It really got me thinking about the user engagement aspects in depth. In this socialized world, it just makes sense that Google would start using more engagement metrics such as demographic, psychographic, and behavioral metrics. I started to put together a list of possible data sources Google could use as signals, and the list quickly grew large.
Most of the engagement metrics Google can use will fall into the realm of user behavior. Those data sets can be combined with a successful search result into a powerful metric for your website. I believe that metric is now replacing PageRank as the number one Google indicator of a quality site. I have been calling this mythical metric the User Search Success Rate (USSR) or the Panda Metric (PM). This is the rate at which any search results in a happy searcher.
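To make the "rate" idea concrete, here is a minimal sketch of how such a success rate could be computed for one site. Everything in it is an assumption for illustration: the `SearchSession` record, its `satisfied` flag, and the sample sites are hypothetical, and nothing here reflects how Google actually measures a happy searcher.

```python
from dataclasses import dataclass


@dataclass
class SearchSession:
    query: str
    clicked_site: str  # hypothetical: the result the searcher picked
    satisfied: bool    # hypothetical: True if the searcher appeared happy


def search_success_rate(sessions, site):
    """Share of searches ending on `site` that left the searcher satisfied."""
    relevant = [s for s in sessions if s.clicked_site == site]
    if not relevant:
        return None  # no data for this site
    return sum(s.satisfied for s in relevant) / len(relevant)


sessions = [
    SearchSession("blue widgets", "example.com", True),
    SearchSession("blue widgets", "example.com", False),
    SearchSession("cheap widgets", "example.com", True),
    SearchSession("blue widgets", "example.org", False),
]

# Two of the three example.com sessions were marked satisfied.
print(search_success_rate(sessions, "example.com"))
```

The hard part is not the division but deciding what counts as "satisfied", which is what the rest of the signals below would feed into.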
The metric starts before the user ever types in a query at Google:
1: Referral? How did the user come to Google? Was it from:
- a toolbar (Google's own toolbar, or a branded toolbar from a partner)?
- a partner site (AOL etc.)?
- a specific browser, Mobile, Desktop, Tablet or something else?
- a link on another site?
- a social association? Did the user come from a social site, and do we know who you are already? (Orkut, Twitter, a private control panel such as WordPress?)
2: Location data:
- IP address
- GPS Data available? Depends on device.
- Toolbar location data and history.
- WiFi network, Cell phone network or other ISP like location data.
3: Browser request headers:
- Browser agent, platform and device data
- HTTP Accept headers: gzip, Java, Flash, etc.
- Screen size
- Toolbar metrics tell all (query string often included agent identifiers)
- Toolbar installation history and other history you may have already shared with Google. (such as version of toolbar)
4: Site Tracking and Advertising Tracking:
- What site did you come from and what did you do on that site? (if they were running Google Analytics or another Google-trackable metric)
- Both via Google Analytics and via Google site-based advertising like AdSense (remember, you leak a referral every time you visit a page with Google code on it)
- Coming soon: +1 data from Google's +1 service.
5: Cookies:
- My Google or Google properties: Gmail, YouTube, etc.
- Sites you were logged into while viewing Google advertising from DoubleClick or AdSense. If you click through a login page on WordPress at "foofoo.com" and then view an AdSense ad on that site, it is a good signal to track you by.
At this point, Google knows who 70-75% (my guess) of the users doing any given query are, and can guess accurately at another 15-25% based on browser/software/system profiles (even if your IP changes and you are not logged in, Google can match all the above metrics to a profile on you). That leaves less than 10% of the users in the world that Google does NOT know. Even for that 10%, they can later retro-analyze your profile when you meet some criteria, such as logging into a Google service like Gmail. (I'm not saying they care WHO you are specifically, just that you are User xyz that they can track over time.)
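The profile matching described above can be sketched as a simple fingerprint lookup. This is a toy illustration under stated assumptions, not a description of anything Google is known to run: the signal names in `STABLE_SIGNALS`, the `known_profiles` store, and the sample request values are all hypothetical.

```python
import hashlib

# Hypothetical signal names. The IP address is deliberately left out so the
# fingerprint survives an IP change.
STABLE_SIGNALS = ("user_agent", "screen_size", "accept_encoding", "toolbar_version")

known_profiles = {}  # fingerprint -> user label


def fingerprint(request):
    """Hash the stable signals of a request into a short profile key."""
    parts = [str(request.get(name, "")) for name in STABLE_SIGNALS]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()[:16]


def identify(request):
    """Return the known user label for this request, or None if unseen."""
    return known_profiles.get(fingerprint(request))


# A logged-in visit lets the profile be stored under a label.
logged_in = {
    "ip": "203.0.113.5",
    "user_agent": "ExampleBrowser/4.0",
    "screen_size": "1280x800",
    "accept_encoding": "gzip",
    "toolbar_version": "7.1",
}
known_profiles[fingerprint(logged_in)] = "user-xyz"

# Later, the same machine shows up logged out and on a different IP.
anonymous = dict(logged_in, ip="198.51.100.9")
print(identify(anonymous))  # prints user-xyz
```

A real system would need fuzzy matching, since any one of these signals can change (a browser upgrade, a new monitor), but the exact-match version shows why a changed IP alone does not hide a user.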
Finally, after all that data, the user probably types in a query (if the search didn't come from an off-site query box or toolbar to start with).
Query Entry:
- various psychographic data. How the user types queries into the autofill box is indicative of the user's education level, often sex, and other psycho/demographic details.
- spelling, language, syntax, format, etc. All the variables of a query can give clues as to who the user is and what the user's intent is.
SERP Behavior:
- Mouse over preview data. What did the user do?
- Mouse tracking via the js mouse over data. (when you mouse over descriptions, that can be sent back to Google) Intent data.
- Multivariate testing metrics. As we have seen the last year, Google is SERP testing constantly.
Offsite Metrics: Finally the user clicks on a result and is taken to a site:
- AdSense or DoubleClick serve ads?
- Google Analytics running?
- What does the user do while he is there? Click path.
- +1 buttons to come.
- How long does he stay on the page?
- Does the user visit other pages?
- Does user hit the back button and return to Google, or does he wander off to parts unknown?
- Toolbar data. Tracking and site blocking.
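As a rough sketch of how the offsite signals above (time on page, pages visited, return to Google) might be folded into a single label per click, consider the function below. The function name, the labels, and the 10- and 60-second thresholds are made-up illustration values, not known Google parameters.

```python
def classify_visit(dwell_seconds, pages_viewed, returned_to_serp):
    """Label one click-through as 'bounce', 'success', or 'unclear'.

    The 10- and 60-second thresholds are made-up illustration values.
    """
    if returned_to_serp and dwell_seconds < 10:
        return "bounce"   # quick trip back to the results page
    if dwell_seconds >= 60 or pages_viewed > 1:
        return "success"  # stayed a while or explored the site
    return "unclear"


print(classify_visit(5, 1, True))     # prints bounce
print(classify_visit(120, 1, False))  # prints success
print(classify_visit(20, 1, True))    # prints unclear
```

The "unclear" bucket matters: a user who reads for 20 seconds and goes back to Google may have found the answer or may have given up, and the click data alone cannot tell the two apart.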
Unknown Crowd Sourced Metrics: Google has been known to come up with unusual metrics and crowd sourced data. Take for example Google's Captcha system, which is actually used to validate corrupted words from Google's book scanning project. I think it is a good assumption that there is other data Google could mine that we are oblivious to. If you have played the game Werewolf with Matt Cutts, you know he is a crafty one ;-)
After all that, we can quantify a metric (I call it the Panda Metric). It is an amalgamation of the above data inputs. This set of inputs would be relative to this query. They could also be weighted against related queries (siblings, brothers/sisters, parents of the query tree from the root query).
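One way to picture such an amalgamation is a weighted average of per-signal scores, blended with the scores of related queries. The sketch below is purely hypothetical: the signal names, the weights, and the `related_weight` parameter are invented for illustration, and the real inputs and their weighting are unknown.

```python
def panda_metric(signals, weights):
    """Weighted average of signal scores, each expected in the range 0..1."""
    total_weight = sum(weights[name] for name in signals)
    if total_weight == 0:
        return 0.0
    return sum(signals[name] * weights[name] for name in signals) / total_weight


def blended_metric(own_score, related_scores, related_weight=0.25):
    """Blend a query's own score with the mean score of its related queries."""
    if not related_scores:
        return own_score
    related_mean = sum(related_scores) / len(related_scores)
    return (1 - related_weight) * own_score + related_weight * related_mean


# Hypothetical signals and weights for one site on one query.
weights = {"dwell": 0.5, "no_bounce": 0.3, "plus_one": 0.2}
signals = {"dwell": 0.8, "no_bounce": 0.6, "plus_one": 0.1}

own = panda_metric(signals, weights)
print(own)

# Scores for sibling and parent queries pull the result toward their mean.
print(blended_metric(own, [0.2, 0.6]))
```

With these numbers the site's own score comes out near 0.6, and the weaker related queries drag the blended score slightly lower.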
How the Panda Metric would actually be applied only leads to more questions:
- Would Google want people to leave a successful query and come back to Google.com? Or where would they want the user to go?
- Does a happy Google user keep using Google?
- What should Google do to retarget followup queries?
- Is personalization all it is cracked up to be?
- Does the Panda Metric result in higher Panda Metric scores?
- Is it a self-fulfilling or self-defeating metric that leads into a feedback loop - almost a race condition?
Any way you look at it, that data, when analyzed and applied within the algo, could lead to a higher happy searcher result. I think the above data is partially what drove the Panda update. Why? Highly successful, high referral, low bounce, quality content, high engagement, and historical pages have seen a solid boost with Panda.