We have a big rankings dataset. How can we use it to detect algo changes?
We've got a lot of data that we're collecting for our clients, and I realized that it might be possible to detect search engine algorithm changes retroactively, and possibly in real time as they're happening.
But we're not mathematicians, so I'm not sure how we could use the data to draw meaningful conclusions.
We track all the search engine traffic to our customers over a 30-day period:
- 3 million data points of traffic data containing the keyword and rank in Google (from the referrer)
- 5 million unique keywords in our database across all our customers
- 2,500 different websites tracked
- 2 million pages indexed
- most of our clients are integrated with Google Webmaster Tools and Google Analytics, so we could draw data from there as well
... and much more.
We need to protect our client confidentiality, but we'd be happy to search for information in aggregate to detect search engine algorithm changes, or... anything. It didn't occur to us back then, but I'm pretty sure we would have detected Panda within minutes of it sweeping through our clients.
Let me give you an example.
I rank highly for the term "universe", so it gets a few hundred search visitors a day. Roughly 30% of those visitors contain the rank in the referrer URL from Google. So I get a measurement of my rank every few minutes. Obviously it's going to vary because of personalization settings, but it should be possible to pick out larger changes - and something like Panda would have been pretty significant.
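That referrer-based measurement can be sketched in a few lines. This assumes the referrer URL carries the query in the `q` parameter and the result's position in the `cd` parameter, which Google referrers did at the time; the example URL below is made up for illustration.

```python
from urllib.parse import urlparse, parse_qs

def rank_from_referrer(referrer):
    """Extract (keyword, rank) from a Google search referrer URL.

    Assumes the query is in `q` and the clicked result's position is
    in `cd`; returns None when either parameter is missing.
    """
    qs = parse_qs(urlparse(referrer).query)
    if "q" not in qs or "cd" not in qs:
        return None
    return qs["q"][0], int(qs["cd"][0])

# Hypothetical referrer for illustration:
ref = "http://www.google.com/url?q=universe&cd=3&ved=abc"
print(rank_from_referrer(ref))  # → ('universe', 3)
```

Only about 30% of visits carry the `cd` parameter, so each keyword yields a noisy stream of rank samples rather than a single clean number.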
Our plan right now is to give our clients some kind of diagnostic tool, so they see how their overall rank is doing from day to day, but I think we could aggregate that across all our clients.
Anyway, if anyone has any ideas, we'd be happy to try and implement them and pull some signal from the noise.
[edited by: tedster at 7:42 pm (utc) on May 23, 2011]
The first thing that strikes me is that you would be able to know when a major algo change occurs, but you would not know WHAT that change is all about. In order to do that you would need to pull a lot of other data into your analysis.
A few key factors would be essential - things like:
- type of website and page (direct ecommerce, affiliate, general information, company B2B information, online app, etc);
- some indicator of the backlink profile that goes beyond number of links;
- where the specific query term fits in a user intention taxonomy, etc.
Even a few bits of related data like this could make a lot of difference in making sense of the ranking shifts you see.
Determining "why" a ranking shift happened is a whole other level of complexity. My first goal is to just detect that a shift occurred at all.
Regarding your suggestions, we don't classify the sites we're tracking, but that would be possible.
We do track their backlink profile, but we don't hold them up to any criteria. We could count links and compare them to the number of domains, but again, I think that would be trying to focus on the "why". I'm not sure I'm prepared to "chase the algo".
We only get the one query, so we don't see it as part of a larger stream.
I thought it might be helpful to SEOs in general to pinpoint the moment that a new algorithm was released into the wild, so they can compare before and after that time to see if it gives them any clues.
I've used a simple tool to detect whether a shift occurred - posting levels on WebmasterWorld. It's worked for over 10 years: the more Google changes their algo, the more posts there are asking for help.
Knowing there has been an algo change is not very useful to me - Google makes over 500 algo tweaks a year. The "why" it changed today and "what" will be needed tomorrow, imho, is what is really valuable.
brotherhood of LAN:
A basic measure of volatility may be to use click throughs with the kind of percentages mentioned on this thread [webmasterworld.com]
e.g. these figures
1 - 42%
2 - 12%
3 - 8%
4 - 6%
5 - 5%
6 - 4%
7 - 3%
8 - 3%
9 - 3%
10 - 3%
So a move from #1 to #5 may amount to a 37-point drop in expected click-through. Since you only have data for your own sites (and assuming you only have one site appearing for most queries)... it could be a basic measure of a rolled-out algo change. Whatever metric you test out, graph it and check the interesting points in time.
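That weighting can be sketched directly. The CTR table is the set of figures above (treated here as an assumption), and ranks beyond #10 are credited 0%:

```python
# Approximate CTR by rank, from the figures above; positions beyond
# #10 are assumed to get no measurable traffic.
CTR = {1: 42, 2: 12, 3: 8, 4: 6, 5: 5, 6: 4, 7: 3, 8: 3, 9: 3, 10: 3}

def ctr_weighted_change(old_rank, new_rank):
    """Absolute change in expected click-through for one keyword."""
    return abs(CTR.get(old_rank, 0) - CTR.get(new_rank, 0))

def volatility(moves):
    """Average CTR-weighted change over a list of (old, new) rank pairs."""
    return sum(ctr_weighted_change(o, n) for o, n in moves) / len(moves)

print(ctr_weighted_change(1, 5))  # → 37, the #1-to-#5 move above
print(volatility([(1, 5), (9, 10), (3, 3)]))
```

Note this deliberately ignores moves below the top results (e.g. #11 to #20 scores 0), which fits the idea of weighting what actually affects traffic.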
I've had this discussion recently though... and it was pointed out to me that relative traffic from SERPs may not necessarily be synonymous with the weight google gives any given page on a query. (rank #10 usually gets more traffic than #9)
Well, here's an example of what we just did. We tracked 2 million search referrers hour-by-hour over the last month. During each hour, we checked whether a keyword went up or down from the last time we checked it. Again... not a mathematician.
Date|Rank Up|Rank Down
23-05-2011 6 PM|23.92%|27.59%
23-05-2011 5 PM|32.51%|27.29%
23-05-2011 4 PM|27.02%|33.12%
23-05-2011 3 PM|25.94%|28.72%
23-05-2011 2 PM|40.15%|27.63%
23-05-2011 1 PM|27.64%|37.92%
23-05-2011 12 PM|41.29%|31.61%
23-05-2011 11 AM|43.19%|26.34%
23-05-2011 10 AM|5.68%|38.43%
23-05-2011 9 AM|34.54%|40.02%
23-05-2011 8 AM|22.25%|26.34%
23-05-2011 7 AM|33.85%|27.68%
23-05-2011 6 AM|36.22%|22.77%
23-05-2011 5 AM|21.98%|24.21%
23-05-2011 4 AM|32.07%|30.4%
23-05-2011 3 AM|24.43%|26.75%
23-05-2011 2 AM|37.9%|26.21%
23-05-2011 1 AM|27.74%|29.34%
23-05-2011 12 AM|28.4%|27.31%
22-05-2011 11 PM|29.89%|24.14%
22-05-2011 10 PM|28.97%|26.32%
22-05-2011 9 PM|20.8%|31.43%
22-05-2011 8 PM|22.92%|27.28%
22-05-2011 7 PM|24.65%|25.4%
22-05-2011 6 PM|28.38%|24.76%
22-05-2011 5 PM|26.56%|30.16%
22-05-2011 4 PM|24.85%|25.46%
22-05-2011 3 PM|38.34%|24.14%
22-05-2011 2 PM|32.24%|28.08%
22-05-2011 1 PM|29.62%|19.26%
22-05-2011 12 PM|19.67%|28.56%
22-05-2011 11 AM|10.02%|28.11%
22-05-2011 10 AM|22.53%|24.33%
22-05-2011 9 AM|14.17%|7.21%
22-05-2011 8 AM|20.37%|13.91%
22-05-2011 7 AM|30.22%|65.28%
22-05-2011 6 AM|13.42%|38.37%
22-05-2011 5 AM|34.01%|20.49%
22-05-2011 4 AM|28.72%|23.7%
22-05-2011 3 AM|23.77%|23.83%
22-05-2011 2 AM|29.97%|21.83%
22-05-2011 1 AM|29.05%|24.5%
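One simple way to pull signal from a table like this is to look at the net movement (% up minus % down) each hour and flag hours that sit far from the mean. A minimal sketch - the threshold and the sample rows (taken from the table above) are arbitrary choices for illustration:

```python
from statistics import mean, stdev

def flag_anomalies(hours, threshold=2.0):
    """Flag hours whose net movement (% up minus % down) lies more than
    `threshold` standard deviations from the mean across all hours.

    `hours` is a list of (label, pct_up, pct_down) tuples.
    """
    nets = [up - down for _, up, down in hours]
    mu, sigma = mean(nets), stdev(nets)
    return [label for (label, _, _), net in zip(hours, nets)
            if abs(net - mu) > threshold * sigma]

# A few rows from the table above:
sample = [
    ("23-05 10 AM", 5.68, 38.43),
    ("23-05 11 AM", 43.19, 26.34),
    ("22-05 7 PM", 24.65, 25.40),
    ("22-05 7 AM", 30.22, 65.28),
]
print(flag_anomalies(sample, threshold=1.0))
```

With the full month of hourly rows, a sustained run of flagged hours (rather than a single spike) would be the interesting pattern to graph.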
If you have the positions for all keywords, a basic measure would be the average (absolute) change in position:
Change = sum over all keywords of |old position - new position| / (number of keywords)
Of course, this is just a simple measure, because it doesn't take into account that a movement at a high position (say #1 to #2) is a bigger change than the same movement at a low position (say #9 to #10).
You should test different measures, e.g. using reciprocal ranks:
Change = sum over all keywords of |1/(old position) - 1/(new position)| / (number of keywords)
and see which works best.
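Both measures can be sketched directly - a minimal version, assuming each keyword's old and new positions are given as pairs:

```python
def mean_abs_change(pairs):
    """Average absolute position change: sum |old - new| / n."""
    return sum(abs(old - new) for old, new in pairs) / len(pairs)

def mean_reciprocal_change(pairs):
    """Average absolute change in reciprocal rank: sum |1/old - 1/new| / n.

    Weights movement near the top of the SERP much more heavily than
    the same movement further down.
    """
    return sum(abs(1/old - 1/new) for old, new in pairs) / len(pairs)

moves = [(1, 2), (9, 10)]
print(mean_abs_change(moves))        # → 1.0 (both moves count the same)
print(mean_reciprocal_change(moves)) # the #1-to-#2 move dominates
```

Comparing the two on the same day of data shows the difference: the first treats #1→#2 and #9→#10 identically, while the second scores #1→#2 roughly 45 times higher.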