DISCLAIMER: Some of the following requires an understanding of statistics. But fear not, I am going to present a very simple set of numbers that anyone can apply to their AdSense data. That info is labeled "THE BOTTOM LINE". If you aren't interested in thinking about math, that's all you need to know. For those who are interested in discussing the math involved, feel free to read and respond to the other details. That being said, here we go...
I repeatedly see posts ("My CTR is up today, is this real?", "My CTR is down today, is this real?", "My CTR is so far up/down that I am worried -- should I tell Google?", etc.) that cannot be answered informatively without knowing whether the changes seen are statistically significant. But I never see any statistics provided. Of course, that could be because everyone fears disclosing them due to the gag order in the AdSense TOS. However, I suspect it is more because most people who ask such questions don't know how to do basic statistics on their CTRs. So I thought I would provide some useful "rule of thumb" figures for everyone to use.
The big concept that everyone needs to understand is that random variations ("variance" or "standard deviation" in math terms) in small samples of data tend to be larger than random variations in large data sets. Why? Because in large data sets the variations tend to cancel each other out. As a result, the more points you have in a set of data, the more accurately you can determine its average. That is the key concept here: yesterday I saw one average, and today I see a higher/lower average, BUT do I really know that those averages are accurate enough (because I have enough data points) to conclude that the difference I am seeing is a real difference, and not just the result of random variation in one set of data versus another?
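To see this effect directly, here is a minimal simulation sketch in Python. It assumes every impression is an independent coin flip with a fixed true CTR of 2% (an illustrative value, not from the post) and measures how much the observed daily CTR wanders around that true value for different daily impression counts. The function name `simulated_ctr_spread` is hypothetical.

```python
import random
import statistics


def simulated_ctr_spread(n_impressions, true_ctr, days=500, seed=1):
    """Simulate `days` days of `n_impressions` each and return the standard
    deviation of the observed daily CTR (as a fraction, 0.02 == 2%)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    daily_ctrs = []
    for _ in range(days):
        # Each impression independently produces a click with probability true_ctr.
        clicks = sum(1 for _ in range(n_impressions) if rng.random() < true_ctr)
        daily_ctrs.append(clicks / n_impressions)
    return statistics.pstdev(daily_ctrs)


for n in (10, 100, 1000):
    spread = simulated_ctr_spread(n, true_ctr=0.02)
    print(f"{n:>5} impressions/day: observed CTR typically varies by about {spread:.2%}")
```

With the true CTR held constant, the spread shrinks by roughly a factor of sqrt(10), about 3.2, for each tenfold increase in impressions, which is the usual 1/sqrt(N) behaviour of a sample proportion.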
***THE BOTTOM LINE*** The more data points that make up your click-through rate (CTR), the more accurately you can tell one CTR from another CTR. Here are the rule-of-thumb numbers to use:
Impressions / Minimum significant CTR difference (percentage points)
10 / 4.07%
50 / 1.73%
100 / 1.22%
200 / 0.86%
500 / 0.54%
1000 / 0.38%
5000 / 0.17%
10000 / 0.11%
What does this mean? It means that if you do not have at least the number of impressions in the "Impressions" column, and a difference between the two CTRs you are comparing that is equal to or greater than the number in the second column, there is a good chance that you are seeing just a random fluctuation. For example, if you are basing your CTR figures on 500 impressions and you are concerned that today's CTR is lower than yesterday's, don't even THINK about being truly concerned unless the difference between the two CTRs is at least 0.54 percentage points (such as yesterday's CTR being 2.54% and today's being 2%).
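One way to run this kind of check yourself is a two-proportion z-test, sketched below in Python. This is a standard test for comparing two rates and is not necessarily the calculation behind the table; the function name `two_proportion_z_test` is hypothetical, CTRs are passed as fractions, and each impression is assumed to be an independent click/no-click event.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(ctr_a, n_a, ctr_b, n_b):
    """Compare two CTRs (fractions, e.g. 0.02 for 2%) measured over n_a and n_b
    impressions. Returns (z, two_sided_p_value)."""
    # Pooled CTR under the null hypothesis that both days share one true CTR.
    pooled = (ctr_a * n_a + ctr_b * n_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # no clicks at all (or a click on every impression): nothing to compare
        return 0.0, 1.0
    z = (ctr_a - ctr_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# The example from the post: 2.54% yesterday vs 2% today, 500 impressions each.
z, p = two_proportion_z_test(0.0254, 500, 0.0200, 500)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A p-value above 0.01 means the difference is not significant at the 99% level used in this post; for the example above it comes out far above 0.01.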
Now, let me stress something else: these are MINIMUMS, which assume that there are no other forces in the universe skewing your numbers. We all know that isn't true. Time of day, day of week, Google changing the targeting algorithm, etc., could all affect your numbers and make them truly different even though nothing is actually "wrong". So don't assume that, just because your CTR difference is above the threshold, there is some problem going on (or that the new banner you just tested really is better or worse than the previous one). In other words, these numbers are not very useful for determining when there truly is a difference DUE TO A GIVEN CAUSE, because there are so many complicating factors. What they are very useful for is determining when there is NOT a significant difference, which lets you just stop thinking about the matter altogether.
WARNING: ***MATH FOLLOWS***
For those of you inclined to read this, these figures were computed using a t-test at the 99% confidence level. Is a t-test really the correct choice? Maybe not for small sample sizes, because this data is really binomial, not interval scale (in other words, someone can click or not click -- they can't partially click). But the binomial distribution approaches the normal distribution at sufficiently large sample sizes. So the numbers presented where N < 200 may not be completely accurate, but they are good enough for these purposes.
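The threshold idea can also be written as a formula. The Python sketch below uses the normal (z) approximation, which is what a t-test converges to at these sample sizes, and assumes both days have the same number of impressions n and the same true CTR p, so the smallest significant difference is z * sqrt(2p(1-p)/n). The function name and the 2% base CTR are illustrative assumptions. This is not a reconstruction of the calculation behind the table, and its output does not match it: at 500 impressions and a 2% CTR it gives roughly 2.3 percentage points. Under this formula the threshold also depends on the base CTR, not only on n.

```python
from math import sqrt
from statistics import NormalDist


def min_significant_diff(n, base_ctr, confidence=0.99):
    """Smallest CTR difference (as a fraction) between two days of n impressions
    each that is significant at the given two-sided confidence level, assuming a
    common true CTR of base_ctr."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 2.576 for 99%
    return z * sqrt(2 * base_ctr * (1 - base_ctr) / n)


for n in (10, 100, 500, 1000, 10000):
    print(f"{n:>6} impressions: {min_significant_diff(n, 0.02):.2%}")
```

As the post notes for N < 200, the normal approximation is poor at the smallest impression counts, so treat those rows with extra caution.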
Why the 99% confidence level -- do we really need that level of certainty? Probably not, but I know that plenty of people sit around and check their stats many times per day, so to some extent that extra surety represents a Bonferroni correction for multiple comparisons. I should probably really make two charts: One assuming you check your stats once, and one assuming you check your stats 20 times per day. But, that really removes some of the simplicity that I was trying to present so that these rules of thumb are as easy to apply as possible.
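The Bonferroni correction mentioned above splits an overall error rate alpha across k comparisons, so each individual check has to pass at alpha/k. A minimal Python sketch (the function name is hypothetical) of how the critical value a z-statistic must exceed grows with the number of times you look:

```python
from statistics import NormalDist


def bonferroni_critical_z(alpha, comparisons):
    """Two-sided critical z-value when an overall error rate `alpha` is split
    evenly across `comparisons` looks at the data (Bonferroni holds whether or
    not the looks are independent)."""
    per_check_alpha = alpha / comparisons
    return NormalDist().inv_cdf(1 - per_check_alpha / 2)


for k in (1, 5, 20):
    print(f"{k:>2} checks: |z| must exceed {bonferroni_critical_z(0.05, k):.2f}")
```

With an overall alpha of 0.05, five checks give a per-check level of 0.01, which is exactly the 99% level used for the table; twenty checks a day would call for a per-check level of 0.0025.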
Eventually I'll build an AdSense stat cruncher that does all this stuff automatically, but time does not permit at the moment.
Hope this was helpful.
James
That is a very useful post - and I completely agree with your suggestions. Although I did not have the benefit of your data, I have done a sanity check before jumping to any conclusions...
However, I've noticed -- this month, for example -- that on a daily total of, say, 5,000 to 10,000 impressions, my CTR changes from around 1% to around 4% every day. I.e., for the past week: 2, 1, 2, 1, 4.
So I continue to conclude that something statistically significant is happening, and it is happening more frequently since Feb 1st.
Thanks again for your analysis!
Impressions / Significant CTR difference as a percentage of the average CTR
10 / 204%
50 / 87%
100 / 61%
200 / 43%
500 / 27%
1000 / 19%
5000 / 9%
10000 / 6%
I hope I didn't just muck up what was already a complex issue.
James
Eventually I'll start gathering time-of-day, day-of-week, and other statistics to see if I can sort out some of these confounding factors. But unfortunately, several of them will probably be site-specific due to the different demographics of our users.
Interesting, but it overlooks two very important points. We only see an average over a 24-hour period. The statistics you are showing are for discrete data, not an average figure. Also, if you take the view that Google is a chaotic system, then this type of analysis doesn't tell you anything.
What do you mean that it isn't discrete data? Sure it is. Perhaps you are implying that because Google gives you a raw percentage rather than blow-by-blow detail, you don't have the information about individual events? But you do, because there are only two outcomes: click, or not click. Therefore, if you know that you had 1,000 impressions and a 2% CTR, you know the raw data: 980 not-clicks and 20 clicks. The only thing you don't know is what order they occurred in, but that's not relevant to this analysis.
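The reconstruction described above is simple arithmetic. A minimal Python helper (the name `counts_from_ctr` is hypothetical) that turns an impression count and a CTR into the underlying counts; it rounds to a whole click because a reported CTR may itself be rounded.

```python
def counts_from_ctr(impressions, ctr_percent):
    """Recover (clicks, non_clicks) from an impression count and a CTR in percent.
    Rounds to the nearest whole click, since the reported CTR may be rounded."""
    clicks = round(impressions * ctr_percent / 100)
    return clicks, impressions - clicks


print(counts_from_ctr(1000, 2.0))  # (20, 980), matching the example in the post
```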
And with respect to the input from Google, chaotic or otherwise, that's true. My intent is not for people to use this math to make unequivocal determinations about whether their click-through rates were affected by "X", because there is always "Y" and "Z" that you don't know about. Rather, my intent is to allow people to make their own determination about whether, statistically speaking, two numbers are even different, before they get worried and post things like "My CTR doubled. Should I shut down my ads and contact Google in case there is click fraud going on?" The point being, if your CTR "doubled" from 1% to 2% based on 10 impressions, don't even think about worrying about it.
James
Thinking aloud: your figures mean "even if nothing changed, your CTR can change this much without meaning anything".
The fact is, that there are things that change and that we don't know a lot about (and can't do anything about)!
Also, am I correct in thinking that even with a large number of impressions, a low average CTR makes for a smaller sample, so again variations are less significant?
I mean, CTR going from 0.5 to 1.5 is less significant than CTR going from 2.0 to 6.0.
With respect to the lower number of click-throughs making the data less accurate: although I agree that it intuitively seems it should be that way, it isn't. I ran the numbers for 0.5% CTR, 1% CTR, and 2% CTR, and percentage-wise they were exactly the same. The number of impressions is what is important. Of course, this assumes that you have enough impressions that the binomial or Poisson distribution approaches the normal distribution (I think 200 is often the recommended number). Something else (a chi-square test, perhaps?) should really be used below that number of impressions, but I didn't feel like doing the separate calculations.
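For small impression counts, a common alternative to the normal approximation is Fisher's exact test, which works directly with the click and non-click counts (a chi-square test is itself a large-sample approximation). The sketch below is a plain-Python version that uses only `math.comb`; the function name is hypothetical, and it computes the two-sided p-value by summing the probabilities of all outcomes no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(clicks_a, n_a, clicks_b, n_b):
    """Two-sided Fisher exact test for clicks_a/n_a vs clicks_b/n_b.
    Conditions on the total number of clicks and uses the hypergeometric
    distribution of day A's clicks."""
    total_clicks = clicks_a + clicks_b
    total_imps = n_a + n_b
    denom = comb(total_imps, total_clicks)

    def prob(k):
        # Probability that day A receives exactly k of the total clicks.
        return comb(n_a, k) * comb(n_b, total_clicks - k) / denom

    observed = prob(clicks_a)
    lo, hi = max(0, total_clicks - n_b), min(n_a, total_clicks)
    # Sum every outcome no more probable than the observed one; the small
    # tolerance guards against floating-point ties.
    p_value = sum(
        p for p in (prob(k) for k in range(lo, hi + 1)) if p <= observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# 1 click in 10 impressions vs 0 clicks in 10: no evidence of any difference.
print(fisher_exact_two_sided(1, 10, 0, 10))  # prints 1.0
```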
One issue: stats lag. It is generally accepted that Google withholds some clicks for more detailed analysis and may dump several days' worth of withheld clicks onto your account, thus distorting the stats. Bear this in mind. This is less of an issue if you look at your stats over a longer time frame.
Page impression -- (CTR change)/(ave. CTR)
10 -- +/- 2.04 (range of change from -2.04 to +2.04)
50 -- +/- 0.87
100 -- +/- 0.61
200 -- +/- 0.43
500 -- +/- 0.27
1000 -- +/- 0.19
5000 -- +/- 0.09
10000 -- +/- 0.06
*Increase, higher (+) and decrease, lower (-)
Surprisingly, the above data produce a straight line if you plot log(|Y|) against log(X). As a relationship, we can write this as
|Y| = 6.318 * X^(-0.5047)
where |Y| is the absolute value of Y,
or
CTR rate change = 6.318 * (page impressions)^(-0.5047)
where rate change = (max. CTR change) divided by (average CTR).
I hope someone with a different CTR can test this relationship and give feedback.
FromRocky
[edited by: FromRocky at 7:03 pm (utc) on Mar. 11, 2005]
An unusually high total CTR can be caused by more traffic from an ad channel that usually has a high CTR.
Here is an example with round numbers to keep it simple:
Channel A: 10,000 impressions, 2% CTR
Channel B: 2,000 impressions, 5% CTR
Total: 12,000 impressions, 2.5% CTR
Now suppose the traffic pattern changes:
Channel A: 8,000 impressions, 2% CTR
Channel B: 4,000 impressions, 5% CTR
Total: 12,000 impressions, 3% CTR
There is absolutely no change in either channel's CTR; only the traffic pattern changed.
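The example above is a weighted average: the total CTR is total clicks divided by total impressions. A minimal Python sketch (the function name `blended_ctr` is hypothetical) that reproduces both scenarios:

```python
def blended_ctr(channels):
    """channels: list of (impressions, ctr) pairs, ctr as a fraction.
    Returns the overall CTR = total clicks / total impressions."""
    total_impressions = sum(imps for imps, _ in channels)
    total_clicks = sum(imps * ctr for imps, ctr in channels)
    return total_clicks / total_impressions


before = blended_ctr([(10_000, 0.02), (2_000, 0.05)])
after = blended_ctr([(8_000, 0.02), (4_000, 0.05)])
print(f"before: {before:.1%}, after: {after:.1%}")  # before: 2.5%, after: 3.0%
```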
See the third post in this thread -- I already presented that data. There is no need to redo it for other CTRs. I checked 0.5%, 1%, and 2%, and it holds true for all of them. I'm guessing that theory says it is always true, because in a Poisson distribution the mean and the variance are the same, so the increase in CTR is directly proportional to the increase in variance.
James
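Under the Poisson model invoked above (mean equal to variance), a day's click count with mean N*p has standard deviation sqrt(N*p), so its spread relative to the mean is 1/sqrt(N*p). The short Python sketch below (hypothetical function name, illustrative numbers) evaluates that ratio for a fixed number of impressions and several CTRs, so the relative spread can be compared across base CTRs.

```python
from math import sqrt


def relative_click_spread(impressions, ctr):
    """Standard deviation of a Poisson click count divided by its mean,
    i.e. sqrt(N*p) / (N*p) = 1 / sqrt(N*p)."""
    expected_clicks = impressions * ctr
    return 1 / sqrt(expected_clicks)


for ctr in (0.005, 0.01, 0.02, 0.10):
    spread = relative_click_spread(1000, ctr)
    print(f"CTR {ctr:.1%} over 1,000 impressions: relative spread {spread:.1%}")
```

Under this model, quadrupling the CTR at a fixed number of impressions halves the relative spread.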
I realize that CTR should be independent of impressions, BUT there is undoubtedly variance in the number of impressions. Since CTR is derived FROM the number of impressions, there is a "tolerance stack" that has to be considered when calculating the variance of the CTR: the variance in the number of clicks AND in the number of impressions.
Basically, you are assuming the "tolerance stack" effect will cancel out. Over the larger number of impressions, it probably will. But for the smaller # impressions, I don't think it will.
Here is a simple example of what I mean.
Take 1000 imps, 20 clicks, and 5% variation.
Variation of CTR only
CTR=20/1000=2%
CTR@95%=1.9%
CTR@105%=2.1%
Variation of Impressions and clicks
imps@95%=950
imps@105%=1050
clicks@95%=19
clicks@105%=21
CTR1=19/1050=1.81%
CTR2=21/950=2.21%
The question is, will using CTR reflect this?
I'm nowhere near an expert or even capable in statistical analysis. I forgot pretty much everything I learned... :( Maybe I am totally out to lunch in this line of thinking. But I think it would be more accurate to do the analysis on clicks vs impressions. THEN calculate the CTR from those results.
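The worst-case arithmetic in this example can be written as a small Python function (the name is hypothetical). It assumes, as in the example, that clicks and impressions can each be off by the same fraction t in opposite directions, so the CTR ranges from (1-t)/(1+t) to (1+t)/(1-t) of its nominal value.

```python
def ctr_worst_case_range(impressions, clicks, tolerance):
    """Lowest and highest CTR when clicks and impressions each vary by
    +/- tolerance (a fraction) in the worst-case opposite directions."""
    low = clicks * (1 - tolerance) / (impressions * (1 + tolerance))
    high = clicks * (1 + tolerance) / (impressions * (1 - tolerance))
    return low, high


low, high = ctr_worst_case_range(1000, 20, 0.05)
print(f"{low:.2%} to {high:.2%}")  # 1.81% to 2.21%, as in the example
```

For a 5% tolerance the factors are about 0.905 and 1.105, i.e. roughly 90% and 110% of the nominal CTR.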
I checked 0.5%, 1%, and 2%, and it holds true for all of them.
Yes, but how about CTR of 0.1% or 10%? Right now, the correlation is only applied to the data range from 0.5% to 2%. You need more actual data to validate the theory.
Maybe I am totally out to lunch in this line of thinking.
Yes, you are.
A combination of @95% and @95% is not @95% any more but closer to 90%, and a combination of two @105% factors is about 110%.
It depends on the level of accuracy you design for in a given application. If you want more accuracy, you can use Crystal Ball as a simple way to simulate CTR. The input data would be the statistical distributions for both page impressions and number of clicks.
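The same kind of simulation can be sketched in a few lines of Python instead of a spreadsheet add-in. The model is an assumption for illustration: daily impressions are drawn from a normal distribution (mean 1,000, standard deviation 50, i.e. 5%, echoing the 5% variation in the earlier example), and each impression is then clicked independently with a true CTR of 2%. The function name and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_daily_ctr(mean_imps, imps_sd, true_ctr, trials=2000, seed=0):
    """Monte Carlo estimate of the mean and standard deviation of the observed
    daily CTR when both the impression count and the clicks are random."""
    rng = random.Random(seed)
    ctrs = []
    for _ in range(trials):
        # Random number of impressions for the day (at least one).
        imps = max(1, round(rng.gauss(mean_imps, imps_sd)))
        # Each of those impressions is clicked with probability true_ctr.
        clicks = sum(1 for _ in range(imps) if rng.random() < true_ctr)
        ctrs.append(clicks / imps)
    return statistics.mean(ctrs), statistics.stdev(ctrs)


mean_ctr, sd_ctr = simulate_daily_ctr(1000, 50, 0.02)
print(f"simulated CTR: mean {mean_ctr:.2%}, standard deviation {sd_ctr:.2%}")
```

In this toy model the spread of the simulated CTR stays close to sqrt(p(1-p)/N), about 0.44 percentage points at N = 1,000, because the clicks are drawn from the impressions actually served; varying the impression count mostly changes N rather than adding a separate error term.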
Yes, but how about CTR of 0.1% or 10%? Right now, the correlation is only applied to the data range from 0.5% to 2%. You need more actual data to validate the theory.
I explained that I didn't check more because statistical theory indicated to me that it would always be true. Regardless, I just checked 0.1%, 10%, and 50%, and they all work.