How many impressions to draw conclusion?

...in split test

adamas

2:12 pm on Nov 30, 2006 (gmt 0)

Currently running a split test on my AdSense ads, comparing four different values of one factor.

How many impressions (per split) would you want before you were willing to consider the CTRs as a reliable indication?

ronburk

6:45 pm on Nov 30, 2006 (gmt 0)

I would first look at the historical data for the page being tested with a moving average, and establish how big a window the moving average needs to get a smooth curve. If, for example, that showed that I needed about a 10-day window to smooth out random variations, then I would test each arm separately for 10 days.
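A minimal sketch of that window-picking idea, assuming you can export one CTR figure per day for the page being tested. The daily numbers are made up, and the `roughness` measure (spread of day-to-day changes in the smoothed curve) is a hypothetical stand-in for "how smooth the curve looks"; the idea is to find the window size beyond which the curve stops getting noticeably smoother.

```python
from statistics import mean, pstdev


def moving_average(values, window):
    """Trailing moving average; the first window-1 days get no value."""
    return [mean(values[i - window + 1:i + 1])
            for i in range(window - 1, len(values))]


def roughness(series):
    """Spread of day-to-day changes in a series (smaller = smoother)."""
    diffs = [later - earlier for earlier, later in zip(series, series[1:])]
    return pstdev(diffs)


# Hypothetical daily CTRs (%) for the page under test.
daily_ctr = [1.8, 2.4, 1.5, 2.9, 2.1, 1.2, 2.6, 2.0, 2.3, 1.7,
             2.8, 1.4, 2.2, 2.5, 1.9, 2.0, 2.7, 1.6, 2.1, 2.4]

# Look for the window beyond which the curve stops getting much smoother.
for window in (1, 3, 5, 7, 10):
    smoothed = moving_average(daily_ctr, window)
    print(f"window={window:2d} days  roughness={roughness(smoothed):.3f}")
```

If the roughness stops falling noticeably at around 10 days, that corresponds to the 10-day window in the example above.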

adamas

9:19 am on Dec 1, 2006 (gmt 0)

The moving average sounds like a good idea, Ron. Thanks.

I'm not sure I'm reading your second piece of advice correctly. Are you saying you would run one split for ten days (in this example) and then run another split for ten days rather than running each half of the time for twenty days?

I have some strong seasonal variations and I thought that running the splits over the same time period would help to avoid this being another factor to account for in the results. You know, keep everything apart from what you're testing the same as far as possible.

adamas

2:57 pm on Dec 1, 2006 (gmt 0)

Might be asking the wrong question. After much searching I'm now leaning towards a chi-squared test which, for analysing a split test, would be more based on clicks than impressions.

If anybody who knows what they're talking about with statistics (I like maths in general, but hate stats!) could tell me if I'm warm or not...

... and if you want to search for beginners' information on it, I had more luck searching for the relevant function name in my spreadsheet than on more general terms.

ronburk

2:01 am on Dec 2, 2006 (gmt 0)

> Are you saying you would run one split for ten days (in this example) and then run another split for ten days rather than running each half of the time for twenty days?

It's kind of a problem either way, IMO. Do them sequentially and, as you note, seasonal variations can skew the results. Interleave them and (depending on the nature of the traffic flow and the number of pages being tested) you can get interaction effects; for example, the user is attracted to some feature simply because it changed since the previous page view.

I used to interleave my AdWords testing of ad text. But now I tend to do sequential testing instead. The problem was, with interleaved testing, I would get into some situations where just about anything new would test better than just about anything old. I was really just testing for ad blindness rather than the ability of the copy to sell, IMO.

Of course, if you can easily arrange to do your A/B testing so that no individual visitor ever sees both A and B sides of the test, then I see no problem.
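One way to arrange that, sketched under the assumption that each visitor carries a stable ID (say, from a cookie), is to hash the ID into an arm so that repeat page views always land on the same side of the test. The function name, ID format and test name below are hypothetical.

```python
import hashlib

ARMS = ["A", "B"]


def assign_arm(visitor_id: str, test_name: str, arms=ARMS) -> str:
    """Deterministically map a visitor to one arm of a test, so the same
    visitor always sees the same side."""
    # Including the test name lets a visitor fall into different arms
    # in different tests.
    key = f"{test_name}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(arms)
    return arms[bucket]


print(assign_arm("visitor-12345", "ad-placement-test"))
```

Because the arm is computed from the ID rather than stored, no per-visitor state is needed on the server.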

This is a bad time of year to be testing anything, in the U.S. at least: there are lots of vacations, the vacations tend to be skewed towards white-collar workers, and kids are in school one week and out the next.

rbacal

2:35 am on Dec 2, 2006 (gmt 0)

> Might be asking the wrong question. After much searching I'm now leaning towards a chi-squared test which, for analysing a split test, would be more based on clicks than impressions.
> If anybody who knows what they're talking about with statistics (I like maths in general, but hate stats!) could tell me if I'm warm or not...

I'm a former research statistician in social sciences research. I'm a bit rusty but if you can formulate a more specific question, I can take a crack at it (or tell you I don't know).

I'm not sure if you're asking about chi-square, or using clicks, or what?

adamas

12:05 pm on Dec 4, 2006 (gmt 0)

Ron: I see your point. It's already on my to-do list to split my tests by session (and previous session) rather than by page request. It's going to take a while to code, but it clearly needs doing.

rbacal:

What I am doing is trying different formats and locations (only a few at a time, don't have that much traffic) and tracking impressions and clicks on each via channels.

I'm translating 'clicks' directly into 'observed clicks'. For expected clicks I am assuming (for the null hypothesis?) a proportionate spread of clicks depending on impressions i.e. total clicks / total impressions * impressions for that channel.

At that point I plug the observed and expected values directly into the CHITEST function in the OpenOffice spreadsheet or in Excel, which shows the level of my knowledge on this :)
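The expected-click calculation described above can be sketched like this. The channel names and counts are made up. Note that CHITEST returns a p-value for this statistic rather than the statistic itself, and that, as in the post, only the click cells are included (not the non-clicks).

```python
# Hypothetical impressions and clicks per channel (four values of one factor).
channels = {
    "variant_1": (5000, 60),
    "variant_2": (5200, 85),
    "variant_3": (4900, 55),
    "variant_4": (5100, 70),
}

total_impressions = sum(impr for impr, _ in channels.values())
total_clicks = sum(clicks for _, clicks in channels.values())

# Null hypothesis: every channel has the same CTR, so clicks are spread
# in proportion to impressions (total clicks / total impressions * impressions).
expected = {
    name: total_clicks / total_impressions * impr
    for name, (impr, _) in channels.items()
}

# Chi-square statistic over the click cells only: sum of (O - E)^2 / E.
statistic = sum(
    (clicks - expected[name]) ** 2 / expected[name]
    for name, (_, clicks) in channels.items()
)

for name, (_, clicks) in channels.items():
    print(f"{name}: observed={clicks}, expected={expected[name]:.1f}")
print(f"chi-square statistic = {statistic:.2f}")
```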

What I want to find out is whether the split-test data I have so far is enough to give reasonable confidence in the results it indicates. Is this data appropriate for CHITEST?

rbacal

5:02 pm on Dec 4, 2006 (gmt 0)

> I'm translating 'clicks' directly into 'observed clicks'. For expected clicks I am assuming (for the null hypothesis?) a proportionate spread of clicks depending on impressions i.e. total clicks / total impressions * impressions for that channel.
> At that point I plug the observed and expected values directly into the chitest function in OpenOffice Spreadsheet or Excel - which shows the level of my knowledge on this :)
> What I am aiming to get out of this is whether the data I have so far on split tests is sufficient to have a reasonable confidence in the results indicated. Is this data appropriate for chitest?

I'm still unclear about what numbers you are trying to plug into the chi-square, so maybe this will help. Chi-square is used for category data (counts of observations falling into categories such as high/low or increase/decrease), not raw measurements. The expected values are actually calculated from your data. So, let's say you have just two ads and you want to know if there is a difference in the number of clicks. (I'm simplifying here. Obviously the number of clicks depends on impressions, but let's ignore that for now.)

So, you split test, alternating the two ads daily (again, I'm simplifying).

Picture a box with four cells in it: two rows and two columns. The rows represent the two ads, while the two columns are labelled clicked/didn't click. Over a month, this is what you might get:

Ad 1 (row 1): clicked 20, didn't click 80
Ad 2 (row 2): clicked 40, didn't click 60

The expected values for each cell are calculated from this data:

Row 1: clicked 30, didn't click 70
Row 2: clicked 30, didn't click 70

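(Sum each column and divide by two. That shortcut works here because both ads have the same number of observations, 100 each; in general, a cell's expected value is its row total × its column total / the grand total.)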
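The same arithmetic in code, reproducing the 2x2 example above. It uses the general rule (row total × column total / grand total), which reduces to "sum and divide by two" when both ads have the same number of observations.

```python
# Rows are ads; columns are clicked / didn't click (the example's counts).
observed = [
    [20, 80],  # ad 1
    [40, 60],  # ad 2
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell if the ads perform the same.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(expected)              # [[30.0, 70.0], [30.0, 70.0]]
print(round(chi_square, 2))  # 9.52
```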

If there is no significant difference between the two ads, the actual values will be close to the expected values.

If you have 10 different ads you want to compare, the chi-square will tell you whether the ads differ at all, but NOT which ads are significantly better. The chi-square value is calculated from the differences between expected and observed values over ALL your cells. You could compare pair-wise (two ads at a time), but that's a problem (see the comment on the t-test below).
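A sketch of the overall test for more than two ads, using hypothetical click/no-click counts for four variants. The p-value helper is a hand-rolled series for the chi-square tail probability (a stats package or the spreadsheet's CHITEST would normally do this step). A small p says the ads differ somewhere, not which one is best.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(chi-square with df degrees of freedom >= x), computed from the
    series for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = total = 1.0 / s
    n = 1
    while term > total * 1e-15:
        term *= z / (s + n)
        total += term
        n += 1
    lower = total * math.exp(s * math.log(z) - z - math.lgamma(s))
    return max(0.0, 1.0 - lower)


def chi_square_test(table):
    """Chi-square test of independence on an r x c table of counts."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Rows: ads 1-4; columns: clicked, didn't click (hypothetical counts).
ads = [[60, 4940], [85, 5115], [55, 4845], [70, 5030]]
stat, df, p = chi_square_test(ads)
print(f"chi-square={stat:.2f}, df={df}, p={p:.3f}")
```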

To your questions. The probability the test gives you (how likely it is that differences this large would show up by chance alone) already takes the number of observations into account: the fewer observations, the larger the differences need to be to count as statistically significant. So sample size is actually not a major concern, because the test itself allows for it.
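To see the sample-size point in numbers, here is the example's 20% vs 40% split at three traffic levels (a hypothetical scaling of the table above). The 2x2 shortcut formula and the 1-degree-of-freedom p-value via `math.erfc` are standard; no continuity correction is applied.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic (no continuity correction) for the table
    [[a, b], [c, d]], using the 2x2 shortcut formula."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(statistic):
    """Upper-tail probability of a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Same CTRs as the example (ad 1: 20%, ad 2: 40%) at three sample sizes.
# At 10 views per ad the expected counts (3 and 7) fall below the common
# "at least 5 per cell" rule of thumb, so that row is only indicative.
for views in (10, 100, 1000):
    a, b = views * 20 // 100, views * 80 // 100  # ad 1: clicked, didn't click
    c, d = views * 40 // 100, views * 60 // 100  # ad 2: clicked, didn't click
    stat = chi_square_2x2(a, b, c, d)
    print(f"{views} views per ad: chi-square={stat:.2f}, p={p_value_df1(stat):.4f}")
```

With the proportions held fixed, the statistic grows in step with the number of observations, which is why the same gap goes from "could be chance" to "clearly real" as traffic accumulates.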

Is chi-square appropriate? Chi-square can be used for almost any data (it has few requirements), but it's not very powerful at detecting small differences, AND it's intended for category data, and I'm not sure what categories you are using or trying to use.

If you have multiple ad formats and channels, and you want to see whether they perform differently, the way to go is probably an analysis of variance (ANOVA), but that's far more complex and would probably require a stats package. It can tell you a whole lot more, for example whether a blue ad at the bottom does better than a blue ad at the top.

The middle way, particularly if you want to compare two ads to see whether one is better than the other, is probably a t-test. The problem with multiple t-tests is that the more you do, the more likely you are to get a few significant ones by chance.
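A rough illustration of the multiple-comparisons problem, plus one common fix, a Bonferroni correction. For the pairwise step this sketch swaps in a pooled two-proportion z-test, the usual large-sample way to compare two CTRs, in place of a t-test. The ad counts are hypothetical, and the familywise figure assumes independent tests, which overlapping pairwise comparisons are not, so treat it as a rough guide.

```python
import math
from itertools import combinations

ALPHA = 0.05


def familywise_error(m, alpha=ALPHA):
    """Chance of at least one false positive among m independent tests
    when there is no real difference anywhere."""
    return 1 - (1 - alpha) ** m


def two_proportion_z(clicks1, views1, clicks2, views2):
    """Two-sided p-value for a difference in CTR (pooled z-test)."""
    p1, p2 = clicks1 / views1, clicks2 / views2
    pooled = (clicks1 + clicks2) / (views1 + views2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views1 + 1 / views2))
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))


pairs = math.comb(10, 2)  # 10 ads compared two at a time
print(pairs, round(familywise_error(pairs), 2))  # 45 0.9

# Bonferroni: require p < alpha / (number of comparisons) for each pair.
ads = {"ad1": (60, 5000), "ad2": (85, 5200), "ad3": (55, 4900), "ad4": (70, 5100)}
comparisons = list(combinations(ads, 2))
threshold = ALPHA / len(comparisons)
for x, y in comparisons:
    p = two_proportion_z(*ads[x], *ads[y])
    verdict = "significant" if p < threshold else "not significant"
    print(f"{x} vs {y}: p={p:.4f} ({verdict} at {threshold:.4f})")
```

For a 2x2 table this z-test gives the same p-value as the uncorrected chi-square, since z squared equals the chi-square statistic.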

Like I said, I'm rusty and open to correction, and I can't quite figure out how you plan on applying the chi-square to your data. I'm not sure any of this will help you; sorry if it doesn't.