This is the point where most people ask: how many items do I need in each sample for the result to be meaningful?
It's not that simple. It depends on the behavior of what you're measuring and on how different the two sets' behavior is.
An example:
~~~~~~
A = 1000 clicks, 3 conversions
B = 1100 clicks, 2 conversions
~~~~~~
Inconclusive: there aren't enough conversions. A is ahead, but with only 81% probability of being the real winner. Aim for >95% probability in your tests.
Same ratio of conversions between A and B (3 to 2), but with many more conversions behind each rate:
~~~~~~
A = 100 clicks, 60 conversions
B = 110 clicks, 40 conversions
~~~~~~
A is the clear winner, with 99%+ probability.
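If you'd rather check numbers like these yourself, here's a rough sketch in Python. It assumes a pooled two-proportion z-test with a normal approximation. That method is my choice for illustration and may not be what any particular calculator uses, so the percentages can come out different. The function name `confidence_a_beats_b` is made up for this example.

```python
from math import erf, sqrt


def confidence_a_beats_b(clicks_a, conv_a, clicks_b, conv_b):
    """One-sided confidence that A's true conversion rate is higher than B's.

    Assumption: pooled two-proportion z-test (normal approximation).
    This is only a rough guide when conversion counts are very small.
    """
    rate_a = conv_a / clicks_a
    rate_b = conv_b / clicks_b

    # Pooled conversion rate, treating both sets as one big sample
    pooled = (conv_a + conv_b) / (clicks_a + clicks_b)

    # Standard error of the difference between the two rates
    se = sqrt(pooled * (1 - pooled) * (1 / clicks_a + 1 / clicks_b))
    if se == 0:
        # No conversions at all (or everything converted): nothing to compare
        return 0.5

    z = (rate_a - rate_b) / se

    # Standard normal CDF evaluated at z
    return 0.5 * (1 + erf(z / sqrt(2)))


# The two examples from above
print(confidence_a_beats_b(1000, 3, 1100, 2))
print(confidence_a_beats_b(100, 60, 110, 40))
```

With this method the first example lands well short of 95% and the second lands above 99%, which matches the conclusions above even though the exact figure for the small sample depends on the method used.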
Here's a decent calculator:
[tools.seobook.com...]