How Split Tests Might Be Doing You More Harm Than Good
Split Tests Full of Air
Problem: My split test winner isn’t outperforming the control group! Why?
You just ran a split test on the subject line of your last email, and one group clearly performed better than the other. So you run with the better-performing subject line, expecting that you made the right decision.
You take a look at the results the next day and, to your unpleasant surprise, you find that the rest of the target group performed much worse than the test group! Re-ject! How is that possible, you think to yourself? It might be that the result of your split test wasn’t statistically significant.
In layman’s terms, a test result is statistically significant if it is unlikely to have occurred by chance: whatever you changed probably did make a difference, and it probably wasn’t just luck (if we were to run that same test again, we should get the same or similar results more often than not).
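To make the idea concrete, here is a minimal sketch of one common way to check significance for two subject lines: a two-proportion z-test on open rates. This is an illustration, not necessarily the calculation ExpertSender performs; the function name and the open counts are hypothetical.

```python
import math

def two_proportion_z_test(opens_a, sent_a, opens_b, sent_b):
    """Two-sided z-test for the difference between two open rates.

    Returns the z statistic and the p-value (the probability of seeing
    a difference at least this large if both versions really performed the same).
    """
    rate_a = opens_a / sent_a
    rate_b = opens_b / sent_b
    # Pooled open rate under the assumption that both versions are equal
    pooled = (opens_a + opens_b) / (sent_a + sent_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical test: version A opened 200 of 1000 times, version B 250 of 1000
z, p = two_proportion_z_test(200, 1000, 250, 1000)
print(f"z = {z:.2f}, p = {p:.4f}, significant at 95%: {p < 0.05}")
```

A p-value below 0.05 corresponds to the 95% confidence level: a difference that large would be unlikely if the two subject lines really performed the same.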
Why Statistical Significance?
We decided to combat one of the inherent weaknesses of split testing: split test results can lead you into making erroneous decisions. Having the appropriate confidence levels (both statistically and in our own assumptions) allows us to make better-informed business decisions.
Example: Pump Up and Air Out?
In our last article about cross-selling we sold you a pair of socks to go with your Air Jordans (we’ve got quite a few basketball fans here at ExpertSender). However, unbeknown to you, before we sold you those Air Jordans we ran a split test to see whether you might be more inclined to buy a pair of Air Jordans or a pair of Reebok Pumps.
We ran with the Air Jordans because they performed better in our split test (the result was in fact statistically significant), and you can see the results below:
The statistical significance result confirms that our second test group was a certain winner, and based on this result we decided to send you an email campaign offer for a pair of Air Jordans.
ExpertSender uses a 95% confidence level. This means that if you repeated the test on 100 different samples, about 95 of the resulting confidence intervals would contain the true value you are trying to estimate.
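As a rough illustration of what a 95% confidence interval looks like for an open rate, the sketch below uses the simple normal approximation. The function name and the numbers are hypothetical, and the platform may well calculate its intervals differently.

```python
import math

def open_rate_confidence_interval(opens, sent, z=1.96):
    """Approximate confidence interval for an open rate.

    z = 1.96 corresponds to a 95% confidence level under the
    normal approximation, which assumes a reasonably large group.
    """
    rate = opens / sent
    margin = z * math.sqrt(rate * (1 - rate) / sent)
    return rate - margin, rate + margin

# Hypothetical test group: 250 opens out of 1000 emails sent
low, high = open_rate_confidence_interval(250, 1000)
print(f"Open rate 25.0%, 95% interval: {low:.1%} to {high:.1%}")
```

The larger the test group, the narrower the interval, which is why a small test group so often produces an uncertain winner.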
There might be a situation when the winning version is not certain (this actually happens quite often), so what do you do then? Which group do you use? Were your test groups large enough? Were you testing the right parameters? Below are some of the other possible suggestions that a user might receive:
- Winning version is certain, but either the size of the test group was too big or the number of versions too small. You can try using a smaller test group or adding another version in the next split test.
- Winning version is not certain. Try using a larger test group or removing one test version in the next split test.
- Winning version is certain, but the result of the remaining versions is not certain. Try testing another variable next time.
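When the suggestion is to use a larger test group, a natural question is how much larger. The sketch below uses a standard textbook approximation for comparing two rates; it is not the platform's own rule. The function name, the 80% power setting, and the example rates are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def required_group_size(base_rate, expected_rate, confidence=0.95, power=0.80):
    """Approximate recipients needed per version to detect the difference
    between base_rate and expected_rate (textbook two-proportion formula).

    power is the assumed chance of detecting the difference if it is real.
    """
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = base_rate * (1 - base_rate) + expected_rate * (1 - expected_rate)
    effect = expected_rate - base_rate
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Hypothetical rates: a big expected lift versus a small one
print(required_group_size(0.20, 0.25))  # 20% vs 25% open rate
print(required_group_size(0.20, 0.21))  # 20% vs 21% open rate
```

The smaller the difference you are trying to detect, the larger each group has to be. This is also why removing a version helps: the same audience is split into fewer, bigger groups.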
Are you shooting bricks with your split tests?
Maybe it’s time to practice your field goals with statistical significance analysis and have your future decisions backed up by mathematics?