Testing the functionality of various elements using A/B tests is now common practice for most website developers and operators. If sufficient traffic is available, this test procedure quickly reveals whether scenario A is more successful than scenario B. There are many obstacles that can be encountered during the planning phase as well as during the test phase and final evaluation. Here are the most common statistical errors and how you can avoid them:

The biggest mistakes in A/B test planning

Even before you’ve started the test, you might have already set yourself up for failure if you’ve made assumptions and your set-up is based on these.

Error 1: foregoing a hypothesis and playing it by ear

Probably the worst mistake that can be made in the preparation stage is to forego a hypothesis and simply hope that one of the variants you’re testing will turn out to be the right one. Although increasing the number of randomly selected test variants also increases the chance of finding a winner, there’s also the chance that this winner won’t help to improve the web project. With a single variant, you will observe a significant optimisation in 5% of cases even though in reality no optimisation has taken place. The more variants that are used, the more likely an alpha error (a false positive) will occur: there’s a 14% chance with 3 different test objects, and 34% with 8 different variants. If you don’t decide on a hypothesis beforehand, you won’t know what kind of optimisation the winner is responsible for. If you decide on the hypothesis that enlarging a button will lead to an increase in conversions, you can classify the subsequent result. In summary, A/B testing should by no means be left to chance; instead, you should always be hypothesis-driven and work with a limited number of variants. If you also work with tools such as Optimizely, which prevent the error rate from increasing, nothing will stand in the way of successful testing.
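The 5%, 14%, and 34% figures follow from a simple multiple-comparisons calculation: if each variant is judged at a 5% significance level, the chance that at least one of them looks like a winner purely by chance grows with every additional variant. A minimal sketch of that calculation:

```python
# Minimal sketch (not from the article): how the chance of an alpha error
# (a false winner) grows with the number of variants tested at a 5% level.
def familywise_error_rate(num_variants: int, alpha: float = 0.05) -> float:
    """Probability that at least one variant looks 'significant' purely by chance."""
    return 1 - (1 - alpha) ** num_variants

for n in (1, 3, 8):
    print(f"{n} variant(s): {familywise_error_rate(n):.0%} chance of a false winner")
# 1 variant(s): 5%   3 variant(s): 14%   8 variant(s): 34%
```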

Error 2: determining the incorrect indicators for a test variant’s success

Key Performance Indicators (KPIs), which are crucial to your project, also play an important role in A/B testing and shouldn’t be neglected. While increases in page views and clicks already count as valuable conversions for blogs or news portals, these factors are no more than a positive trend for online shops. Key indicators such as orders, returns, sales, or profits are significantly more important for stores. A/B tests that use a main KPI such as absolute profit take a lot of effort, because these figures are difficult to measure. In return, they can predict success much more reliably than tests that only take into account whether a product has been placed in the shopping cart; after all, the customer might not even end up buying the product in the cart.

It is therefore important to find the appropriate values. However, you shouldn’t choose too many different ones. Limit yourself to the essential factors and remember the predefined hypothesis. This reduces the risk of presuming there will be a lasting increase even though it’s actually just a coincidental increase with no lasting effect.

Error 3: categorically eliminating multivariate testing

In some cases when preparing A/B tests, you might want to test several elements in the variants. This isn’t really feasible with a simple A/B test, which is why multivariate testing is used as an alternative. This concept is often rejected because multivariate tests are considered too complex and inaccurate, even though they could be the optimal solution to the aforementioned problem if used correctly. With the right tools, the various test pages are not only quickly changed, but they are also easy to analyse. With a little practice, you can work out the difference that an individually modified component makes, but your web project first needs enough traffic. The chance of declaring the wrong winner increases with the number of test variants used, so it’s recommended to limit yourself to a pre-selection when using this method. To be certain that a potentially better version actually surpasses the original, you can validate the result afterwards using an A/B test. However, the probability of an alpha error occurring is still 5%.
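One standard way to keep the error rate in check when several variants are compared against the original is to tighten the significance threshold per comparison, for example with a Bonferroni correction. The article doesn’t prescribe this specific method; the following is an illustrative sketch with invented conversion counts:

```python
# Illustrative sketch (variant data is invented): comparing several variants against
# the original while applying a Bonferroni correction to the significance level.
from scipy.stats import chi2_contingency

control = (120, 4880)            # (conversions, non-conversions) for the original
variants = {
    "B": (150, 4850),
    "C": (128, 4872),
    "D": (119, 4881),
}

alpha = 0.05
adjusted_alpha = alpha / len(variants)   # stricter threshold per comparison

for name, counts in variants.items():
    _, p_value, _, _ = chi2_contingency([control, counts])
    verdict = "significant" if p_value < adjusted_alpha else "not significant"
    print(f"Variant {name}: p = {p_value:.3f} -> {verdict} at alpha = {adjusted_alpha:.4f}")
```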

Statistical problems during the test process

If the test is online and all relevant data is being recorded as desired, it would be fair to believe that nothing else stands in the way of successful A/B testing. Impatience and misjudgements often mean this isn’t the case, so make sure you avoid these typical errors.

Error 4: stopping the test process prematurely

Being able to read detailed statistics during the test proves very useful, but it often leads to premature conclusions, with users even terminating the tests too soon in extreme cases. In principle, each test requires a minimum sample size, since the results usually vary greatly at the beginning. In addition, the longer the test phase lasts, the higher the validity, since random outliers are noticed and can then be excluded. If you stop the test too early, you run the risk of getting a completely wrong picture of how the variant is performing and then classifying it as far better or worse than it really is. Since it’s not so easy to determine the optimal test duration, there are various tools, such as the A/B test duration calculator from VWO, which you can use to help you with the calculation. There are, of course, very good reasons for ending a test prematurely, for example, when a variant is performing so badly that it could jeopardise your economic interests.
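Duration calculators like VWO’s are essentially sample size calculations. As a rough sketch of the idea (the baseline conversion rate, target uplift, and power level below are assumptions, not values from the article), the standard normal-approximation formula for a two-proportion test looks like this:

```python
# A minimal sketch (baseline and uplift are assumed values): estimating how many
# visitors per variant an A/B test needs before the results are worth reading.
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(p_base, uplift, alpha=0.05, power=0.8):
    """Approximate sample size per variant for a two-sided two-proportion z-test."""
    p_var = p_base * (1 + uplift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_avg = (p_base + p_var) / 2
    numerator = (z_alpha * (2 * p_avg * (1 - p_avg)) ** 0.5
                 + z_beta * (p_base * (1 - p_base) + p_var * (1 - p_var)) ** 0.5) ** 2
    return ceil(numerator / (p_base - p_var) ** 2)

# e.g. a 3% baseline conversion rate and a hoped-for 10% relative uplift
print(sample_size_per_variant(0.03, 0.10))   # roughly 53,000 visitors per variant
```

Even modest uplifts on typical shop conversion rates quickly require tens of thousands of visitors per variant, which is why impatiently reading early numbers is so tempting and so risky.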

Error 5: using modern test processes in order to shorten the test length

It is no secret that various A/B testing tools work with methods to help keep the error rate among the variants used as low as possible. The Bayesian method, which is used by Optimizely and Visual Website Optimizer, promises test results even if the minimum sample size hasn’t yet been reached. If you use results from such an early stage for your evaluation, you could encounter statistical problems: on the one hand, this method is based on your own estimates regarding a variant’s success, and on the other hand, the Bayesian method cannot identify early random fluctuations as such.
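To make the risk concrete, here is a generic Beta-Binomial sketch of the Bayesian idea, not the actual implementation used by Optimizely or Visual Website Optimizer; the visitor and conversion counts are invented. The "probability that B beats A" can look decisive after a handful of visitors and then shrink considerably once real data accumulates:

```python
# Rough sketch (counts and priors are illustrative): a Bayesian reading expresses the
# result as "probability that B beats A". Early on, this figure can swing strongly.
import numpy as np

rng = np.random.default_rng(42)

def prob_b_beats_a(conv_a, visits_a, conv_b, visits_b, draws=100_000):
    """Monte Carlo estimate under uniform Beta(1, 1) priors."""
    samples_a = rng.beta(1 + conv_a, 1 + visits_a - conv_a, draws)
    samples_b = rng.beta(1 + conv_b, 1 + visits_b - conv_b, draws)
    return (samples_b > samples_a).mean()

print(prob_b_beats_a(4, 100, 10, 100))          # early snapshot: roughly 0.94, looks convincing
print(prob_b_beats_a(490, 10_000, 500, 10_000)) # full sample: roughly 0.63, closer to a coin flip
```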

Common errors when analysing A/B test results

Finding suitable KPIs, formulating hypotheses, and ultimately organising and carrying out the A/B test is challenging enough. However, the real challenge awaits you when it comes to analysing the collected values and using them to make your web project more successful. This is the part where even professionals can make mistakes, so at least make sure you avoid the mistakes that are easy to avoid, such as these:

Error 6: only relying on the results of the testing tool

The testing tool doesn’t just help you start the test and visualise the collected data; it also provides detailed information about whether a variant has made an improvement and how much it would affect the conversion rate. In addition, a variant is declared the winner. However, these tools cannot measure KPIs such as absolute sales or returns, so you have to incorporate the corresponding external data. If the results don’t meet your expectations, it might also be worth taking a look at the separate results of your web analysis program, which usually provides a much more detailed overview of users’ behaviour.

Inspecting individual data is the only way to identify rogue values and filter them out of the overall result. The following example illustrates why this can be a decisive step in avoiding a wrong assumption: the tool has shown that variant A is the optimal version since it achieved the best results. However, closer examination reveals that this is down to a single user’s purchase, and that user happens to be a B2B customer. If you remove this purchase from the statistics, variant B suddenly shows the best result.

The same example can be applied to the shopping cart, the order rate, or various other KPIs. In each of these cases, you will notice that extreme values can strongly influence the average value and that false conclusions can quickly arise from this.
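The effect is easy to reproduce. The order values below are invented purely for illustration, but they show how a single outsized B2B purchase can flip which variant looks better when the comparison is based on average order value:

```python
# Minimal sketch (order values are invented): one extreme order can decide the "winner"
# when variants are compared by average order value.
orders_a = [42, 38, 55, 47, 3900]   # includes one unusually large B2B purchase
orders_b = [61, 58, 66, 72, 64]

def average(values):
    return sum(values) / len(values)

print("A:", average(orders_a), "B:", average(orders_b))                # A wins on the raw average
filtered_a = [v for v in orders_a if v < 1000]                          # drop the rogue value
print("A (filtered):", average(filtered_a), "B:", average(orders_b))   # now B wins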

Error 7: segmenting the results too much

The detailed verification of the A/B testing data in combination with external data sources opens up a lot more options. It’s particularly common to assign results to individually defined user groups. This is how you can find out how users of a particular age group, a particular region, or a particular browser have responded to a particular variant. The problem is that the more segments you compare, the higher the chance of error.

For this reason, you should make sure that the chosen groups are highly relevant to your test concept and make up a representative part of the overall users. For example, if you’re only examining a group of males under 30 years old who access your site via tablet and only visit on weekends, you’re covering a sample that doesn’t represent the entire audience. If you plan to segment the results of an A/B test in advance, you should also set a correspondingly long test period.
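A simple sanity check before interpreting segment-level results is to look at how large each segment is and what share of overall traffic it represents. The thresholds and segment names below are assumptions for illustration, not recommendations from the article:

```python
# Sketch (thresholds and figures are assumed): check that a segment is large enough
# and representative before drawing conclusions from its A/B test results.
segments = {
    "desktop":              {"visitors": 18_000, "conversions": 540},
    "tablet":               {"visitors": 2_400,  "conversions": 60},
    "male_u30_tablet_wknd": {"visitors": 140,    "conversions": 9},
}
total_visitors = sum(s["visitors"] for s in segments.values())

MIN_VISITORS = 1_000   # assumed minimum sample per segment
MIN_SHARE = 0.05       # assumed minimum share of overall traffic

for name, s in segments.items():
    share = s["visitors"] / total_visitors
    usable = s["visitors"] >= MIN_VISITORS and share >= MIN_SHARE
    print(f"{name}: {s['visitors']} visitors ({share:.1%} of traffic) -> "
          f"{'usable' if usable else 'too small to interpret on its own'}")
```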

Error 8: questioning the success due to vague calculations

To illustrate the extent to which changing to a new variant will affect the future conversion rate, A/B test results are often used as the basis for concrete calculations. This may be an effective means for presentation purposes, but genuine forecasts aren’t really practical due to the different influences involved. The results of A/B tests only provide information about short-term changes in user behaviour, while long-term effects such as the impact on customer satisfaction are not measurable within the short test period, so assuming that a measured growth will remain consistent is premature. In addition, there are influences such as seasonal fluctuations, supply shortages, changes in the product range, changes in the customer base, or technical problems that can’t be included in A/B testing.
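One way to see why projecting an observed uplift into the future is shaky is to look at the uncertainty around it. The following sketch uses invented numbers and a simple normal-approximation confidence interval; even a seemingly clear uplift often comes with a very wide plausible range:

```python
# Illustrative sketch (counts are invented): the confidence interval around a measured
# uplift is usually far wider than the headline figure suggests.
from statistics import NormalDist

def uplift_confidence_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Normal-approximation CI for the relative uplift of B over A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b) ** 0.5
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff_low, diff_high = (p_b - p_a) - z * se, (p_b - p_a) + z * se
    return diff_low / p_a, diff_high / p_a   # relative to the original's conversion rate

low, high = uplift_confidence_interval(300, 10_000, 345, 10_000)
print(f"Observed uplift: +15%, plausible range: {low:+.0%} to {high:+.0%}")
# roughly -1% to +31%, before any of the long-term influences mentioned above
```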

It’s important to keep a cool head regarding statistical problems and wrong assumptions when carrying out and analysing a website’s usability test. Drawing conclusions too early could leave you disappointed with the subsequent live results even though the optimised version of your project actually works quite well. Only if you are cautious when formulating forecasts and work in a clean and well-thought-out way when carrying out the analysis will you be able to evaluate and interpret the A/B test results properly.
