Glossary
A/B testing
A decision-making method for implementing product changes in which different groups of users are shown old and new options.
A/A testing
A method of checking the validity of the chosen A/B testing approach in which different groups of users are shown the same option. If statistically significant differences are found, it is concluded that there are problems in the experiment design or the chosen methodology.
Frequentist statistical approach
An approach in which a point estimate of an unknown parameter is calculated, along with test statistics, which are random variables with known sampling distributions. Inferences about the winner are made based on the p-value or confidence intervals.
Statistical test
A method of hypothesis testing in frequentist statistics that indicates, at a chosen probability level, whether the null hypothesis is rejected or not. The appropriate statistical test depends on the experiment parameters.
Statistical significance
A situation in which the collected data lead to rejection of the hypothesis that there is no difference between the test variants.
Null hypothesis
A hypothesis put forward before an experiment, typically that there is no difference between the test variants. After the experiment, we decide whether to reject it or not.
Alternative hypothesis
A hypothesis proposed as an alternative to the null one.
Type 1 error (FPR - false positive rate)
The probability of detecting a statistically significant effect in the case that it does not actually exist, e.g., an A/A test showing statistically significant differences.
Type 2 error (FNR - false negative rate)
The probability of not detecting a statistically significant effect when it does actually exist, e.g., an A/B test showing no statistically significant differences when a real difference exists.
Test power
The ability of the test to detect the effect when it actually exists, calculated by subtracting the FNR value from 100%.
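As a rough illustration, power can be approximated for a comparison of two conversion rates. This is a minimal sketch that assumes a two-sided z-test with equal group sizes and a normal approximation; the function name `approximate_power` and the example rates are hypothetical, not part of any particular tool.

```python
import math
from statistics import NormalDist

def approximate_power(p_control, p_variant, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Assumes equal group sizes and ignores the negligible chance of
    rejecting in the wrong direction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se = math.sqrt(
        p_control * (1 - p_control) / n_per_group
        + p_variant * (1 - p_variant) / n_per_group
    )
    return NormalDist().cdf(abs(p_variant - p_control) / se - z_alpha)

power = approximate_power(0.10, 0.12, n_per_group=4000)
fnr = 1 - power  # type 2 error is the complement of power
```

Under these assumptions, increasing the number of users per group raises the power and lowers the FNR.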
Sample size
The number of users required to detect an effect of a given size as statistically significant with fixed error probabilities.
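One common way to estimate the sample size for a conversion metric is the normal-approximation formula for two proportions. The sketch below assumes a two-sided test, equal group sizes, and hypothetical baseline and target rates; `sample_size_per_group` is an illustrative name.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p_control, p_variant, alpha=0.05, power=0.8):
    """Users needed in each group to detect p_variant vs p_control."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # controls the FPR
    z_beta = NormalDist().inv_cdf(power)           # controls the FNR
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

n = sample_size_per_group(0.10, 0.12)
```

With these example inputs the result is a few thousand users per group. Smaller effects or stricter error probabilities require more users.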
p-value (probability-value)
The result of hypothesis testing using the frequentist statistical approach: the probability, calculated under the assumption that the null hypothesis is true, of obtaining a result at least as extreme as the one observed. This value is compared with the significance level (FPR) to conclude whether there are statistically significant differences between the test variants: if the p-value is less than the fixed type 1 error, the null hypothesis is rejected.
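The comparison of a p-value with the significance level can be sketched as follows. The example assumes a pooled two-proportion z-test on conversion counts; the counts and the function name `two_proportion_p_value` are made up for illustration.

```python
import math
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05  # fixed type 1 error (significance level)
p_value = two_proportion_p_value(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
reject_null = p_value < alpha
```

For these hypothetical counts the p-value falls below 0.05, so the null hypothesis of equal conversion rates would be rejected.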
Confidence interval
An interval that covers the true value of a parameter with a certain level of confidence. It is one of the test results for the frequentist statistical approach and can be used alongside or instead of the p-value to draw conclusions.
Note that it cannot be interpreted as an interval containing a certain fraction of all possible true values.
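A minimal sketch of a confidence interval for the difference between two conversion rates is shown below. It assumes the simple Wald (normal-approximation) interval and hypothetical counts; real tools may use other interval constructions.

```python
import math
from statistics import NormalDist

def diff_confidence_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Wald interval for the difference in conversion rates (B minus A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

low, high = diff_confidence_interval(100, 1000, 130, 1000)
excludes_zero = low > 0 or high < 0
```

An interval that excludes zero points to the same conclusion as a p-value below the matching significance level, at least approximately.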
Multiple testing
A situation in which more than one hypothesis needs to be tested in a single experiment.
Multiple comparisons problem
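The increase in the probability of at least one false positive that occurs during multiple testing when every hypothesis is checked at the unadjusted significance level.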
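The size of the problem can be illustrated with a short calculation. The sketch assumes the tests are independent, which is a simplification; `family_wise_error_rate` is an illustrative name.

```python
def family_wise_error_rate(alpha, num_tests):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** num_tests

for m in (1, 5, 10, 20):
    print(m, round(family_wise_error_rate(0.05, m), 3))
```

Under the independence assumption, ten tests at a 5% significance level already give roughly a 40% chance of at least one false positive.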
Multiple testing correction
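Methods that adjust the significance level or the p-values so that several hypotheses can be tested in a single experiment while the overall false positive rate stays under control.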
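Two widely used corrections, Bonferroni and Holm, can be sketched as follows. These are examples of such methods, not necessarily the ones any given platform applies, and the p-values are invented.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject only hypotheses whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def holm(p_values, alpha=0.05):
    """Step-down Holm procedure: thresholds relax as hypotheses are rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are not rejected either
    return reject

p_values = [0.01, 0.04, 0.02, 0.005]
bonferroni_result = bonferroni(p_values)
holm_result = holm(p_values)
```

Holm never rejects fewer hypotheses than Bonferroni; in this example it rejects all four, while Bonferroni rejects only two.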
Peeking problem
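An error that occurs when a classic A/B test is stopped prematurely, usually as soon as a statistically significant result appears. Stopping on the first significant result inflates the FPR, and stopping before the planned sample size is collected reduces the power, that is, increases the FNR.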
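The effect of peeking on the false positive rate can be checked with a small simulation of A/A tests. The sketch assumes a pooled two-proportion z-test, equal groups, and arbitrary simulation settings; all names and numbers are illustrative.

```python
import math
import random
from statistics import NormalDist

def is_significant(conv_a, conv_b, n, z_crit):
    """Pooled two-proportion z-test with n users in each group."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return False
    return abs(conv_b - conv_a) / n / se > z_crit

def aa_false_positive_rate(peeks, n_total=1000, rate=0.1, runs=1000,
                           alpha=0.05, seed=1):
    """Share of A/A tests declared significant at any of the peeks."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    checkpoints = {n_total * k // peeks for k in range(1, peeks + 1)}
    false_positives = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        for n in range(1, n_total + 1):
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if n in checkpoints and is_significant(conv_a, conv_b, n, z_crit):
                false_positives += 1
                break  # the experimenter stops at the first "win"
    return false_positives / runs

fpr_single = aa_false_positive_rate(peeks=1)    # one look at the end
fpr_peeking = aa_false_positive_rate(peeks=10)  # ten interim looks
```

In simulations of this kind, the test with a single final look stays near the nominal 5%, while the test with repeated looks shows a clearly higher false positive rate.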
Bayesian inference
An approach in which not just a point estimate of a statistical parameter is calculated but its entire distribution, based on our assumptions about the form of that distribution (prior expectations) and the information received (collected data).
Prior distribution
The distribution of values of the studied parameter that is assumed before conducting a Bayesian test.
Posterior distribution
The distribution of all possible values of the metric, recalculated to take into account both the prior expectations and the data obtained during the experiment.
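For a conversion metric, the update from prior to posterior has a simple closed form. The sketch below assumes a Beta prior with binomial data (the conjugate Beta-Binomial model) and made-up counts; other metrics require other models.

```python
def beta_posterior(prior_alpha, prior_beta, conversions, users):
    """Posterior Beta parameters after observing conversions among users."""
    return prior_alpha + conversions, prior_beta + (users - conversions)

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Beta(1, 1) is a uniform prior: every conversion rate is equally plausible.
posterior = beta_posterior(1, 1, conversions=130, users=1000)
posterior_mean = beta_mean(*posterior)
```

With a uniform prior and this much data, the posterior mean is close to the observed conversion rate of 13%.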
Probability of superiority (chance to beat control)
The probability that the selected option is better than the other test options. It can be used as a criterion for completing a Bayesian experiment.
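This probability can be estimated by sampling from the posterior distributions of the variants. The sketch assumes Beta posteriors for two conversion rates, given as hypothetical `(alpha, beta)` pairs, and plain Monte Carlo sampling.

```python
import random

def probability_b_beats_a(post_a, post_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta posteriors."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(*post_b) > rng.betavariate(*post_a)
        for _ in range(draws)
    )
    return wins / draws

# Posteriors after 100/1000 and 130/1000 conversions with a Beta(1, 1) prior.
chance_to_beat_control = probability_b_beats_a((101, 901), (131, 871))
```

An experiment could then be stopped once this estimate crosses a preset threshold such as 95%; the threshold itself is a design choice.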
Expected losses
How much we expect to lose on average when choosing a test option. It can be used as a criterion for completing a Bayesian experiment.
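Expected losses can be estimated from posterior samples in a similar way. The sketch assumes Beta posteriors for two conversion rates and measures the loss as the conversion rate given up when the chosen variant is actually the worse one; the names and numbers are illustrative.

```python
import random

def expected_loss(post_chosen, post_other, draws=20000, seed=0):
    """Average shortfall in conversion rate from picking post_chosen."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        chosen = rng.betavariate(*post_chosen)
        other = rng.betavariate(*post_other)
        total += max(other - chosen, 0.0)  # loss only when the other is better
    return total / draws

post_a, post_b = (101, 901), (131, 871)
loss_if_choose_a = expected_loss(post_a, post_b)
loss_if_choose_b = expected_loss(post_b, post_a)
```

Here choosing B carries a much smaller expected loss than choosing A, so a rule such as "stop when the loss of the best variant drops below a tolerance" would favor B.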
Credible interval
The interval that contains a certain fraction of all possible values for the studied parameter. It is one of the test results for the Bayesian approach. It can be used as a criterion for completing a Bayesian experiment.
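A credible interval can be read directly from the posterior distribution. The sketch below assumes a Beta posterior and approximates an equal-tailed interval from sorted Monte Carlo samples, since the Python standard library has no Beta quantile function.

```python
import random

def credible_interval(alpha, beta, mass=0.95, draws=20000, seed=0):
    """Approximate equal-tailed credible interval of Beta(alpha, beta)."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    tail = (1 - mass) / 2
    low = samples[int(tail * draws)]
    high = samples[int((1 - tail) * draws) - 1]
    return low, high

# Posterior after 130 conversions among 1000 users with a Beta(1, 1) prior.
low, high = credible_interval(131, 871)
```

Unlike a confidence interval, this range can be read as containing about 95% of the posterior probability for the conversion rate.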
The multi-armed bandit task in A/B testing
A task in which users must be distributed optimally among the test variants during the experiment in order to maximize total revenue. With multi-armed bandits, the proportions of users assigned to each variant change during the experiment, unlike in classical and Bayesian tests, where users are split into groups in equal proportions.
Thompson's algorithm
An algorithm based on the Bayesian statistical approach. It aims to maximize revenue in the multi-armed bandit task in A/B testing by diverting more traffic to the leading variant as data accumulates.
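A minimal sketch of Thompson sampling for conversion rates is given below. It assumes Bernoulli rewards with Beta(1, 1) priors and simulated users whose true conversion rates are known only to the simulation; the function name and rates are hypothetical.

```python
import random

def thompson_sampling(true_rates, users=5000, seed=0):
    """Simulate traffic allocation; returns users assigned to each variant."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(users):
        # Draw one plausible rate per variant from its Beta posterior.
        samples = [
            rng.betavariate(1 + s, 1 + f)
            for s, f in zip(successes, failures)
        ]
        arm = samples.index(max(samples))  # show the variant that looks best
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]

traffic = thompson_sampling([0.05, 0.15])
```

Because each user is routed by a random draw from the posteriors, weaker variants still receive some traffic early on, and the share of the leading variant grows as the evidence accumulates.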