Practical guide to false positives in A/B testing: formulas, workflow, implementation pitfalls, and a direct execution playbook with A/B Test Calculator.
A false positive (Type I error, alpha error) means your test declares a winner when there is no real difference. You ship variant B thinking it lifts conversion by 3%, but in reality B is identical to A. The "lift" was noise.
At alpha = 0.05, you accept a 5% chance of this per test. That sounds safe. It is not safe at scale.
If you run 20 independent tests at alpha = 0.05, the probability of *at least one* false positive is:
P(at least 1 false positive) = 1 - (1 - 0.05)^20 = 1 - 0.95^20 ≈ 0.64
That is a 64% chance. With 20 tests, you are more likely than not to crown at least one false winner. If each false positive ships a change that actually hurts conversion, the damage accumulates over time.
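The family-wise error rate above can be computed directly; a minimal sketch (the function name `fwer` is mine, not from any library):

```python
def fwer(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across
    n independent tests, each run at significance level alpha."""
    # Each test individually avoids a false positive with
    # probability (1 - alpha); independence lets us multiply.
    return 1 - (1 - alpha) ** n_tests

print(fwer(0.05, 20))  # about 0.64
```

Plugging in other counts shows how fast this grows: at 10 tests the rate is already about 40%, and at 50 tests about 92%.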
The simplest fix is the Bonferroni correction: divide alpha by the number of comparisons.
Running 5 metrics in one test? Use alpha = 0.05 / 5 = 0.01 per metric.
| Number of comparisons | Bonferroni alpha | Required p-value |
|---|---|---|
| 1 | 0.050 | < 0.050 |
| 3 | 0.017 | < 0.017 |
| 5 | 0.010 | < 0.010 |
| 10 | 0.005 | < 0.005 |
| 20 | 0.0025 | < 0.0025 |
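The table above reduces to one division plus a threshold check. A minimal sketch (function names are mine, for illustration):

```python
def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-comparison alpha after Bonferroni correction
    over m comparisons."""
    return alpha / m

def significant(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Apply Bonferroni: a comparison counts as significant
    only if its p-value beats alpha divided by the number
    of comparisons tested together."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p < threshold for p in p_values]

# Five metrics in one test: the bar is 0.05 / 5 = 0.01 each.
print(significant([0.008, 0.03, 0.2, 0.04, 0.5]))
```

Note that the divisor is the number of comparisons made together, not the number of tests run this quarter; scope the correction to one analysis.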
Bonferroni is conservative: shrinking alpha reduces statistical power, so real effects become harder to detect. A less conservative alternative is Benjamini-Hochberg, which controls the false discovery rate rather than the family-wise error rate. But Bonferroni is simple, and it always keeps the family-wise error rate at or below your chosen alpha.
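The Benjamini-Hochberg procedure is short enough to sketch from scratch: sort the p-values, find the largest rank k whose p-value is at or below (k/m)·q, and reject every hypothesis at that rank or better. A minimal sketch, assuming independent tests:

```python
def benjamini_hochberg(p_values: list[float], q: float = 0.05) -> list[bool]:
    """Return True at each position where the null is rejected
    while controlling the false discovery rate at level q."""
    m = len(p_values)
    # Sort ascending, remembering each p-value's original position.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject everything at rank k_max or better.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject

print(benjamini_hochberg([0.01, 0.02, 0.03, 0.5]))
```

On these four p-values at q = 0.05, Benjamini-Hochberg rejects the first three, while Bonferroni (threshold 0.05 / 4 = 0.0125) would reject only the first. That is the power difference in miniature. In production, prefer a vetted implementation such as `statsmodels.stats.multitest.multipletests`.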
Pre-registration means documenting, before the test starts, your primary metric, alpha level, sample size, and planned duration.
Why this works: it eliminates post-hoc rationalization. Without pre-registration, teams unconsciously test 10 metrics, find one significant result, and present it as "the" finding. Pre-registration forces honesty.
Guardrail metrics are secondary metrics you monitor to catch regressions, not to find wins; bounce rate is a typical example.
Set guardrail thresholds in advance: "If bounce rate increases by >2 pp, do not ship regardless of primary metric result." Evaluate guardrails in A/B Test Calculator.
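Writing guardrails down as data makes the "in advance" part enforceable. A minimal sketch; the metric names and limits here are hypothetical examples, not recommendations:

```python
# Hypothetical guardrail thresholds, committed before the test starts.
# Each value is the maximum tolerated regression (variant - control).
GUARDRAILS = {
    "bounce_rate_pp": 2.0,   # percentage points
    "page_load_ms": 100.0,   # milliseconds
}

def guardrails_pass(observed_deltas: dict[str, float]) -> tuple[bool, list[str]]:
    """Return (ok, breaches). ok is False if any observed
    regression exceeds its pre-set limit; metrics missing
    from observed_deltas are treated as unchanged."""
    breaches = [name for name, limit in GUARDRAILS.items()
                if observed_deltas.get(name, 0.0) > limit]
    return (len(breaches) == 0, breaches)

ok, breaches = guardrails_pass({"bounce_rate_pp": 2.5, "page_load_ms": 40.0})
print(ok, breaches)
```

The point of the code is social, not technical: when the threshold is committed to a repo before launch, "bounce rate only went up 2.3 points" stops being a negotiable result.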
Before your next test, write down your primary metric, alpha level, and sample size. Then compute the required duration in A/B Test Calculator and commit to it.
This article is reviewed by the Tools Hub editorial team for factual accuracy, practical relevance, and consistency with current product workflows.
Last reviewed: