A practical guide to sequential A/B testing: formulas, workflow, implementation pitfalls, and a direct execution playbook with A/B Test Calculator.
You set alpha = 0.05 (5% false positive rate) and plan to run a test for 4 weeks. But you check results every day. After 28 checks on data that fluctuates randomly, the probability of *at least one* false significant result is not 5% — it rises to roughly 25-30%.
The reason: each check is a hypothesis test. Even if there is no real effect, random data occasionally looks significant. More checks = more chances for a false alarm. Formally, the error rate inflates because the test statistic follows a random walk under the null, and it crosses any fixed boundary with increasing probability over time.
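The inflation can be checked with a small Monte Carlo sketch. It assumes equal-sized daily batches and a z-statistic computed on all data collected so far, so under the null the cumulative sum of standardized daily increments behaves like a random walk. The function name and parameters are illustrative, not part of any tool.

```python
import random


def peeking_false_positive_rate(looks=28, sims=5000, z_crit=1.96, seed=0):
    """Estimate P(at least one |z| > z_crit) over `looks` checks with no real effect."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        total = 0.0
        for k in range(1, looks + 1):
            total += rng.gauss(0.0, 1.0)  # one day's standardized increment under the null
            z = total / k ** 0.5          # z-statistic on all data up to day k
            if abs(z) > z_crit:           # "significant" at this peek: stop and count it
                hits += 1
                break
    return hits / sims


print(f"1 look:   {peeking_false_positive_rate(looks=1):.3f}")
print(f"28 looks: {peeking_false_positive_rate(looks=28):.3f}")
```

With a single look the estimate should sit near the nominal 5%; with 28 looks it should land several times higher, in line with the range quoted above.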
Sequential testing methods control the overall false positive rate across multiple looks by "spending" alpha gradually. Instead of using alpha = 0.05 at every look, each interim analysis uses a smaller threshold, so the total across all looks stays at 0.05.
Two classic approaches:
O'Brien-Fleming — very conservative early, lenient late. First look might require p < 0.0001 to stop. Final look uses roughly the original alpha. Best when you want to run the full test unless the effect is enormous.
| Look | Alpha spent (cumulative) | Boundary p-value |
|---|---|---|
| 1 of 4 | 0.0001 | 0.0001 |
| 2 of 4 | 0.0054 | 0.0049 |
| 3 of 4 | 0.0221 | 0.0184 |
| 4 of 4 | 0.0500 | 0.0429 |
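A cumulative alpha column like the one above can be approximated with a spending function. The sketch below assumes the Lan-DeMets O'Brien-Fleming-type spending function, alpha(t) = 2(1 - Φ(z_{alpha/2} / sqrt(t))), where t is the fraction of the planned sample collected so far. This is one common choice, not necessarily the one used to build the table, so small differences in the third or fourth decimal are expected.

```python
from statistics import NormalDist


def obf_alpha_spent(t, alpha=0.05):
    """Cumulative two-sided alpha spent at information fraction t (0 < t <= 1)."""
    norm = NormalDist()
    z = norm.inv_cdf(1 - alpha / 2)          # fixed-sample critical value, ~1.96
    return 2 * (1 - norm.cdf(z / t ** 0.5))  # Lan-DeMets O'Brien-Fleming-type spending


looks = 4
previous = 0.0
for k in range(1, looks + 1):
    spent = obf_alpha_spent(k / looks)
    print(f"look {k} of {looks}: cumulative {spent:.4f}, "
          f"spent at this look {spent - previous:.4f}")
    previous = spent
```

Turning cumulative alpha into the per-look boundary p-values takes an extra numerical step, because the test statistics at successive looks are correlated. That step is not shown here.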
Pocock: uses the same threshold at every look (approximately p < 0.018 for 4 looks). It is easier to explain, but it requires a larger maximum sample size: so much alpha is "used up" early that the final look faces a far stricter threshold than 0.05.
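A constant boundary can be sanity-checked by simulation. The sketch below assumes four equally spaced looks and uses the commonly tabulated Pocock constant z ≈ 2.361 (nominal two-sided p ≈ 0.018), then compares it with naively reusing z = 1.96 at every look. The function name is hypothetical and chosen for illustration.

```python
import random


def overall_false_positive_rate(z_bounds, sims=20000, seed=1):
    """Estimate the chance of crossing any boundary in z_bounds when there is no effect."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        total = 0.0
        for k, bound in enumerate(z_bounds, start=1):
            total += rng.gauss(0.0, 1.0)       # equal-sized batch of null data
            if abs(total / k ** 0.5) > bound:  # z-statistic at look k vs its boundary
                hits += 1
                break
    return hits / sims


pocock = [2.361] * 4  # same threshold at every look
naive = [1.96] * 4    # unadjusted alpha = 0.05 at every look

print(f"Pocock boundaries: {overall_false_positive_rate(pocock):.3f}")
print(f"Naive 1.96 x 4:    {overall_false_positive_rate(naive):.3f}")
```

The Pocock run should come out close to 0.05 overall, while the naive run should be clearly above it.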
Baseline: 5% conversion, MDE: 2 pp, alpha: 0.05, power: 80%.
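For reference, a fixed-sample baseline for these inputs can be sketched with the standard two-proportion formula. This assumes a two-sided test, an absolute lift from 5% to 7%, and unpooled variances. A/B Test Calculator may use a slightly different formula, so treat the result as a ballpark.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline, mde, alpha=0.05, power=0.80):
    """Fixed-sample size per variant for a two-sided two-proportion z-test."""
    p1, p2 = baseline, baseline + mde               # control and treatment rates
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)        # sum of per-arm Bernoulli variances
    return ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)


n = sample_size_per_arm(baseline=0.05, mde=0.02)
print(f"Visitors needed per variant (no interim looks): {n}")
```

This comes to roughly 2,200 visitors per variant before any adjustment for interim looks. A sequential design raises the maximum sample size somewhat, only slightly for O'Brien-Fleming and more for Pocock.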
Compute your required sample and schedule using A/B Test Calculator.
Decide on 3-5 interim looks, choose O'Brien-Fleming boundaries, and compute your adjusted sample size in A/B Test Calculator.
This article is reviewed by the Tools Hub editorial team for factual accuracy, practical relevance, and consistency with current product workflows.