AB Test Calculator

Free A/B test calculator. Two-proportion z-test, sample size, power, confidence intervals, and Bayesian P(B>A).



A/B Test — Statistical Significance & Power

Two-proportion z-test, sample size, power, CI, Bayesian P(B>A). Conversion rate comparison with step-by-step breakdown.


ab_test_results.sh
CALCULATED
$ ab_test --control=1000,50 --variant=1000,65 --alpha=0.05
Decision
NOT SIGNIFICANT
z-statistic
1.4408
p-value
0.1496
Relative Lift
30.0%
Absolute Diff
0.0150
95% CI
[-0.0054, 0.0354]
Sample Size Needed
3778/group
Power
30.2%
Bayesian P(B>A)
≈92%

Conversion Rate Comparison

Power Curve vs Sample Size

Power vs sample size for p₁=0.050, p₂=0.065. 80% threshold shown.

95% Confidence Interval for Difference

Lower bound: -0.0054 | Estimate: 0.0150 | Upper bound: 0.0354

Red line = 0. CI includes 0 → not significant. CI excludes 0 → significant.

Calculation Breakdown

COMPUTATION
p̂_A (Control)
0.0500
x_A/n_A = 50/1000
p̂_B (Variant)
0.0650
x_B/n_B = 65/1000
Pooled proportion
0.0575
(x_A+x_B)/(n_A+n_B) = 115/2000
Standard Error
0.0104
√(p̂(1-p̂)(1/n_A+1/n_B))
z-statistic
1.4408
(p̂_B − p̂_A)/SE = (0.0650 − 0.0500)/SE
p-value
0.1496
2(1 − Φ(|z|)) for a two-sided test
DECISION
NOT SIGNIFICANT — Fail to reject H₀
EFFECT SIZE
Relative Lift
30.0%
(p̂_B − p̂_A)/p̂_A × 100
Absolute Difference
0.0150
p̂_B - p̂_A
CONFIDENCE INTERVAL
95% CI for difference
[-0.0054, 0.0354]
Bayesian P(B > A)
≈92%
Normal approximation to Beta posteriors
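
The breakdown above can be reproduced with the Python standard library. This is a minimal sketch, not the calculator's own source: the function name `two_proportion_ztest` and the returned keys are illustrative, and it assumes the confidence interval uses the unpooled standard error, which the page does not state.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(x_a, n_a, x_b, n_b, confidence=0.95):
    """Pooled two-proportion z-test plus an unpooled CI for p_B - p_A."""
    normal = NormalDist()
    p_a, p_b = x_a / n_a, x_b / n_b
    diff = p_b - p_a

    # The test statistic uses the pooled proportion (H0: p_A == p_B).
    pooled = (x_a + x_b) / (n_a + n_b)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - normal.cdf(abs(z)))  # two-sided

    # Assumption: the interval uses the unpooled standard error.
    se_unpooled = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = normal.inv_cdf(1 - (1 - confidence) / 2)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

    return {
        "z": z,
        "p_value": p_value,
        "diff": diff,
        "relative_lift_pct": diff / p_a * 100,
        "ci": ci,
    }


result = two_proportion_ztest(50, 1000, 65, 1000)
print(result)
```

For the example inputs (50/1000 vs 65/1000) this gives z ≈ 1.4408, a two-sided p-value of about 0.15, and a 95% interval of roughly [-0.0054, 0.0354].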

For educational and informational purposes only. Verify with a qualified professional.

Key Takeaways

  • Two-proportion z-test: p̂_A = x_A/n_A, p̂_B = x_B/n_B. Pooled p̂ for SE. z = (p̂_B − p̂_A) / SE.
  • p-value: 2×(1−Φ(|z|)) for two-sided; 1−Φ(z) for one-sided. Reject H₀ if p < α.
  • Relative lift: (p̂_B − p̂_A) / p̂_A × 100%. Absolute difference: p̂_B − p̂_A.
  • 95% CI: (p̂_B − p̂_A) ± 1.96 × SE. If 0 is outside CI, difference is significant.
  • Sample size: n = (z_α/2 + z_β)² × (p₁(1−p₁) + p₂(1−p₂)) / (p₂−p₁)² per group.
  • Power: Probability of detecting a true effect. Aim for 80%+.
  • Bayesian: P(B > A) from Beta posteriors. Interpret as probability variant beats control.

Did You Know?

📊 Most A/B tests are underpowered. At 80% power and α=0.05, detecting a 5%→6.5% lift takes about 3,800 visitors per group. Source: Evan Miller
🔄 Sequential testing (O'Brien-Fleming) allows early stopping but requires adjusted boundaries. Source: Optimizely Stats Engine
🎯 Bayesian A/B tests give P(B beats A) directly, which is easier for stakeholders to interpret than a p-value. Source: VWO Best Practices
⚠️ Peeking at results without adjustment inflates Type I error. Use sequential methods or wait for the planned sample size. Source: Kohavi et al., 2009
📈 Relative lift can be misleading when the baseline is low. 1%→2% is a 100% relative lift but a small absolute gain. Source: Google Analytics
🔬 Minimum detectable effect (MDE) = smallest lift you can detect with a given power and sample size. Source: Evan Miller
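
The MDE in the last item can be found by inverting the sample-size formula from Key Takeaways. The sketch below does this by bisection. The function name `minimum_detectable_effect` is hypothetical, and the search only considers lifts above the baseline.

```python
from statistics import NormalDist


def minimum_detectable_effect(p1, n, alpha=0.05, power=0.80):
    """Smallest absolute lift p2 - p1 detectable with n visitors per group."""
    normal = NormalDist()
    z_sum = normal.inv_cdf(1 - alpha / 2) + normal.inv_cdf(power)

    def required_n(p2):
        # Per-group sample size from the Key Takeaways formula.
        return z_sum**2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2

    # required_n falls as p2 moves away from p1, so bisect on p2.
    lo, hi = p1, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if required_n(mid) > n:
            lo = mid  # lift too small to detect with n per group
        else:
            hi = mid
    return hi - p1


mde_1000 = minimum_detectable_effect(0.05, 1000)
mde_3778 = minimum_detectable_effect(0.05, 3778)
print(f"MDE at n=1000: {mde_1000:.4f}, at n=3778: {mde_3778:.4f}")
```

With a 5% baseline, 1,000 visitors per group can only detect a lift of about 3 percentage points under these assumptions, while 3,778 per group brings the MDE down to about 1.5 points.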

Expert Tips

Plan Sample Size First

Use Sample Size mode before running a test. Aim for 80% power. Stopping early with low power means you may miss real effects.

One vs Two-Sided

Use two-sided unless you would never act on B being worse than A. One-sided gives more power for that direction but cannot detect harm.

Statistical vs Practical Significance

A result can be statistically significant but practically trivial. 5.0% vs 5.1% with 1M visitors per group is significant (z ≈ 3.2), but a 0.1-percentage-point lift may not justify the change.

Multiple Variants

Testing A vs B vs C inflates false positives. Use Bonferroni correction (α/k) for k comparisons.
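
As a small illustration of the Bonferroni rule, the sketch below compares each p-value against α/k. The function name `bonferroni_decisions` and the three p-values are made up for the example and do not come from the calculator.

```python
def bonferroni_decisions(p_values, alpha=0.05):
    """Return (threshold, decisions) using the Bonferroni-adjusted level alpha/k."""
    threshold = alpha / len(p_values)
    return threshold, [p < threshold for p in p_values]


# Hypothetical p-values for three variant-vs-control comparisons.
p_values = [0.012, 0.030, 0.200]
threshold, decisions = bonferroni_decisions(p_values)
print(f"per-comparison threshold: {threshold:.4f}")
print(decisions)
```

With k = 3 the per-comparison threshold is about 0.0167, so the second comparison (p = 0.030) would pass an unadjusted 0.05 cutoff but not the corrected one.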

This Calculator vs Google Optimize vs Manual

Feature | This Calculator | Google Optimize | Manual (Excel/R)
Two-proportion z-test | ⚠️ Manual
Sample size estimation | ⚠️ Manual
Power analysis
Bayesian P(B>A) | ⚠️ Code needed
Step-by-step breakdown
CI visualization
Copy & share results
No platform lock-in

Frequently Asked Questions

What sample size do I need?

Use the Sample Size mode. Enter expected control rate (p1), expected variant rate (p2), α (usually 0.05), and desired power (usually 0.8).
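
The same inputs can be plugged into the per-group formula from Key Takeaways. This is a minimal sketch using the standard library; the function name `sample_size_per_group` is illustrative, and rounding up to the next whole visitor is an assumption.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """n = (z_{alpha/2} + z_beta)^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2."""
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = normal.inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2)


n_needed = sample_size_per_group(0.05, 0.065)
print(n_needed)
```

For p1 = 0.05 and p2 = 0.065 at α = 0.05 and 80% power, this returns 3778 per group.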

What does P(B > A) mean?

Bayesian probability that the variant conversion rate exceeds the control. 95% means strong evidence variant is better.
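
One way to estimate P(B > A) is to sample from the two Beta posteriors directly. The sketch below uses Monte Carlo draws rather than the normal approximation named in the breakdown, and it assumes a uniform Beta(1, 1) prior, which the page does not specify. The function name `prob_b_beats_a` is hypothetical.

```python
import random


def prob_b_beats_a(x_a, n_a, x_b, n_b, draws=100_000, seed=0, prior=(1.0, 1.0)):
    """Monte Carlo estimate of P(theta_B > theta_A) under Beta posteriors."""
    rng = random.Random(seed)
    a0, b0 = prior  # assumption: uniform Beta(1, 1) prior by default
    wins = 0
    for _ in range(draws):
        theta_a = rng.betavariate(a0 + x_a, b0 + n_a - x_a)
        theta_b = rng.betavariate(a0 + x_b, b0 + n_b - x_b)
        if theta_b > theta_a:
            wins += 1
    return wins / draws


p_b_better = prob_b_beats_a(50, 1000, 65, 1000)
print(f"P(B > A) ≈ {p_b_better:.3f}")
```

For 50/1000 vs 65/1000 the estimate lands a little above 0.92, below the 95% level described above as strong evidence.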

When is a result significant?

When p-value < α (e.g., 0.05). Also when the 95% CI for the difference excludes 0.

How do I interpret relative lift?

Relative lift = (pB−pA)/pA × 100%. E.g., 30% lift means variant converts 30% more often relative to control.

What is statistical power?

Probability of correctly rejecting H₀ when there is a true effect. 80% power means 80% chance to detect the specified lift.
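
Power for a given sample size can be approximated with the same normal model as the sample-size formula. This is a sketch under that assumption (unpooled standard error, two-sided test); the calculator may use a slightly different variant, and the function name `power_two_proportions` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n, alpha=0.05):
    """Approximate two-sided power with n visitors per group."""
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)
    se = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)
    shift = abs(p2 - p1) / se  # true effect in standard-error units
    # Probability the statistic falls in either rejection region.
    return normal.cdf(shift - z_alpha) + normal.cdf(-shift - z_alpha)


power_1000 = power_two_proportions(0.05, 0.065, 1000)
power_3778 = power_two_proportions(0.05, 0.065, 3778)
print(f"n=1000: {power_1000:.3f}, n=3778: {power_3778:.3f}")
```

Under this model, 1,000 visitors per group gives roughly 30% power for a 5%→6.5% lift, and 3,778 per group reaches about 80%.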

Can I use this for click-through rate?

Yes. Enter impressions as n and clicks as x. The two-proportion z-test works for any binary outcome.

When should I use Fisher exact test instead?

For small samples (np < 5 or n(1−p) < 5), the normal approximation is poor. Use Fisher exact test for small counts.
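
For reference, a two-sided Fisher exact test on a 2×2 table can be written with the standard library alone. The sketch below sums the hypergeometric probabilities of all tables that are no more likely than the observed one, which is one common two-sided convention. The function name `fisher_exact_two_sided` and the small example counts are illustrative.

```python
from math import comb


def fisher_exact_two_sided(x_a, n_a, x_b, n_b):
    """Two-sided Fisher exact p-value for conversions x out of n in each group."""
    successes = x_a + x_b
    denom = comb(n_a + n_b, successes)

    def table_prob(k):
        # P(k conversions in group A | fixed row and column totals).
        return comb(n_a, k) * comb(n_b, successes - k) / denom

    p_observed = table_prob(x_a)
    k_min = max(0, successes - n_b)
    k_max = min(n_a, successes)

    p_value = 0.0
    for k in range(k_min, k_max + 1):
        p_k = table_prob(k)
        # Small tolerance so ties with the observed table are included.
        if p_k <= p_observed * (1 + 1e-9):
            p_value += p_k
    return min(p_value, 1.0)


# Illustrative small-sample case: 3 of 4 convert in A, 1 of 4 in B.
p_small = fisher_exact_two_sided(3, 4, 1, 4)
print(f"{p_small:.4f}")
```

For the 3/4 vs 1/4 example the p-value is 34/70 ≈ 0.4857.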

How long should I run an A/B test?

Run for at least 1–2 full business cycles (e.g., one or two full weeks) to capture day-of-week effects. Keep the traffic split equal throughout.

A/B Testing by the Numbers

~3,800
Per group for 5%→6.5% at 80% power
1.96
z* for 95% CI (two-sided)
80%
Recommended minimum power
0.05
Standard significance level

Disclaimer: This calculator provides statistical guidance. Business decisions should consider effect size, cost, and risk — not just p-values. Z-test assumes large samples (np and n(1−p) ≥ 5). For small samples, use Fisher exact test.
