
Z-test Calculator

Free z-test calculator with step-by-step breakdown. One-sample, two-sample, one-proportion, two-prop


Why This Statistical Analysis Matters

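Why: A z-test checks whether a sample mean or proportion differs from a hypothesized value when the population standard deviation σ is known.

How: Enter the sample statistics, choose the test type, tail, and α, then compute the results.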


Z-Test — Hypothesis Testing with Known σ

One-sample, two-sample, proportion tests. p-value, confidence interval, effect size, power. Step-by-step breakdown with interactive visualization.

Test Configuration

One-Sample Data (σ known)

z_test_results.sh
$ z_test --type="one-sample-mean" --tail="two-tailed" --alpha=0.05
Decision: FAIL TO REJECT H₀
z-statistic: 0.8000
p-value: 0.4237
95% CI: [97.100, 106.900]
Standard Error: 2.5000
Cohen's d: 0.1333 (Negligible)
Power: 19.8%
Significance: α = 0.05, two-tailed

95% Confidence Interval

Lower: 97.1001 | Estimate: 102.0000 | Upper: 106.8999

Red line = null value (100). CI includes null → not significant.

Charts: Standard Normal Distribution (Rejection Region & p-value); Power vs Sample Size; Effect Size vs Cohen's Benchmarks

Calculation Breakdown

COMPUTATION
Standard Error: SE = σ/√n = 15/√36 = 2.5000
z-statistic: z = (x̄ − μ₀)/SE = (102 − 100)/2.5000 = 0.8000
DECISION
Critical value z*: ±1.9600 (α = 0.05, two-tailed)
p-value: 2(1 − Φ(|z|)) = 2(1 − Φ(0.8)) = 0.4237
Decision: FAIL TO REJECT H₀ (|z| = 0.8000 < 1.9600)
CONFIDENCE INTERVAL
95% CI: x̄ ± z*·SE = [97.1001, 106.8999]
EFFECT SIZE & POWER
Cohen's d: (x̄ − μ₀)/σ = 0.1333 (Negligible)
Statistical Power: 19.8%
Power = P(reject H₀ | H₁ true)
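
The breakdown above can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library's `statistics.NormalDist`, not the calculator's own implementation; the function name `one_sample_z_test` and the keys of the returned dictionary are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Two-tailed one-sample z-test with a known population sigma.

    Hypothetical helper for illustration; not the calculator's code.
    """
    std_normal = NormalDist()                    # standard normal: mean 0, sd 1
    se = sigma / sqrt(n)                         # SE = sigma / sqrt(n)
    z = (xbar - mu0) / se                        # z = (xbar - mu0) / SE
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # two-tailed critical value z*
    p_value = 2 * (1 - std_normal.cdf(abs(z)))   # p = 2(1 - Phi(|z|))
    ci = (xbar - z_crit * se, xbar + z_crit * se)
    return {
        "se": se,
        "z": z,
        "z_crit": z_crit,
        "p_value": p_value,
        "ci": ci,
        "cohens_d": (xbar - mu0) / sigma,        # standardized mean difference
        "reject_h0": abs(z) > z_crit,
    }


# Same inputs as the worked example: xbar = 102, mu0 = 100, sigma = 15, n = 36.
result = one_sample_z_test(xbar=102, mu0=100, sigma=15, n=36)
for name, value in result.items():
    print(name, value)
```

Returning every intermediate quantity makes it easy to compare each line of the breakdown against an independent calculation.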

⚠️ For educational and informational purposes only. Verify with a qualified professional.

Key Takeaways

  • The z-test applies when the population standard deviation σ is known, or when the sample is large (conventionally n > 30) so that the sample standard deviation is a close stand-in for σ
  • The p-value is the probability of observing a test statistic at least as extreme as yours if H₀ were true
  • Reject H₀ when p-value < α. The confidence interval provides the range of plausible parameter values
  • Effect size (Cohen's d for means, h for proportions) measures practical significance, not just statistical significance
  • Power = P(reject H₀ | H₁ true). Target 80% power when designing studies, and use the power curve to plan sample size
  • For proportion tests, the normal approximation requires np ≥ 5 and n(1−p) ≥ 5 to be valid
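
The power takeaway can be turned into a quick planning calculation. The sketch below uses the common approximation n ≈ ((z₁₋α/₂ + z_power)/d)² for a two-tailed z-test, which ignores the tiny chance of rejecting in the wrong tail; the function name `n_for_power` and its `groups` parameter are assumptions made for illustration and do not come from the calculator.

```python
from math import ceil
from statistics import NormalDist


def n_for_power(d, alpha=0.05, power=0.80, groups=1):
    """Approximate sample size for a two-tailed z-test.

    d      : standardized effect size (Cohen's d)
    groups : 1 for a one-sample test, 2 for two equal-sized groups
             (the result is then the size of EACH group)
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha = 0.05
    z_power = std_normal.inv_cdf(power)           # e.g. 0.84 for 80% power
    n = groups * ((z_alpha + z_power) / d) ** 2
    return ceil(n)                                # round up to a whole subject


print(n_for_power(0.2))             # one-sample, small effect
print(n_for_power(0.2, groups=2))   # two-sample, per group
```

Rounding up with `ceil` keeps the achieved power at or above the target rather than slightly below it.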

Did You Know?

📊 The hypothesis-testing framework behind the z-test (null vs. alternative hypotheses, Type I and Type II errors, power) was formalized by Jerzy Neyman and Egon Pearson in the 1930s. Source: Stanford Encyclopedia of Philosophy
🔬 In 2019, over 800 scientists signed a commentary in Nature calling for an end to dichotomous "statistical significance", and The American Statistician published a special issue on moving "beyond p < 0.05". Both urge reporting effect sizes and confidence intervals alongside p-values. Source: Nature, 2019
📈 Cohen's benchmarks (d = 0.2, 0.5, 0.8) were popularized by his 1988 textbook, but Cohen himself warned they are context-dependent and should not be applied blindly. Source: Cohen, 1988
🏥 The FDA typically requires p < 0.05 in TWO independent clinical trials before approving a drug. If the two p-values are simply multiplied, that corresponds to a combined threshold of about 0.05² = 0.0025. Source: FDA Guidance Documents
⚖️ A two-tailed z-test with α = 0.05 and n = 30 has only 17% power to detect a "small" effect (d = 0.2). You need n ≈ 400 per group for 80% power. Source: G*Power 3.1
🎯 The z-test and t-test converge as n → ∞. For n > 120, the z* and t* critical values differ by about 1% or less. Source: NIST Handbook

Expert Tips

z vs t: The Decision Rule

Use z when σ is known from specifications, historical data, or standardized tests (IQ, SAT). Use t when σ is estimated from your sample. For n > 120, the difference is negligible.

One-Tailed vs Two-Tailed

Only use one-tailed tests when the direction was specified BEFORE seeing data. Post-hoc switching from two-tailed to one-tailed inflates Type I error and is considered p-hacking.

Interpreting Non-Significance

"Fail to reject H₀" does NOT mean H₀ is true. Check your power — if power is low, you may simply lack the sample size to detect the effect. The CI width reveals precision.

Multiple Testing Correction

Running multiple z-tests inflates the familywise error rate. For k tests, the Bonferroni correction uses α/k. See our Bonferroni Correction Calculator.
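
As a small illustration of the α/k rule, the sketch below compares each p-value with the Bonferroni-adjusted threshold. The function name `bonferroni_reject` and the example p-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one reject/keep decision per test at familywise level alpha."""
    k = len(p_values)             # number of tests in the family
    threshold = alpha / k         # Bonferroni-adjusted per-test level
    return [p < threshold for p in p_values]


# Three hypothetical z-test p-values; the per-test threshold is 0.05 / 3.
print(bonferroni_reject([0.004, 0.03, 0.2]))
```

Note that 0.03 would be significant on its own at α = 0.05 but is not after the correction.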

When to Use Each Test Type

Scenario | Test Type | Example
Mean vs known value, σ known | One-sample z | IQ scores vs μ=100 (σ=15)
Two means, both σ known | Two-sample z | Drug vs placebo BP (large trial)
Proportion vs target | One-proportion z | Election poll: >50% support?
Two proportions | Two-proportion z | A/B test conversion rates
Mean vs known value, σ unknown | Use t-test instead | Small sample customer satisfaction
Non-normal data, small n | Use Mann-Whitney U | Ordinal or skewed data
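
For the A/B-test row, a two-proportion z-test might look like the sketch below. It assumes the usual pooled-proportion standard error under H₀: p₁ = p₂; the page does not show which formula the calculator uses, and the function name and conversion counts are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-tailed z-test for H0: p1 == p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                      # common proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical A/B test: 100/1000 conversions vs 130/1000 conversions.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.4f}, p = {p_value:.4f}")
```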

Why Use This Calculator vs. Other Tools?

Feature | This Calculator | R / Python | Excel
One-sample, two-sample, proportion z-tests (⚠️ Manual)
Interactive normal curve visualization (⚠️ Code needed)
Power analysis & sample size curve (✅ (pwr))
Effect size with benchmarks (⚠️ Manual)
Step-by-step calculation breakdown
Copy & share results
AI-powered interpretation
No installation / no coding required

Frequently Asked Questions

When should I use a z-test instead of a t-test?

Use the z-test when the population standard deviation σ is known (e.g., from standardized tests, manufacturing specs, or historical data). Use the t-test when σ is unknown and estimated from your sample. For very large samples (n > 120), the results are virtually identical.

What does the p-value actually mean?

The p-value is the probability of observing a test statistic as extreme as (or more extreme than) yours, assuming H₀ is true. A small p-value (< α) means the data is unlikely under H₀, providing evidence against it. It does NOT measure the probability that H₀ is true.

What is the difference between statistical and practical significance?

Statistical significance (p < α) means the observed data would be unlikely if H₀ were true. Practical significance means the effect is large enough to matter in the real world. A huge sample can make a tiny, meaningless effect statistically significant. Always report effect sizes alongside p-values.

How do I interpret the confidence interval?

A 95% CI means: if we repeated this study many times, 95% of the intervals would contain the true parameter. If the CI for a difference excludes zero (or the null value), the test is significant at that α level. Wider CIs indicate less precision.
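
The "repeated study" reading of a confidence interval can be checked by simulation. The sketch below draws many samples from a normal population with known σ and counts how often the z-interval contains the true mean; the function name `ci_coverage`, the fixed seed, and the population values (borrowed from the IQ example) are assumptions for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def ci_coverage(mu=100, sigma=15, n=36, reps=2000, conf=0.95, seed=0):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)                        # seeded for repeatability
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z_crit * sigma / sqrt(n)            # same width every time: sigma is known
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps


print(ci_coverage())   # should land close to 0.95
```

Any single interval either contains μ or it does not; the 95% describes the long-run behavior of the procedure.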

What is statistical power and why does it matter?

Power is the probability of correctly rejecting H₀ when H₁ is true. Low power (< 80%) means you might miss real effects (Type II error). Use the power curve to plan your sample size before collecting data.

Can I use the z-test for small samples?

The z-test requires that the sampling distribution is approximately normal. For means, this holds when the population is normal or n is large (CLT, typically n ≥ 30). For proportions, you need np ≥ 5 and n(1−p) ≥ 5. For small samples with unknown σ, use the t-test.
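
The proportion rule of thumb is easy to check before running a test. A minimal sketch, with a hypothetical helper name and the threshold of 5 taken from the text above (some textbooks use 10 instead):

```python
def normal_approx_ok(n, p, threshold=5):
    """True when both expected counts, n*p and n*(1-p), reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold


print(normal_approx_ok(100, 0.5))   # 50 expected successes and 50 failures
print(normal_approx_ok(20, 0.1))    # only 2 expected successes
```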

What are Type I and Type II errors?

Type I error (α): rejecting H₀ when it's actually true (false positive). Type II error (β): failing to reject H₀ when H₁ is true (false negative). Power = 1 − β. You control α by choosing your significance level; you reduce β mainly by increasing sample size.

Should I use one-tailed or two-tailed?

Use two-tailed unless you have a strong, pre-specified directional hypothesis. One-tailed tests have more power for that direction but cannot detect effects in the opposite direction. Switching from two-tailed to one-tailed after seeing data is considered p-hacking.
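
To see how the tail choice changes the p-value for the same z-statistic, consider the sketch below. The function name `z_p_value` and the tail labels `"two-tailed"`, `"right"`, and `"left"` are illustrative assumptions, not the calculator's option names.

```python
from statistics import NormalDist


def z_p_value(z, tail="two-tailed"):
    """p-value of a z-statistic under the standard normal distribution."""
    phi = NormalDist().cdf                 # standard normal CDF
    if tail == "two-tailed":
        return 2 * (1 - phi(abs(z)))       # both tails beyond |z|
    if tail == "right":
        return 1 - phi(z)                  # H1: parameter is larger
    if tail == "left":
        return phi(z)                      # H1: parameter is smaller
    raise ValueError(f"unknown tail: {tail!r}")


for tail in ("two-tailed", "right", "left"):
    print(tail, round(z_p_value(0.8, tail), 4))
```

For z = 0.8 the right-tailed p-value is half the two-tailed one, while the left-tailed p-value is large: a one-tailed test in the wrong direction cannot detect the effect.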

Z-Test by the Numbers

1.96: z* for α=0.05, two-tailed
2.576: z* for α=0.01, two-tailed
80%: Recommended minimum power
0.05: Standard significance level
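
The critical values above come from the inverse of the standard normal CDF. A short sketch using the standard library (the helper name `z_critical` is an assumption):

```python
from statistics import NormalDist


def z_critical(alpha, two_tailed=True):
    """Critical value z* that leaves alpha in the rejection region."""
    tail_area = alpha / 2 if two_tailed else alpha   # split alpha across two tails
    return NormalDist().inv_cdf(1 - tail_area)


print(round(z_critical(0.05), 3))
print(round(z_critical(0.01), 3))
print(round(z_critical(0.05, two_tailed=False), 3))
```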

Disclaimer: This calculator is for educational and research planning purposes. It uses the Abramowitz & Stegun normal CDF approximation (accuracy ≈ 7.5×10⁻⁸). For publishable research, verify results with established statistical software (R, Python scipy, SAS, SPSS). Always check assumptions: known σ, independence, normality (or large n via CLT), and adequate np for proportion tests. Not professional statistical consulting advice.
