Pearson Correlation — Linear Association
Compute r, R², t-test, 95% CI (Fisher z). Scatter plot, residuals, step-by-step.
Why This Statistical Analysis Matters
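Why: Pearson r quantifies the strength and direction of the linear association between two variables. It underlies simple linear regression and tests of association.
How: Enter paired (x, y) values to get r, R², the t-test, the Fisher z confidence interval, and a scatter plot.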
- r ∈ [−1, 1]
- R² = r²
- Fisher z for CI
Scatter Plot with Regression Line
Residuals Plot
Calculation Breakdown
Step-by-Step Computation Table
| i | xᵢ | yᵢ | xᵢ−x̄ | yᵢ−ȳ | (xᵢ−x̄)(yᵢ−ȳ) | (xᵢ−x̄)² | (yᵢ−ȳ)² |
|---|---|---|---|---|---|---|---|
| 1 | 1.00 | 2.00 | -2.0000 | -3.4000 | 6.8000 | 4.0000 | 11.5600 |
| 2 | 2.00 | 4.00 | -1.0000 | -1.4000 | 1.4000 | 1.0000 | 1.9600 |
| 3 | 3.00 | 5.00 | 0.0000 | -0.4000 | 0.0000 | 0.0000 | 0.1600 |
| 4 | 4.00 | 7.00 | 1.0000 | 1.6000 | 1.6000 | 1.0000 | 2.5600 |
| 5 | 5.00 | 9.00 | 2.0000 | 3.6000 | 7.2000 | 4.0000 | 12.9600 |
r = Σ(xᵢ−x̄)(yᵢ−ȳ) / √(Σ(xᵢ−x̄)² × Σ(yᵢ−ȳ)²) = 17.0000 / √(10.0000 × 29.2000) = 0.9948
For educational and informational purposes only. Verify with a qualified professional.
📈 Statistical Insights
- Correlation: r ∈ [−1, 1]
- r²: variance explained
- Test: H₀: ρ = 0
Key Takeaways
- Pearson r: Measures linear correlation. r ∈ [−1, 1]. r = ±1 means a perfect linear relationship.
- Formula: r = [nΣxᵢyᵢ − ΣxᵢΣyᵢ] / √[(nΣxᵢ²−(Σxᵢ)²)(nΣyᵢ²−(Σyᵢ)²)]
- R² = r²: Coefficient of determination, the fraction of variance in y explained by x.
- t-test: t = r√(n−2)/√(1−r²), df = n−2. Tests H₀: ρ = 0.
- 95% CI: Fisher z-transform, then back-transform for r.
- Regression line: y = a + bx where b = r×(sy/sx), a = ȳ − b×x̄.
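The raw-sum formula in the takeaways can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's own code; the function name `pearson_r_raw_sums` is hypothetical, and the data are the five pairs from the computation table.

```python
from math import sqrt

def pearson_r_raw_sums(xs, ys):
    """Pearson r from raw sums (hypothetical helper, not the page's code)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when x or y has zero variance")
    return numerator / denominator

# Example data taken from the step-by-step table
r = pearson_r_raw_sums([1, 2, 3, 4, 5], [2, 4, 5, 7, 9])
print(f"r = {r:.4f}, R^2 = {r * r:.4f}")
```

For these five pairs the result agrees with the r = 0.9948 shown in the calculation breakdown. The raw-sum form can lose precision when values are large relative to their spread, so the deviation form is often preferred in practice.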
How Pearson r is Computed
Step 1: Compute means x̄ and ȳ.
Step 2: For each pair, compute (xᵢ − x̄)(yᵢ − ȳ). Sum to get covariance numerator.
Step 3: Compute Σ(xᵢ − x̄)² and Σ(yᵢ − ȳ)². Multiply and take square root for denominator.
Step 4: r = numerator / denominator. Always between −1 and 1.
In one expression: r = Σ((xᵢ−x̄)(yᵢ−ȳ)) / √(Σ(xᵢ−x̄)² × Σ(yᵢ−ȳ)²)
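The deviation form used in these steps and the raw-sum form listed in the formulas are the same quantity. A short derivation, using x̄ = Σxᵢ/n and ȳ = Σyᵢ/n:

```latex
\begin{aligned}
\sum_i (x_i-\bar{x})(y_i-\bar{y}) &= \sum_i x_i y_i - n\,\bar{x}\,\bar{y}
  = \frac{n\sum_i x_i y_i - \sum_i x_i \sum_i y_i}{n},\\
\sum_i (x_i-\bar{x})^2 &= \frac{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2}{n},\qquad
\sum_i (y_i-\bar{y})^2 = \frac{n\sum_i y_i^2 - \left(\sum_i y_i\right)^2}{n}.
\end{aligned}
```

Dividing the first line by the square root of the product of the other two, the factors of 1/n cancel, which leaves the raw-sum formula.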
Hypothesis Test and Confidence Interval
t-test for H₀: ρ = 0
t = r√(n−2)/√(1−r²), df = n−2. p-value from t-distribution. Reject H₀ if p < α.
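As a rough sketch of the test above, the following Python computes t and df from r and n, then approximates the two-sided p-value by integrating the t density numerically. The function names are hypothetical, and Simpson's rule is a stand-in chosen to keep the example dependency-free; it is not necessarily how this calculator obtains its p-value.

```python
from math import gamma, pi, sqrt

def t_statistic(r, n):
    """Return (t, df) with t = r * sqrt(n - 2) / sqrt(1 - r^2), df = n - 2."""
    if n < 3:
        raise ValueError("need n >= 3 so that df = n - 2 >= 1")
    if abs(r) >= 1:
        raise ValueError("t is unbounded when |r| = 1")
    return r * sqrt(n - 2) / sqrt(1 - r * r), n - 2

def t_pdf(t, df):
    """Density of Student's t distribution (math.gamma overflows for very large df)."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, intervals=20_000):
    """Two-sided p-value via Simpson's rule on [0, |t|]; intervals must be even."""
    upper = abs(t)
    h = upper / intervals
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, intervals):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central_half = total * h / 3  # approximates P(0 <= T <= |t|)
    return max(0.0, 2 * (0.5 - central_half))

# Rounded r from the worked example, n = 5 pairs
t, df = t_statistic(r=0.9948, n=5)
p = two_sided_p(t, df)
print(f"t = {t:.3f}, df = {df}, p = {p:.5f}")
print("reject H0 at alpha = 0.05" if p < 0.05 else "fail to reject H0")
```

A statistics library's t-distribution survival function would normally replace `two_sided_p`; the numerical integral is only there to make the sketch self-contained.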
Fisher z-transform for CI
z = 0.5·ln((1+r)/(1−r)). 95% CI for z: z ± 1.96/√(n−3), which requires n > 3. Back-transform each endpoint with r = (e^(2z) − 1)/(e^(2z) + 1) to get the interval for r.
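The interval can be sketched in Python using the standard library, where `math.atanh` is the Fisher transform 0.5·ln((1+r)/(1−r)) and `math.tanh` is its inverse. The function name `fisher_ci` and the `z_crit` parameter are hypothetical choices for this illustration.

```python
from math import atanh, sqrt, tanh

def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for rho via the Fisher z-transform."""
    if n <= 3:
        raise ValueError("need n > 3 because the standard error is 1/sqrt(n - 3)")
    if abs(r) >= 1:
        raise ValueError("the transform is undefined when |r| = 1")
    z = atanh(r)                      # z = 0.5 * ln((1 + r) / (1 - r))
    half_width = z_crit / sqrt(n - 3)
    # Back-transform both endpoints to the r scale
    return tanh(z - half_width), tanh(z + half_width)

# Rounded r from the worked example, n = 5 pairs
low, high = fisher_ci(r=0.9948, n=5)
print(f"95% CI for rho: [{low:.4f}, {high:.4f}]")
```

Because the back-transform is nonlinear, the interval is not symmetric around r, and with n as small as 5 it is wide even for a correlation close to 1.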
Regression Line
The regression line is ŷ = a + bx. Slope b = r×(sy/sx), where sy and sx are the sample standard deviations of y and x. Intercept a = ȳ − b×x̄. The line minimizes the sum of squared residuals.
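A minimal sketch of the slope and intercept formulas, assuming sample standard deviations with an n − 1 denominator (the ratio sy/sx is the same with either denominator). The function name `fit_line` is hypothetical; the data are the five pairs from the computation table.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (intercept a, slope b) using b = r * sy / sx and a = mean_y - b * mean_x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when x or y has zero variance")
    r = sxy / sqrt(sxx * syy)
    sx, sy = sqrt(sxx / (n - 1)), sqrt(syy / (n - 1))
    slope = r * sy / sx
    intercept = mean_y - slope * mean_x
    return intercept, slope

a, b = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 7, 9])
print(f"y_hat = {a:.4f} + {b:.4f} x")
```

For the example data this gives a slope of 17/10 = 1.7 and an intercept of 5.4 − 1.7×3 = 0.3, the same values as the direct least-squares slope Σ(xᵢ−x̄)(yᵢ−ȳ)/Σ(xᵢ−x̄)².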
Step-by-Step: Computing r
Step 1: Enter your paired (x, y) data. Compute x̄ = Σx/n and ȳ = Σy/n.
Step 2: For each pair, compute deviations: (xᵢ − x̄) and (yᵢ − ȳ).
Step 3: Compute products (xᵢ−x̄)(yᵢ−ȳ) and sum them for the covariance numerator.
Step 4: Compute Σ(xᵢ−x̄)² and Σ(yᵢ−ȳ)². Multiply and take √ for the denominator.
Step 5: r = numerator / denominator. Check: −1 ≤ r ≤ 1 always.
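The five steps map directly onto code. The sketch below follows them one by one for the example pairs in the computation table; the variable names are illustrative only.

```python
from math import sqrt

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 7, 9]

# Step 1: means
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Step 2: deviations from the means
dx = [x - mean_x for x in xs]
dy = [y - mean_y for y in ys]

# Step 3: sum of cross-products (the numerator)
numerator = sum(a * b for a, b in zip(dx, dy))

# Step 4: sums of squared deviations, multiplied and square-rooted
sum_dx2 = sum(a * a for a in dx)
sum_dy2 = sum(b * b for b in dy)
denominator = sqrt(sum_dx2 * sum_dy2)

# Step 5: the ratio is r
r = numerator / denominator

print(f"x_bar = {mean_x}, y_bar = {mean_y}")
print(f"numerator = {numerator:.4f}, denominator = {denominator:.4f}")
print(f"r = {r:.4f}")
```

The intermediate sums correspond to the column totals of the table (17, 10, and 29.2). In floating-point arithmetic a perfectly linear data set can yield a value marginally outside [−1, 1], so production code often clamps the result.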
Residuals and Model Fit
Residuals = observed y − predicted ŷ. A good linear fit has residuals scattered randomly around zero with no pattern. If residuals show a curve (e.g., U-shape), the relationship may be nonlinear — consider transforming variables or using a different model. The sum of squared residuals is minimized by the least-squares regression line.
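To make the residual definition concrete, here is a small sketch that fits the least-squares line and returns observed minus predicted values. The function name `residuals` is hypothetical, and the slope is computed as Σ(xᵢ−x̄)(yᵢ−ȳ)/Σ(xᵢ−x̄)², which is algebraically the same as r×(sy/sx).

```python
def residuals(xs, ys):
    """Residuals y - y_hat from the least-squares line fitted to (xs, ys)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when x has zero variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

res = residuals([1, 2, 3, 4, 5], [2, 4, 5, 7, 9])
ss_res = sum(e * e for e in res)
print("residuals:", [round(e, 4) for e in res])
print(f"sum of residuals = {sum(res):.6f}, sum of squares = {ss_res:.4f}")
```

With an intercept in the model, least-squares residuals always sum to zero up to rounding, so a nonzero sum signals a computational error rather than a poor fit. Patterns such as a U-shape have to be judged from the plot or from the signs of consecutive residuals.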
Interpretation Guide
| \|r\| | Strength |
|---|---|
| 0 - 0.3 | Weak |
| 0.3 - 0.7 | Moderate |
| 0.7 - 1.0 | Strong |
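The guide translates into a simple lookup. The table's ranges share their endpoints, so this sketch assumes that a value exactly at 0.3 or 0.7 falls into the higher band; that boundary choice and the function name `strength_label` are assumptions, not something the table specifies.

```python
def strength_label(r):
    """Map a correlation to the guide's label (boundary handling is an assumption)."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie in [-1, 1]")
    magnitude = abs(r)  # strength ignores the sign of r
    if magnitude < 0.3:
        return "Weak"
    if magnitude < 0.7:
        return "Moderate"
    return "Strong"

for value in (0.9948, -0.5, 0.1):
    print(value, strength_label(value))
```

These cut-offs are rules of thumb; what counts as a strong correlation varies by field.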
Frequently Asked Questions
What does a negative correlation mean?
r < 0 means as x increases, y tends to decrease. The relationship is inverse.
How do I interpret the p-value?
A p-value below 0.05 is conventionally called statistically significant: a sample correlation at least this far from zero would be unlikely if the true correlation were zero.
What is the confidence interval for r?
The 95% CI gives a range of plausible values for the true population correlation ρ. If the interval includes 0, the correlation is not statistically significant at roughly the 5% level.
Does correlation imply causation?
No. A high correlation can be due to a third variable, coincidence, or reverse causation. Always consider the study design.
When is Pearson r inappropriate?
When the relationship is nonlinear, the data contain influential outliers, or the variables are ordinal. Consider Spearman's ρ or Kendall's τ instead.
What is R²?
R² = r². It is the proportion of variance in y explained by the linear relationship with x. For example, r = 0.8 gives R² = 0.64, meaning 64% of the variance is explained.
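As a worked check of the last answer, R² for the five example pairs in the computation table can be obtained two ways. The residual sum of squares 0.3 is computed here from the fitted line ŷ = 0.3 + 1.7x; it does not appear in the table itself.

```latex
R^2 = r^2 = \frac{17^2}{10 \times 29.2} = \frac{289}{292} \approx 0.9897,
\qquad
R^2 = 1 - \frac{\sum_i (y_i-\hat{y}_i)^2}{\sum_i (y_i-\bar{y})^2}
    = 1 - \frac{0.3}{29.2} \approx 0.9897.
```

Both routes agree, which holds for any simple linear regression with an intercept.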
Formulas Reference
r = [nΣxᵢyᵢ − ΣxᵢΣyᵢ] / √[(nΣxᵢ²−(Σxᵢ)²)(nΣyᵢ²−(Σyᵢ)²)]
R² = r²
t = r√(n−2)/√(1−r²), df = n−2
Fisher z: z = 0.5·ln((1+r)/(1−r))
95% CI: z ± 1.96/√(n−3), back-transform
Regression: ŷ = a + bx, b = r×(sy/sx), a = ȳ − b×x̄
When to Use Pearson vs Other Correlations
| Measure | Use When | Assumptions |
|---|---|---|
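| Pearson r | Linear relationship, interval/ratio data | Linearity; bivariate normality and homoscedasticity for inference |
| Spearman ρ | Monotonic relationship, ordinal data, outliers | Monotonic relationship; no distributional assumption |
| Kendall τ | Small samples, many ties | Monotonic relationship; no distributional assumption |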
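The difference between the first two rows can be illustrated with data that are monotonic but not linear. In this sketch Spearman's ρ is computed as Pearson r on average ranks, which is its standard definition when ties are present; the helper names are hypothetical and Kendall's τ is left out for brevity.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson r in deviation form."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rho as Pearson r of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Made-up illustration: y = x^3 is strictly increasing but not linear
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(f"Pearson r  = {pearson(xs, ys):.4f}")
print(f"Spearman rho = {spearman(xs, ys):.4f}")
```

Spearman's ρ reaches 1 because the ordering of y matches the ordering of x exactly, while Pearson r stays below 1 because the points do not lie on a straight line.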
Disclaimer: This calculator provides Pearson correlation analysis for educational purposes. Correlation does not imply causation. Verify results for research or professional use. Uses Abramowitz & Stegun normal CDF approximation (accuracy ≈ 7.5×10⁻⁸).