Least Squares Regression
The line of best fit minimizes the sum of squared vertical distances (residuals). Slope a and intercept b give ŷ = ax + b. R² = r² is the proportion of variance explained.
The regression line passes through the centroid (x̄, ȳ). R² = 0.75 means 75% of the variation in y is explained by the linear relationship with x. Correlation does not imply causation.
Why: Regression predicts y from x—sales from ads, weight from height, grades from study time.
How: Enter x,y pairs; the calculator finds slope and intercept that minimize Σ(y − ŷ)².
Least Squares Regression — Line of Best Fit
Enter X and Y values. Get slope, intercept, R², Pearson r, predicted values, residuals, and charts.
Data Points & Predictions
| X | Y (Actual) | ŷ (Predicted) | Residual |
|---|---|---|---|
| 1.0000 | 2.0000 | 1.8582 | 0.1418 |
| 2.0000 | 3.9000 | 3.8830 | 0.0170 |
| 3.0000 | 6.2000 | 5.9079 | 0.2921 |
| 4.0000 | 7.8000 | 7.9327 | -0.1327 |
| 5.0000 | 9.5000 | 9.9576 | -0.4576 |
| 6.0000 | 11.9000 | 11.9824 | -0.0824 |
| 7.0000 | 14.1000 | 14.0073 | 0.0927 |
| 8.0000 | 15.8000 | 16.0321 | -0.2321 |
| 9.0000 | 18.2000 | 18.0570 | 0.1430 |
| 10.0000 | 20.3000 | 20.0818 | 0.2182 |
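The table above can be reproduced with a few lines of Python. This is a minimal sketch, not the calculator's own code: it uses the mean-centered form of the slope, which is algebraically equivalent to the raw-sum formula, and the variable names (`s_xy`, `s_xx`, `rows`) are illustrative.

```python
# Data copied from the table above.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.0, 3.9, 6.2, 7.8, 9.5, 11.9, 14.1, 15.8, 18.2, 20.3]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Mean-centered form of the least squares slope and intercept.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Rebuild each table row: x, actual y, predicted y, residual.
rows = []
for x, y in zip(xs, ys):
    predicted = slope * x + intercept
    rows.append((x, y, round(predicted, 4), round(y - predicted, 4)))

for x, y, predicted, residual in rows:
    print(f"{x:7.4f} | {y:7.4f} | {predicted:7.4f} | {residual:7.4f}")
```

For this data the slope comes out near 2.0248 and the intercept near -0.1667, which matches the predicted values and residuals listed in the table.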
[Chart: Scatter Plot & Regression Line]
[Chart: Residuals (Observed − Predicted)]
For educational and informational purposes only. Verify with a qualified professional.
🧮 Fascinating Math Facts
The least squares line passes through the centroid (x̄, ȳ).
R² = 0.75 means 75% of variation in y is explained by the linear relationship.
📋 Key Takeaways
- Least squares minimizes the sum of squared vertical distances from points to the line
- Slope (a) = change in y per unit change in x; intercept (b) = y when x = 0
- Pearson r (−1 to +1) measures linear correlation; R² = proportion of variance explained
- Residuals = observed − predicted; a good fit has residuals scattered randomly around zero
📖 How It Works
Enter X and Y values (comma or space separated). The calculator computes n, Σx, Σy, Σxy, Σx², Σy², then slope a = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and intercept b = (Σy − a·Σx) / n.
Slope Formula
a = (n·Σxy − Σx·Σy) / (n·Σx² − (Σx)²). The denominator Δ must be non-zero (X must vary).
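The raw-sum formulas, including the check that the denominator Δ is non-zero, might be sketched as follows. The function name `fit_line` and its error messages are assumptions made for illustration; the exact-zero test on `delta` is reliable for integer input but may miss near-zero values caused by floating-point rounding.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) from the raw-sum least squares formulas."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same count")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two points are needed")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Denominator of the slope formula; zero when every x is identical.
    delta = n * sum_x2 - sum_x ** 2
    if delta == 0:
        raise ValueError("X must vary: the slope denominator is zero")

    slope = (n * sum_xy - sum_x * sum_y) / delta
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


a, b = fit_line([1, 2, 3], [2, 4, 6])
print(a, b)
```

Calling `fit_line([2, 2, 2], [1, 5, 9])` raises `ValueError`, because a vertical stack of points has no finite slope.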
Intercept Formula
b = (Σy − a·Σx) / n = ȳ − a·x̄. The line always passes through (x̄, ȳ).
Pearson r and R²
r = (nΣxy − ΣxΣy) / √[(nΣx²−(Σx)²)(nΣy²−(Σy)²)]. R² = r². R² is the proportion of variance in y explained by x.
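As a rough illustration of the formula above, Pearson r can be computed from the same raw sums. The function name `pearson_r` is hypothetical, and the sketch assumes both X and Y vary so that the square root's argument is positive.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient from raw sums (assumes X and Y both vary)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r([1, 2, 3], [2, 4, 6])   # perfectly increasing line
r_neg = pearson_r([1, 2, 3], [6, 4, 2])  # perfectly decreasing line
print(r, r ** 2)
print(r_neg, r_neg ** 2)
```

Both examples give R² = 1 even though the signs of r differ, which shows that R² measures strength of the linear fit but not its direction.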
🎯 Expert Tips
Check Linearity
Plot your data first. If the relationship is curved, linear regression may be inappropriate. Consider transformations.
Residual Plot
Residuals should be randomly scattered. Patterns (funnel, curve) suggest heteroscedasticity or non-linearity.
Sample Size
At least 10–30 points are recommended. Small samples can produce misleading R² values and unstable coefficients.
Outliers
Extreme points can pull the line. Verify they are not data errors. Consider robust regression for outlier-heavy data.
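To see how far one extreme point can pull the line, the sketch below fits the page's sample data twice: once as given, and once with a made-up data-entry error in the last point (203.0 typed instead of 20.3). The helper `slope_intercept` and the corrupted value are illustrative assumptions, not part of the calculator.

```python
def slope_intercept(points):
    """Least squares slope and intercept for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


ys = [2.0, 3.9, 6.2, 7.8, 9.5, 11.9, 14.1, 15.8, 18.2, 20.3]
clean = list(zip(range(1, 11), ys))

# Hypothetical typo: the last y value entered as 203.0 instead of 20.3.
corrupted = clean[:-1] + [(10, 203.0)]

clean_slope, _ = slope_intercept(clean)
corrupted_slope, _ = slope_intercept(corrupted)
print(f"clean slope:     {clean_slope:.4f}")
print(f"corrupted slope: {corrupted_slope:.4f}")
```

A single bad value out of ten moves the slope from about 2 to about 12 here, which is why checking extreme points for data errors comes before any interpretation.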
📊 Comparison Table
| \|r\| Range | Strength | R² Interpretation |
|---|---|---|
| 0.9 – 1.0 | Very strong | 81–100% variance explained |
| 0.7 – 0.9 | Strong | 49–81% variance explained |
| 0.5 – 0.7 | Moderate | 25–49% variance explained |
| 0.3 – 0.5 | Weak | 9–25% variance explained |
| 0.0 – 0.3 | Very weak | <9% variance explained |
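The table can be turned into a small lookup. This is only a sketch: the function name `describe_strength` is made up, and because adjacent ranges in the table share their endpoints, it assumes each boundary value belongs to the stronger category.

```python
def describe_strength(r):
    """Map a Pearson r to the strength label used in the table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    magnitude = abs(r)  # strength ignores the sign of the correlation
    # Assumption: a boundary value such as 0.7 counts as the stronger label.
    if magnitude >= 0.9:
        return "Very strong"
    if magnitude >= 0.7:
        return "Strong"
    if magnitude >= 0.5:
        return "Moderate"
    if magnitude >= 0.3:
        return "Weak"
    return "Very weak"


for r in (-0.95, 0.75, 0.5, 0.1):
    print(r, describe_strength(r), f"R² = {r * r:.4f}")
```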
❓ FAQ
What is the difference between correlation and causation?
Correlation measures statistical association. Causation means one variable directly causes another. A strong correlation does not imply causation — a third variable may explain both.
When should I use linear regression?
When the relationship between X and Y appears approximately linear, and you want to predict Y from X or quantify the strength of the relationship.
What are the assumptions of linear regression?
Linearity, independence of errors, homoscedasticity (constant error variance), and normality of errors. Check residuals to validate.
How do I interpret the slope?
Slope = change in Y per 1-unit increase in X. A slope of 2.5 means Y increases by 2.5 when X increases by 1.
What does R² mean?
R² (coefficient of determination) is the proportion of variance in Y explained by X. R² = 0.80 means 80% of the variation in Y is explained by the linear relationship.
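One way to make "variance explained" concrete is to compute R² directly from residuals as 1 − SS_res / SS_tot, which for a least squares line with an intercept equals r². The sketch below uses the actual and predicted values from the sample data table on this page; the variable names are illustrative.

```python
# Actual and predicted values copied from the sample data table.
ys = [2.0, 3.9, 6.2, 7.8, 9.5, 11.9, 14.1, 15.8, 18.2, 20.3]
predicted = [1.8582, 3.8830, 5.9079, 7.9327, 9.9576,
             11.9824, 14.0073, 16.0321, 18.0570, 20.0818]

mean_y = sum(ys) / len(ys)

# Variation left over after fitting the line (sum of squared residuals).
ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
# Total variation of y around its mean.
ss_tot = sum((y - mean_y) ** 2 for y in ys)

r_squared = 1 - ss_res / ss_tot
print(f"SS_res = {ss_res:.4f}, SS_tot = {ss_tot:.4f}, R² = {r_squared:.4f}")
```

For the sample data R² is roughly 0.9986, so the line leaves well under 1% of the variation in Y unexplained.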
How do I enter my data?
Enter X values in one box and Y values in another, in the same order. Use commas, spaces, or newlines. Both must have the same count.
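Input handling along these lines could be sketched as below. The function names `parse_values` and `parse_pairs` are hypothetical, and the calculator's actual parsing rules may differ (for example in how it treats empty or non-numeric entries).

```python
import re


def parse_values(text):
    """Split on commas, spaces, or newlines and convert each token to float."""
    tokens = [t for t in re.split(r"[,\s]+", text.strip()) if t]
    return [float(t) for t in tokens]  # raises ValueError on non-numeric tokens


def parse_pairs(x_text, y_text):
    """Parse both boxes and pair values up in order; counts must match."""
    xs = parse_values(x_text)
    ys = parse_values(y_text)
    if len(xs) != len(ys):
        raise ValueError(f"X has {len(xs)} values but Y has {len(ys)}")
    return list(zip(xs, ys))


pairs = parse_pairs("1, 2 3\n4", "2.0,3.9, 6.2 7.8")
print(pairs)
```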
⚠️ Disclaimer: This calculator is for educational purposes. Verify critical analyses with professional statistical software when making decisions.