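Understanding Least-Squares Regression
Ready to master linear regression? This summary breaks down how to find, interpret, and evaluate regression lines that show relationships between variables. You'll learn practical skills for predicting values, understanding how well your predictions work, and avoiding common pitfalls.
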
Least-Squares Regression Basics
Regression lines help you predict how a response variable changes as an explanatory variable changes. The equation for a regression line is ŷ = a + bx, where:
- ŷ is your predicted value of y for a given x value
- b is the slope, showing how much the predicted y changes when x increases by one unit
- a is the y-intercept, the predicted value of y when x = 0
Remember the difference between y and ŷ: y is the actual value from your data, while ŷ is what your regression line predicts.
When analyzing car prices, for example, you might find that price decreases as mileage increases—a perfect application for regression analysis!
💡 Think of regression as your crystal ball—it uses known data points to make predictions about unknown values, but it's only as good as the relationship between your variables.

Interpreting Regression and Avoiding Extrapolation
Let's look at a real example. For used Ford F-150 trucks, the regression line is: price = 38,257 - 0.1629(miles driven)
Interpreting the slope: For every additional mile driven, the predicted price decreases by $0.1629.
Interpreting the y-intercept: A truck with 0 miles has a predicted price of $38,257.
To predict the price of a truck with 100,000 miles: price = 38,257 - 0.1629(100,000) = $21,967
However, if we try to predict for 250,000 miles: price = 38,257 - 0.1629(250,000) = -$2,468
Wait—a negative price makes no sense! This illustrates the danger of extrapolation—using the regression line to predict values outside the range of your original data. The relationship might not remain linear beyond your data points.
When making predictions, stay within the range of values used to create your regression line. Otherwise, your predictions may become wildly inaccurate.
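As a minimal sketch, the two predictions above can be reproduced in Python. The function names and the mileage range are assumptions made for illustration; the source does not state the range of mileages in the original data.

```python
def predict_price(miles):
    """Predicted price from the line: price = 38,257 - 0.1629(miles driven)."""
    return 38257 - 0.1629 * miles


# Hypothetical range of mileages in the original data (assumed for illustration only).
MIN_MILES, MAX_MILES = 0, 150_000


def predict_within_range(miles):
    """Refuse to extrapolate beyond the assumed data range."""
    if not MIN_MILES <= miles <= MAX_MILES:
        raise ValueError(f"{miles} miles is outside the data range (extrapolation)")
    return predict_price(miles)


print(round(predict_price(100_000)))  # 21967
print(round(predict_price(250_000)))  # -2468, a nonsensical negative price
```

Calling predict_within_range(250_000) raises an error instead of returning the negative price, which is one simple way to guard against extrapolation.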

Interpreting Regression Lines and Finding Them with Technology
When analyzing a track and field example where ŷ = 305 - 27.6x, the slope tells us that for each additional second in sprint time, we predict the long jump distance to decrease by about 27.6 inches.
Sometimes interpreting the y-intercept doesn't make sense. For instance, a sprint time of 0 seconds is impossible, so we wouldn't interpret that value.
To find a regression line using your calculator:
- Enter data into lists L1 and L2
- Go to STAT → CALC → 4:LinReg(ax+b)
- The calculator will give you the regression equation and, if diagnostics are turned on, the r-value
For 11 used Honda CR-Vs, we found:
- A strong negative correlation
- The regression equation: price = -86.182(miles, in thousands) + 18,773.284
This means for every additional thousand miles driven, the predicted price decreases by $86.18, and a theoretical CR-V with 0 miles has a predicted price of about $18,773.
💡 Always sketch your scatterplot first to make sure the relationship looks linear before calculating a regression line!

Residuals: Measuring Prediction Accuracy
A residual is the difference between what actually happened and what your line predicted: residual = actual y - predicted y = y - ŷ
The formula is easy to remember with "AP: actual minus predicted."
Positive residuals mean the actual value is above the line (the prediction was too low), while negative residuals mean the actual value is below the line (the prediction was too high).
Let's calculate a residual for a Ford F-150 with 59,000 miles and an actual price of $32,000:
- Predicted price (ŷ) = 38,257 - 0.1629(59,000) = $28,645.90
- Residual = $32,000 - $28,645.90 = $3,354.10
This means this particular truck is selling for $3,354.10 more than our regression line predicted. Maybe it has special features or is in exceptionally good condition!
Residuals help us see how accurately our regression line fits the actual data points. The closer the residuals are to zero, the better our predictions.
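The residual calculation above can be checked with a short Python sketch. The function names are hypothetical; the numbers come from the Ford F-150 example.

```python
def predict_price(miles):
    """Predicted price from the line: price = 38,257 - 0.1629(miles driven)."""
    return 38257 - 0.1629 * miles


def residual(actual, predicted):
    """Residual = actual - predicted ("AP")."""
    return actual - predicted


predicted = predict_price(59_000)
res = residual(32_000, predicted)

print(f"{predicted:.2f}")  # 28645.90
print(f"{res:.2f}")        # 3354.10 (positive: the actual price is above the line)
```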

Least-Squares Method and Finding the Best Fit Line
What makes a regression line the "best" fit? The least-squares regression line is the one that makes the sum of the squared residuals as small as possible.
Why square the residuals? This emphasizes larger errors and ensures positive and negative errors don't cancel each other out. For example, a residual of -4,765 becomes a squared residual of 22,705,225.
When analyzing Taco Bell's chicken menu items (comparing fat to carbs), we found:
- The regression line: ŷ = 1.799x + 16.062, where x is fat (g) and ŷ is predicted carbs (g)
- Interpretation: For every additional gram of fat, we predict carbs to increase by 1.799 grams
- A chicken item with no fat is predicted to have about 16.062 grams of carbs
For the Chicken Burrito Supreme with 12g of fat and 50g of carbs:
- Predicted carbs (ŷ) = 1.799(12) + 16.062 = 37.65g
- Residual = 50g - 37.65g = 12.35g
This item has 12.35 more grams of carbs than our line predicted!
💡 The least-squares line isn't resistant to outliers—unusual data points can drastically change your regression line!
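To make "as small as possible" concrete, here is a minimal sketch that fits a least-squares line from scratch and checks that nudging the slope or intercept makes the sum of squared residuals larger. The data are a hypothetical toy set, not the Taco Bell data, and the function names are assumptions.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def sum_squared_residuals(xs, ys, a, b):
    """Sum of (actual - predicted)^2 for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)
print(round(a, 3), round(b, 3), round(best, 3))  # 2.2 0.6 2.4

# Any small change to the intercept or slope increases the sum.
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sum_squared_residuals(xs, ys, a + da, b + db) > best
```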

Residual Plots: Checking If Your Line Fits
A residual plot graphs residuals against the explanatory variable (x), helping you see if a linear model is appropriate for your data.
What to look for in a residual plot:
- GOOD: Random scatter around the zero line means the linear model fits well
- BAD: Curved pattern suggests the relationship isn't linear
- BAD: Changing vertical spread indicates predictions will be less accurate for some values of x
Creating a residual plot on your calculator:
- Enter data into lists
- Find the regression line: STAT → CALC → 4:LinReg
- Set up the plot: STAT PLOT → Type: scatter → Xlist: explanatory variable → Ylist: RESID (found under 2nd STAT)
- View the graph with ZOOM 9 (ZoomStat)
A good residual plot will look like a random cloud of points scattered evenly above and below the horizontal line where residual = 0.
Think of residual plots as your regression lie detector—they reveal patterns that might be hidden in the original scatterplot and tell you whether your linear model is trustworthy.
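The same check can be sketched without a calculator. The example below uses hypothetical data that follow a curve (y = x²), fits a line anyway, and lists the residuals against x; the function name is an assumption.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b


# Hypothetical curved data (y = x^2), for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]

a, b = least_squares_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# These (x, residual) pairs are the points a residual plot would show.
for x, r in zip(xs, residuals):
    print(x, round(r, 2))
```

Here the residuals run positive, negative, then positive again (2, -1, -2, -1, 2). That U-shape is the curved pattern that signals a linear model is not appropriate, even though the residuals still sum to zero.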

Measuring Fit: Standard Deviation of Residuals and r²
Two key statistics tell us how well our regression line fits the data:
Standard deviation of the residuals (s):
- Formula: s = √( Σ(y - ŷ)² / (n - 2) ), where n is the number of data points
- Interpretation: The "typical" prediction error when using the regression line
- For the Taco Bell data: s = 11.31g, meaning the typical prediction error when using fat to predict carbs is about 11.31 grams
Coefficient of determination (r²):
- Shows what percentage of the variation in y is explained by the regression line
- Formula: r² = 1 - Σ(y - ŷ)² / Σ(y - ȳ)²
- r² = (r)², where r is the correlation coefficient
- Interpretation: "___% of the variation in (response) is accounted for by the LSRL relating (explanatory) and (response)"
Important relationships:
- As r² approaches 1, s approaches 0 (better fit)
- s has the same units as the response variable
- r² has no units and always falls between 0 and 1
💡 A higher r² means your model explains more of the variation in your data, making it a stronger predictor!
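Both statistics can be computed directly from the residuals. This is a minimal sketch using the usual definitions, s = √(Σ residuals² / (n - 2)) and r² = 1 - Σ residuals² / Σ(y - ȳ)². The data are a hypothetical toy set and the function name is an assumption.

```python
import math


def fit_statistics(xs, ys):
    """Return (s, r_squared) for the least-squares line through the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # sum of squared residuals
    sst = sum((y - y_bar) ** 2 for y in ys)                    # total variation in y
    s = math.sqrt(sse / (n - 2))
    r_squared = 1 - sse / sst
    return s, r_squared


# Hypothetical toy data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

s, r_squared = fit_statistics(xs, ys)
print(round(s, 3), round(r_squared, 3))  # 0.894 0.6
```

For this toy set, the typical prediction error is about 0.894 units of y, and 60% of the variation in y is accounted for by the line.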

Interpreting r², s, and Computer Output
For the Taco Bell data, r² = 0.7105, meaning 71.05% of the variation in carbohydrates is accounted for by the LSRL relating fat and carbohydrates in Taco Bell chicken items.
In another example with college football teams, the regression line ŷ = -3.75 + 0.437x relates points scored per game (x) to wins (y):
- The standard deviation of residuals (s) = 1.24, meaning the typical prediction error is about 1.24 wins
- r² = 0.88, meaning 88% of the variation in wins is accounted for by points scored per game
- Residual plots show no patterns, confirming a linear model is appropriate
When reading computer output from statistical software (like Minitab or JMP), you need to locate:
- Slope (coefficient of the explanatory variable)
- y-intercept (constant or intercept)
- Standard deviation of residuals (s or Root Mean Square Error)
- Coefficient of determination (r², often labeled R-Sq or RSquare)
Learning to interpret these values helps you evaluate how well your model fits the data and how reliable your predictions will be.

Analyzing Real-World Regression Examples
For roller coasters, we analyzed the relationship between height (x in feet) and speed (y in mph), finding:
- Regression line: ŷ = 0.24x + 25
- Slope: As height increases by 1 foot, predicted speed increases by 0.24 mph
- r = 0.93, indicating a strong positive linear relationship
- Residual plots show no patterns, confirming a linear model fits well
For Mr. Freeze, a roller coaster 218 ft tall with actual speed of 70 mph:
- Predicted speed (ŷ) = 0.24(218) + 25 = 77.32 mph
- Residual = 70 - 77.32 = -7.32 mph (actual speed is 7.32 mph slower than predicted)
The statistics tell us:
- r² = 0.86: 86% of variation in speed is explained by height
- s = 6.54 mph: Typical prediction error is 6.54 mph
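Interesting note: Changing the units of speed would not affect r², but it would change s, because s is measured in the units of y. Switching to a unit that makes the y-values numerically larger would make s larger by the same factor.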
💡 Be cautious about extrapolation! We wouldn't use this model to predict speed for a 500-foot roller coaster since that's outside our data range.
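The note about units can be checked numerically. In this sketch, speeds are converted from mph to km/h (1 mile = 1.609344 km), the line is refit, and r² and s are compared. The heights and speeds are hypothetical values, not the roller coaster data from the example, and the function name is an assumption.

```python
import math

MPH_TO_KMH = 1.609344  # exact conversion factor


def fit_statistics(xs, ys):
    """Return (s, r_squared) for the least-squares line through the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return math.sqrt(sse / (n - 2)), 1 - sse / sst


# Hypothetical heights (ft) and speeds (mph), for illustration only.
heights = [100, 150, 200, 250, 300]
speeds_mph = [50, 60, 75, 82, 98]
speeds_kmh = [v * MPH_TO_KMH for v in speeds_mph]

s_mph, r2_mph = fit_statistics(heights, speeds_mph)
s_kmh, r2_kmh = fit_statistics(heights, speeds_kmh)

print(math.isclose(r2_mph, r2_kmh))              # r² is unchanged
print(math.isclose(s_kmh, s_mph * MPH_TO_KMH))   # s scales with the units of y
```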

Calculating Regression Lines from Summary Statistics
You can calculate the regression line without the original data if you know:
- Means (x̄ and ȳ)
- Standard deviations (sₓ and sᵧ)
- Correlation (r)
The formulas are:
- Slope: b = r(sᵧ/sₓ)
- y-intercept: a = ȳ - bx̄
For example, with foot length (x) and height (y) data:
- x̄ = 24.76 cm, sₓ = 2.71 cm
- ȳ = 171.43 cm, sᵧ = 10.69 cm
- r = 0.697
- Slope: b = 0.697(10.69/2.71) = 2.75
- y-intercept: a = 171.43 - 2.75(24.76) = 103.34
So the regression line is: ŷ = 2.75x + 103.34
This means for every additional centimeter in foot length, we predict height to increase by 2.75 centimeters.
Remember: The regression line always passes through the point (x̄, ȳ), and for each increase of 1 standard deviation in x, the predicted y increases by r times the standard deviation of y.
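The foot length and height calculation can be reproduced with a short sketch. The function name is hypothetical; the inputs are the summary statistics from the example. Note that the worked example rounds the slope to 2.75 before computing the intercept, so the unrounded intercept differs slightly in the second decimal place.

```python
def line_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (a, b) using b = r * (s_y / s_x) and a = y_bar - b * x_bar."""
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b


a, b = line_from_summary(x_bar=24.76, s_x=2.71, y_bar=171.43, s_y=10.69, r=0.697)
print(round(b, 2))  # 2.75

# Rounding b first, as the worked example does, gives the intercept 103.34.
a_rounded = 171.43 - round(b, 2) * 24.76
print(round(a_rounded, 2))  # 103.34

# Without the intermediate rounding, the intercept is slightly larger.
print(round(a, 2))  # 103.35

# The unrounded line passes through the point (x_bar, y_bar).
assert abs((a + b * 24.76) - 171.43) < 1e-9
```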