AP Statistics · 106 views · Updated Jul 10, 2026 · 13 pages

Understanding Least-Squares Regression

Chelsea@zuncho

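Ready to master linear regression? This summary breaks down how to find, interpret, and evaluate regression lines that show relationships between variables. You'll learn practical skills for predicting values, understanding how well your predictions work, and avoiding common pitfalls.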

# 3.2 Least-Squares Regression Notes

Learning Objectives
- Interpret the slope and y-intercept of a least-squares regression line.
- Use the

Least-Squares Regression Basics

Regression lines help you predict how a response variable changes as an explanatory variable changes. The equation for a regression line is ŷ = a + bx, where:

ŷ (pronounced "y-hat") is your predicted value of y for a given x value
b is the slope, the predicted change in y when x increases by one unit
a is the y-intercept, the predicted value of y when x = 0

Remember the difference between y and ŷ: y is the actual value from your data, while ŷ is what your regression line predicts.

When analyzing car prices, for example, you might find that price decreases as mileage increases—a perfect application for regression analysis!

💡 Think of regression as your crystal ball—it uses known data points to make predictions about unknown values, but it's only as good as the relationship between your variables.

Interpreting Regression and Avoiding Extrapolation

Let's look at a real example. For used Ford F-150 trucks, the regression line is: price = 38,257 - 0.1629(miles driven)

Interpreting the slope: For every additional mile driven, the predicted price decreases by $0.1629.

Interpreting the y-intercept: The predicted price of a truck with 0 miles is $38,257.

To predict the price of a truck with 100,000 miles: price = 38,257 - 0.1629(100,000) = $21,967

However, if we try to predict for 250,000 miles: price = 38,257 - 0.1629(250,000) = -$2,468

Wait—a negative price makes no sense! This illustrates the danger of extrapolation—using the regression line to predict values outside the range of your original data. The relationship might not remain linear beyond your data points.

When making predictions, stay within the range of values used to create your regression line. Otherwise, your predictions may become wildly inaccurate.
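The two predictions above can be reproduced with a short Python sketch. The slope and intercept come from the F-150 line in these notes; the mileage range used for the extrapolation check is an assumption made up for illustration, since the notes do not state the range of the original data.

```python
def predict_price(miles, intercept=38257.0, slope=-0.1629):
    """Predicted F-150 price from the line: price = 38,257 - 0.1629(miles)."""
    return intercept + slope * miles


def predict_with_range_check(miles, min_miles, max_miles):
    """Return (prediction, is_extrapolation) for a given observed mileage range."""
    inside = min_miles <= miles <= max_miles
    return predict_price(miles), not inside


# ASSUMPTION: the notes never give the mileage range of the data.
# 10,000 to 150,000 miles is a hypothetical range, for illustration only.
ASSUMED_MIN, ASSUMED_MAX = 10_000, 150_000

for miles in (100_000, 250_000):
    price, extrapolated = predict_with_range_check(miles, ASSUMED_MIN, ASSUMED_MAX)
    flag = "extrapolation, do not trust" if extrapolated else "within data range"
    print(f"{miles:>7,} miles -> ${price:,.2f} ({flag})")
```

Returning the extrapolation flag alongside the number keeps the warning attached to the prediction instead of relying on the reader to remember the data range.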

Interpreting Regression Lines and Finding Them with Technology

When analyzing a track and field example where ŷ = 305 - 27.6x (x = sprint time, y = long jump distance), the slope tells us that for each additional second in sprint time, we predict the long jump distance to decrease by about 27.6 inches.

Sometimes interpreting the y-intercept doesn't make sense. For instance, a sprint time of 0 seconds is impossible, so we wouldn't interpret that value.

To find a regression line using your calculator:

Enter data into lists L1 and L2
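Go to STAT → CALC → 4:LinReg(ax+b)
The calculator will give you the regression equation; on a TI-83/84 it also reports r and r² when diagnostics are turned on. Note that LinReg(ax+b) labels the slope a and the y-intercept b, the reverse of ŷ = a + bx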
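If you don't have a calculator handy, the same computation can be sketched in plain Python. The function name `linreg` and the five data points are hypothetical (they are not from these notes); the two lists simply play the roles of L1 and L2.

```python
from math import sqrt


def linreg(xs, ys):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                # slope
    a = y_bar - b * x_bar        # y-intercept
    r = sxy / sqrt(sxx * syy)    # correlation
    return a, b, r


# Hypothetical data, for illustration only.
L1 = [1, 2, 3, 4, 5]   # explanatory variable
L2 = [2, 4, 5, 4, 5]   # response variable

a, b, r = linreg(L1, L2)
print(f"y-hat = {a:.3f} + {b:.3f}x, r = {r:.3f}")
```

Here `a` is the y-intercept and `b` is the slope, matching the ŷ = a + bx form used in these notes.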

For 11 used Honda CR-Vs, we found:

A strong negative correlation (r = -0.8)
The regression equation: price = -86.182(miles) + 18,773.284

With miles measured in thousands, this means that for every additional thousand miles driven, the predicted price decreases by $86.18, and a theoretical CR-V with 0 miles would cost about $18,773.

💡 Always sketch your scatterplot first to make sure the relationship looks linear before calculating a regression line!

Residuals: Measuring Prediction Accuracy

A residual is the difference between what actually happened and what your line predicted: residual = actual y - predicted y = y - ŷ

The formula is easy to remember with "AP: actual minus predicted."

Positive residuals mean the actual value is above the line (the prediction was too low), while negative residuals mean the actual value is below the line (the prediction was too high).

Let's calculate a residual for a Ford F-150 with 59,000 miles and an actual price of $32,000:

Predicted price (ŷ) = 38,257 - 0.1629(59,000) = $28,645.90
Residual = $32,000 - $28,645.90 = $3,354.10

This means this particular truck is selling for $3,354.10 more than our regression line predicted. Maybe it has special features or is in exceptionally good condition!

Residuals help us see how accurately our regression line fits the actual data points. The closer the residuals are to zero, the better our predictions.
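The F-150 residual above can be checked with a few lines of Python. This is a minimal sketch; the helper names `residual` and `describe` are made up for illustration.

```python
def residual(actual, predicted):
    """AP: Actual minus Predicted."""
    return actual - predicted


def describe(res):
    """Say whether the point sits above or below the regression line."""
    if res > 0:
        return "above the line (prediction too low)"
    if res < 0:
        return "below the line (prediction too high)"
    return "exactly on the line"


# F-150 with 59,000 miles and an actual price of $32,000.
predicted = 38257 - 0.1629 * 59_000
res = residual(32_000, predicted)
print(f"predicted = ${predicted:,.2f}, residual = ${res:,.2f}, {describe(res)}")
```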

Least-Squares Method and Finding the Best Fit Line

What makes a regression line the "best" fit? The least-squares regression line is the one that makes the sum of the squared residuals as small as possible.

Why square the residuals? This emphasizes larger errors and ensures positive and negative errors don't cancel each other out. For example, a residual of -4,765 becomes a squared residual of 22,705,225.
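Here is a small Python sketch of both ideas: the raw residuals of the least-squares line cancel to zero, and nudging the line in any direction makes the sum of squared residuals larger. The data set and helper names are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b


def sse(xs, ys, a, b):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares(xs, ys)
raw_sum = sum(y - (a + b * x) for x, y in zip(xs, ys))
best = sse(xs, ys, a, b)

# Eight nearby lines: shift the intercept and/or the slope a little.
nudged = [(a + da, b + db)
          for da in (-0.5, 0.0, 0.5)
          for db in (-0.1, 0.0, 0.1)
          if (da, db) != (0.0, 0.0)]

print(f"sum of raw residuals  = {raw_sum:.6f}")
print(f"least-squares SSE     = {best:.3f}")
print(f"smallest SSE of nudges = {min(sse(xs, ys, a2, b2) for a2, b2 in nudged):.3f}")
```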

When analyzing Taco Bell's chicken menu items (comparing fat to carbs), we found:

The regression line: ŷ = 1.799x + 16.062 (where x = fat in grams, y = carbs in grams)
Interpretation: For every additional gram of fat, we predict carbs to increase by 1.799 grams
A chicken item with no fat is predicted to have about 16.062 grams of carbs

For the Chicken Burrito Supreme with 12g of fat and 50g of carbs:

Predicted carbs (ŷ) = 1.799(12) + 16.062 = 37.65g
Residual = 50g - 37.65g = 12.35g

This item has 12.35 more grams of carbs than our line predicted!

💡 The least-squares line isn't resistant to outliers—unusual data points can drastically change your regression line!

Residual Plots: Checking If Your Line Fits

A residual plot graphs residuals against the explanatory variable (x), helping you see if a linear model is appropriate for your data.

What to look for in a residual plot:

GOOD: Random scatter around the zero line means the linear model fits well
BAD: Curved pattern suggests the relationship isn't linear
BAD: Changing vertical spread indicates predictions will be less accurate for some values of x

Creating a residual plot on your calculator:

Enter data into lists
Find the regression line: STAT → CALC → 4:LinReg(ax+b)
Set up the plot: STAT PLOT → Type: scatter → Xlist: explanatory variable → Ylist: RESID (found under 2nd STAT)
View the graph with ZOOM 9 (ZoomStat)

A good residual plot will look like a random cloud of points scattered evenly above and below the horizontal line where residual = 0.

Think of residual plots as your regression lie detector—they reveal patterns that might be hidden in the original scatterplot and tell you whether your linear model is trustworthy.
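To see what a "BAD: curved pattern" looks like in numbers, the sketch below fits a line to data that is actually quadratic and prints the residuals. The data (y = x²) is a made-up example, not from these notes, and the text "plot" is only a rough stand-in for a real residual plot.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b


# Hypothetical curved data: y = x^2, so a straight line is the wrong model.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

a, b = least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Crude text residual plot: one row per x, '+' above zero, '-' below.
for x, res in zip(xs, residuals):
    mark = "+" if res > 0 else "-" if res < 0 else "0"
    print(f"x = {x}: residual = {res:6.2f}  {mark * max(1, round(abs(res)))}")
```

The residuals start positive, dip negative in the middle, and come back positive at the end. That U shape is the curved pattern that tells you a linear model does not fit.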

Measuring Fit: Standard Deviation of Residuals and r²

Two key statistics tell us how well our regression line fits the data:

Standard deviation of the residuals (s):
- Formula: $s = \sqrt{\dfrac{\sum \text{residuals}^2}{n - 2}}$
- Interpretation: The "typical" prediction error when using the regression line
- For the Taco Bell data: s = 11.31g, meaning the typical prediction error when using fat to predict carbs is about 11.31 grams

Coefficient of determination (r²):
- Shows what percentage of the variation in y is explained by the regression line
- Formula: $r^2 = 1 - \dfrac{\sum \text{residuals}^2}{\sum (y - \bar{y})^2}$
- r² = (r)², where r is the correlation coefficient
- Interpretation: "___% of the variation in (response) is accounted for by the LSRL relating (explanatory) and (response)"

Important relationships:

As r² approaches 1, s approaches 0 (better fit)
s has the same units as the response variable
r² has no units and always falls between 0 and 1
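Both formulas can be sketched in a few lines of Python. The function name `fit_quality` and the data set are hypothetical (the Taco Bell data itself is not reproduced in these notes), so the printed values are only an illustration of the arithmetic.

```python
from math import sqrt


def fit_quality(xs, ys):
    """Return (s, r_squared) for the least-squares line of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # sum of residuals^2
    sst = sum((y - y_bar) ** 2 for y in ys)                    # sum of (y - y-bar)^2
    s = sqrt(sse / (n - 2))
    r_squared = 1 - sse / sst
    return s, r_squared


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

s, r_squared = fit_quality(xs, ys)
print(f"s = {s:.3f} (same units as y), r^2 = {r_squared:.3f} (no units)")
```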

💡 A higher r² means your model explains more of the variation in your data, making it a stronger predictor!

Interpreting r², s, and Computer Output

For the Taco Bell data, r² = 0.7105, meaning 71.05% of the variation in carbohydrates is accounted for by the relationship between fat and carbs in Taco Bell chicken items.

In another example with college football teams, the regression line ŷ = -3.75 + 0.437x relates points scored per game (x) to wins (y):

The standard deviation of residuals (s) = 1.24, meaning the typical prediction error is about 1.24 wins
r² = 0.88, meaning 88% of the variation in wins is accounted for by points scored per game
Residual plots show no patterns, confirming a linear model is appropriate

When reading computer output from statistical software (like Minitab or JMP), you need to locate:

Slope (coefficient of the explanatory variable)
y-intercept (constant or intercept)
Standard deviation of residuals (s or Root Mean Square Error)
Coefficient of determination (R-Sq or R²)

Learning to interpret these values helps you evaluate how well your model fits the data and how reliable your predictions will be.

Analyzing Real-World Regression Examples

For roller coasters, we analyzed the relationship between height (x in feet) and speed (y in mph), finding:

Regression line: ŷ = 0.24x + 25
Slope: As height increases by 1 foot, predicted speed increases by 0.24 mph
r = 0.93 (from √r² = √0.86 ≈ 0.93, taking the positive root because the slope is positive), indicating a strong positive linear relationship
Residual plots show no patterns, confirming a linear model fits well

For Mr. Freeze, a roller coaster 218 ft tall with actual speed of 70 mph:

Predicted speed (ŷ) = 0.24(218) + 25 = 77.32 mph
Residual = 70 - 77.32 = -7.32 mph (actual speed is 7.32 mph slower than predicted)

The statistics tell us:

r² = 0.86: 86% of variation in speed is explained by height
s = 6.54 mph: Typical prediction error is 6.54 mph

Interesting note: Changing the units of speed (mph to km/h) would not affect r², because r² has no units. It would increase s by the conversion factor (about 1.609), because s is measured in the units of y.
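The unit-change claim is easy to check numerically. The sketch below uses a small made-up set of heights and speeds (the actual roller coaster data is not given in these notes) and refits the line after converting speed to km/h.

```python
from math import sqrt


def fit_stats(xs, ys):
    """Return (r_squared, s) for the least-squares line of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst, sqrt(sse / (n - 2))


MPH_TO_KMH = 1.609344

# Hypothetical heights (ft) and speeds (mph), for illustration only.
heights_ft = [100, 150, 200, 250, 300]
speeds_mph = [50, 62, 75, 80, 100]
speeds_kmh = [v * MPH_TO_KMH for v in speeds_mph]

r2_mph, s_mph = fit_stats(heights_ft, speeds_mph)
r2_kmh, s_kmh = fit_stats(heights_ft, speeds_kmh)

print(f"mph : r^2 = {r2_mph:.4f}, s = {s_mph:.3f}")
print(f"km/h: r^2 = {r2_kmh:.4f}, s = {s_kmh:.3f}")
print(f"s ratio = {s_kmh / s_mph:.6f}")
```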

💡 Be cautious about extrapolation! We wouldn't use this model to predict speed for a 500-foot roller coaster since that's outside our data range.

Calculating Regression Lines from Summary Statistics

You can calculate the regression line without the original data if you know:

Means (x̄ and ȳ)
Standard deviations (sₓ and sᵧ)
Correlation (r)

The formulas are:

Slope: b = r(sᵧ/sₓ)
y-intercept: a = ȳ - bx̄

For example, with foot length (x) and height (y) data:

x̄ = 24.76 cm, sₓ = 2.71 cm
ȳ = 171.43 cm, sᵧ = 10.69 cm
r = 0.697

Slope: b = 0.697(10.69/2.71) = 2.75
y-intercept: a = 171.43 - 2.75(24.76) = 103.34

So the regression line is: ŷ = 2.75x + 103.34

This means for every additional centimeter in foot length, we predict height to increase by 2.75 centimeters.

Remember: The regression line always passes through the point (x̄, ȳ), and for each increase of 1 standard deviation in x, the predicted y increases by r times the standard deviation of y.
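The two formulas, and both facts in the reminder above, can be checked with a short Python sketch using the foot length and height statistics from the example. The function name `lsrl_from_summary` is made up for illustration.

```python
def lsrl_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (a, b) for y-hat = a + b*x using only summary statistics."""
    b = r * s_y / s_x        # slope
    a = y_bar - b * x_bar    # y-intercept
    return a, b


# Summary statistics from the foot length (x) and height (y) example.
x_bar, s_x = 24.76, 2.71
y_bar, s_y = 171.43, 10.69
r = 0.697

a, b = lsrl_from_summary(x_bar, s_x, y_bar, s_y, r)
print(f"y-hat = {b:.2f}x + {a:.2f}")

# The line passes through (x-bar, y-bar).
print(f"prediction at x-bar = {a + b * x_bar:.2f} (y-bar = {y_bar})")

# One standard deviation increase in x changes y-hat by r * s_y.
print(f"b * s_x = {b * s_x:.3f}, r * s_y = {r * s_y:.3f}")
```

Because this sketch keeps the unrounded slope, its intercept comes out near 103.35 rather than 103.34; the worked example above rounds b to 2.75 before computing a.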

