AP Statistics Unit 2 Test: Exploring Two-Variable Data
Sharpen your AP Statistics Unit 2 skills with practice on scatterplots, correlation, LSRL slope interpretation, residuals, and regression FRQ writing.
What Unit 2 Covers in AP Statistics
Unit 2 extends statistical description to relationships between two quantitative variables. The central tools — scatterplots, correlation, and the least-squares regression line — appear heavily on both the multiple-choice and free-response sections of the AP Statistics exam.
Scatterplots and Describing Associations
When describing a scatterplot, address direction (positive or negative association), form (linear or nonlinear), strength (strong, moderate, weak), and any unusual features such as outliers or influential points — always in the context of the variables being studied.
Correlation
The correlation coefficient r measures the strength and direction of a linear association between two quantitative variables. Key facts: r has no units, r is always between −1 and 1, and r does not change when you switch x and y or when you change the units of either variable (adding a constant or multiplying by a positive constant). Multiplying one variable by a negative constant reverses the sign of r but leaves its magnitude unchanged. A common AP pitfall is interpreting r as implying causation: correlation alone does not establish a cause-and-effect relationship.
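The properties of r listed above can be checked numerically. The following is a minimal sketch, not an official AP procedure: the `correlation` helper and the hours-studied and test-score data are hypothetical, made up for illustration.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r for two paired lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours studied (x) and test score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

r = correlation(hours, scores)                             # about 0.981
r_swapped = correlation(scores, hours)                     # switch x and y
r_minutes = correlation([60 * h for h in hours], scores)   # hours -> minutes
r_negated = correlation([-h for h in hours], scores)       # multiply x by -1

print(round(r, 3), round(r_swapped, 3), round(r_minutes, 3), round(r_negated, 3))
```

Switching the variables or converting hours to minutes leaves r unchanged, while negating x flips only its sign.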
The Least-Squares Regression Line
The LSRL, written ŷ = a + bx, minimizes the sum of squared residuals. The slope and y-intercept each carry specific AP-required interpretations. The slope b is interpreted as: for each one-unit increase in x, the predicted value of y increases (or decreases) by b units, on average. The y-intercept a is the predicted value of y when x = 0; interpret it in context only when x = 0 is meaningful for the data.
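As a sketch of how the slope and y-intercept come out of the data, the code below fits a least-squares line to a small hypothetical data set (hours studied and test score, made up for illustration). The `lsrl` helper is an assumption of this example; it uses the sum-of-products form of the slope, which is algebraically equivalent to slope = r · (s_y / s_x), with intercept = ȳ − slope · x̄.

```python
def lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the line passes through (x-bar, y-bar)
    return intercept, slope

# Hypothetical data: hours studied (x) and test score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

intercept, slope = lsrl(hours, scores)
print(f"predicted score = {intercept:g} + {slope:g}(hours)")
# prints: predicted score = 46 + 6(hours)
```

In context: for each additional hour studied, the predicted test score increases by 6 points, on average. The intercept says a student who studies 0 hours has a predicted score of 46, which is worth interpreting only if 0 hours is a plausible value for these data.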
Residuals and Residual Plots
A residual equals the observed value minus the predicted value. A residual plot that shows no pattern — random scatter around zero — indicates that a linear model is appropriate. A curved pattern in the residual plot suggests a nonlinear model would fit the data better. This distinction is a frequent FRQ focus.
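A short sketch of the residual calculation, assuming a hypothetical fitted line, predicted score = 46 + 6(hours), and made-up data for five students. The names `predict` and `residuals` are illustrative only.

```python
# Hypothetical data: hours studied (x) and test score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

def predict(x):
    """Assumed LSRL for this example: predicted score = 46 + 6(hours)."""
    return 46 + 6 * x

# residual = observed y - predicted y
residuals = [y - predict(x) for x, y in zip(hours, scores)]
print(residuals)        # [0, 2, -3, 0, 1]
print(sum(residuals))   # 0 (LSRL residuals always sum to zero)
```

A positive residual (the student with 2 hours) means the line underpredicted the observed score; a negative residual (3 hours) means it overpredicted. Plotting these residuals against hours gives the residual plot; here the values switch sign with no clear curve, though five points are too few to judge a pattern reliably.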
Key AP FRQ Skills for Regression Analysis
- Writing a complete interpretation of slope in context with the phrase 'on average'
- Explaining what r-squared means: the proportion of variability in y that is explained by the linear relationship with x
- Identifying and describing influential points and outliers in regression
- Assessing whether a linear model is appropriate using a residual plot
- Using the LSRL equation to make predictions and recognizing the risk of extrapolation
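The r-squared skill in the list above can be illustrated with a quick calculation. This is a sketch under stated assumptions: the data are hypothetical, and the line predicted score = 46 + 6(hours) is assumed to be the LSRL for them.

```python
# Hypothetical data: hours studied (x) and test score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

mean_score = sum(scores) / len(scores)

# Variation left over after using the line (sum of squared residuals).
sse = sum((y - (46 + 6 * x)) ** 2 for x, y in zip(hours, scores))

# Total variation in y around its mean.
sst = sum((y - mean_score) ** 2 for y in scores)

r_squared = 1 - sse / sst
print(f"{r_squared:.1%}")   # prints: 96.3%
```

A full-credit style interpretation would read: about 96.3% of the variability in test scores is explained by the linear relationship with hours studied.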
Regression FRQ Patterns on the AP Exam
Interpreting Computer Output
AP Statistics FRQs frequently present regression output in a table format with columns for coefficients, standard errors, t-statistics, and p-values. You need to correctly extract the slope and y-intercept, write the LSRL equation, and interpret the slope — all without a graphing calculator doing the labeling for you.
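To practice reading such a table, here is a sketch that pulls the two coefficients out of a hypothetical, Minitab-style output and writes the equation. The table values, the row labels, and the `read_coefficients` helper are all assumptions made for illustration; real exam output may be laid out differently. In this layout the row labeled Constant holds the y-intercept and the row labeled with the explanatory variable (Hours) holds the slope.

```python
# Hypothetical regression output, invented for this example.
output = """\
Predictor   Coef     SE Coef   T       P
Constant    46.000   2.266     20.30   0.000
Hours       6.000    0.683     8.78    0.003
"""

def read_coefficients(text):
    """Map each row label to the number in its Coef column."""
    coefs = {}
    for line in text.strip().splitlines()[1:]:   # skip the header row
        fields = line.split()
        coefs[fields[0]] = float(fields[1])      # first number after the label
    return coefs

coefs = read_coefficients(output)
intercept = coefs["Constant"]   # y-intercept
slope = coefs["Hours"]          # slope for the explanatory variable

equation = f"predicted score = {intercept:g} + {slope:g}(hours)"
print(equation)   # prints: predicted score = 46 + 6(hours)
```

Note that the equation uses "predicted score" (or ŷ) and names the variables in context, which is what FRQ rubrics typically look for.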
Avoiding the Extrapolation Trap
Using a regression equation to predict a y-value for an x-value far outside the range of the original data is called extrapolation, and AP exams regularly ask students to recognize its danger. Predictions made through extrapolation may be unreliable because the linear relationship may not continue beyond the observed data range.
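A small sketch of the extrapolation check, assuming a hypothetical line, predicted score = 46 + 6(hours), fitted to data where hours studied ranged from 1 to 5. The function name and the range values are illustrative assumptions.

```python
def predict_score(hours, x_min=1, x_max=5):
    """Return (predicted score, True if the prediction is an extrapolation)."""
    y_hat = 46 + 6 * hours                       # assumed LSRL for this example
    is_extrapolation = not (x_min <= hours <= x_max)
    return y_hat, is_extrapolation

print(predict_score(4.5))   # (73.0, False): inside the observed range
print(predict_score(12))    # (118, True): far outside the observed range
```

If the test is scored out of 100, the prediction of 118 for 12 hours is impossible, which shows concretely why the linear pattern cannot be trusted beyond the observed x-values.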
Related
- Unit 1: Exploring One-Variable Data
- Unit 3: Collecting Data
- Unit 4: Probability, Random Variables, and Probability Distributions
- Unit 5: Sampling Distributions
- Unit 6: Inference for Categorical Data: Proportions
- Unit 7: Inference for Quantitative Data: Means
- Unit 8: Inference for Categorical Data: Chi-Square
- Unit 9: Inference for Quantitative Data: Slopes