Curriculum links
NCEA Mathematics Achievement Standard AS90645
- Select and analyse continuous bi-variate data.
Mathematics and Statistics: Statistics strand – level 8
– Carry out investigations of phenomena, using the statistical enquiry cycle by:
- conducting surveys, and using existing data sets
- finding, using, and assessing appropriate models (including linear regression for bi-variate data and additive models for time-series data), seeking explanations, and making predictions
- using informed contextual knowledge, exploratory data analysis, and statistical inference
- communicating findings and evaluating all stages of the cycle.
Background
Older and Wiser? is a practice activity for AS90645. It asks students to investigate whether debt increases as we get older. Students need to check the data before using the debt variable, and to discuss the adequacy of the model beyond the range of the data used here.
Part 1
An intentional outlier has been added to the debt variable. Students will need to look at the data, decide whether to remove the extreme value, and clean the data where appropriate. Reasons for removing a value may include that it:
- is a mistake
- affects the analysis of the whole sample
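Deciding whether a value is extreme is ultimately a judgement, but a quick screen can help students find candidates to inspect. The sketch below uses the interquartile-range fence rule, flagging values below Q1 − k·IQR or above Q3 + k·IQR. The function name `flag_outliers` and the default cutoff k = 3 for "extreme" values are our own assumptions, not part of the activity. The function only flags candidates; whether to remove one still depends on the reasons listed above.

```python
from statistics import quantiles


def flag_outliers(values, k=3.0):
    """Return indices of values beyond the IQR fences Q1 - k*IQR and Q3 + k*IQR.

    k = 3 is an assumed cutoff for 'extreme' values; 1.5 is the usual boxplot rule.
    """
    # Quartiles of the variable (e.g. the debt column); the middle value is the median.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Positions of flagged values, so the whole row can be inspected.
    return [i for i, v in enumerate(values) if v < low or v > high]
```

Applied to the debt column, it returns the positions of any rows worth a closer look. With bivariate data, a point can also be unusual relative to the fitted line without being extreme on its own, so a scatterplot is still worth checking.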

Outlier details
Sex | Employment | Qualification | Ethnicity | Age | AgeP | Income ($) | Wages Salary ($) | Debt ($) | Networth ($)
Male | Lab | Vocation | European | 46 | 48 | 99,000 | 42,000 | 1,098,000 | 366,000
Removing the data point will alter the graph and improve the R² of the model. The outlier's debt value can then be predicted using the regression model and added back (grey point on graph). The regression equation suggests that debt increases with age. Students may be able to come up with possible explanations for this (e.g. an increase in the number of people with a mortgage).
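To see how a single extreme value affects the fit, the sketch below fits a least-squares line of debt on age, computes R², and replaces the outlier's debt with the value predicted by a line fitted to the remaining points. The function names are hypothetical, and refitting without the point before predicting is our assumption about the procedure; the activity itself does not show its calculations.

```python
from statistics import fmean


def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(x, y):
    """Proportion of the variation in y explained by the fitted line."""
    slope, intercept = fit_line(x, y)
    my = fmean(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


def impute_from_fit(x, y, idx):
    """Refit without point idx, then replace y[idx] with the line's prediction at x[idx]."""
    x_rest = x[:idx] + x[idx + 1:]
    y_rest = y[:idx] + y[idx + 1:]
    slope, intercept = fit_line(x_rest, y_rest)
    y_new = list(y)  # copy so the original data are left untouched
    y_new[idx] = intercept + slope * x[idx]
    return y_new
```

Here `x` would be the ages and `y` the debts, as lists. Refitting without the point matters: a line that still includes the outlier is pulled towards it, so its prediction would be inflated too.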
Note also that the model may only be appropriate for the age range in the data: debt could decrease for ages above 50, which would mean that after a certain age we do 'get wiser'.
Recalculated value for outlier
Sex | Employment | Qualification | Ethnicity | Age | AgeP | Income ($) | Wages Salary ($) | Debt ($) | Networth ($)
Male | Lab | Vocation | European | 46 | 48 | 99,000 | 42,000 | 69,452 | 366,000
Part 2
95% confidence intervals for mean total debt for people with no partner, for partnered people, and for the difference in means (no partner − partner).
Group | Sample size | Mean total debt ($) | Standard error ($) | Lower limit ($) | Upper limit ($)
No partner | 133 | 23,421.00 | 2,123.70 | 19,220.00 | 27,622.00
Partnered | 167 | 56,216.00 | 2,584.90 | 51,112.00 | 61,319.00
Difference | | -32,795.00 | 3,461.90 | -39,607.00 | -25,982.00
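The intervals in the table can be reproduced from the summary statistics alone. The sketch below assumes 95% t-intervals with n − 1 degrees of freedom for each group, and a pooled-variance standard error with n1 + n2 − 2 degrees of freedom for the difference. That choice is our assumption, but it agrees with the tabled standard errors and limits to within rounding. Python's standard library has no t-distribution, so `t_critical` is an approximation (a Cornish-Fisher expansion around the normal quantile) standing in for a printed table or `scipy.stats.t.ppf`; all function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def t_critical(conf, df):
    """Approximate two-sided t critical value via a Cornish-Fisher expansion.

    Very close to tabled values for the moderate-to-large df used here.
    """
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    return z + g1 / df + g2 / df**2 + g3 / df**3


def mean_ci(mean, se, n, conf=0.95):
    """t-interval for one group's mean, from its mean, standard error and size."""
    t = t_critical(conf, n - 1)
    return mean - t * se, mean + t * se


def pooled_diff_ci(mean1, se1, n1, mean2, se2, n2, conf=0.95):
    """Difference (group 1 - group 2) with a pooled-variance standard error and t-interval."""
    # Recover each group's sample variance from its standard error: s^2 = n * se^2.
    var1, var2 = n1 * se1**2, n2 * se2**2
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    t = t_critical(conf, df)
    return diff, se_diff, (diff - t * se_diff, diff + t * se_diff)
```

For example, `pooled_diff_ci(23421, 2123.70, 133, 56216, 2584.90, 167)` gives a difference of −32,795, a standard error of about 3,462, and limits within a dollar or so of those tabled. A Welch (unpooled) standard error would be noticeably smaller here, which is why the pooled version was chosen for this sketch.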
The confidence intervals for the two groups do not overlap, and the interval for the difference (no partner − partner) lies entirely below zero. This suggests that mean debt is higher for people who live with a partner than for those who don't.
The regression equations indicate that people who live with partners seem to have higher debt as they get older.
Students may want to consider whether linear regression is appropriate for the analysis, how the data differ from the models, what the residuals are doing, and what this implies.
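One way to start on the residuals question is to compute them and look for structure. The sketch below returns the residuals (observed minus fitted) from a least-squares line, plus a crude spread check: the ratio of residual standard deviations for the upper and lower halves of x. That check and all the function names are our own rough heuristic, not a formal test, and a residual plot remains the main tool. A ratio well above 1 hints at a fan shape (variability growing with age), which would make predictions at older ages less reliable.

```python
from statistics import fmean, pstdev


def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def residuals(x, y):
    """Observed minus fitted values from the least-squares line."""
    slope, intercept = fit_line(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def spread_ratio(x, res):
    """Rough heuristic: residual SD for the larger half of x over the smaller half.

    Assumes the lower-half residuals are not all identical (else division by zero).
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    half = len(order) // 2
    low = [res[i] for i in order[:half]]
    high = [res[i] for i in order[half:]]
    return pstdev(high) / pstdev(low)
```

Residuals from a least-squares line always sum to (about) zero, so the interesting information is in their pattern: curvature suggests a non-linear model, and a fan shape suggests unequal spread.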