
Drift to the north – level 3 teachers page


Curriculum links

NCEA Mathematics Achievement Standard AS90645

  • Select and analyse continuous bivariate data

Mathematics: Statistics strand – level 8

  • Investigate relationships between two continuous variables, using graphical methods (including linear regression), calculate correlation coefficients to estimate the strength of linear relations, and discuss the appropriateness of any regression line or correlation.

Background

This activity uses census data to look at changes in population in two territorial authorities, one in the north and one in the south. It asks students to examine the theory of the drift north using their analysis. At each stage students are required to think about the validity of the process.

A spreadsheet of the data is supplied below. This data is not suitable for use in an assessment as it is not continuous and does not give a choice of variables to students.

Part 1: Changes in the south

1.  Cells containing ‘C’ have had their population counts suppressed for reasons of confidentiality. Because they have no values, they are omitted from the graph. It would be incorrect to give them a value of zero and include them in the graph, because this would distort the results. A genuine zero count, however, is a valid response and should remain in the dataset.

2.  There are two clusters of data. The cluster with lower values represents area units in the remote country areas and the other shows small towns or more densely settled rural areas. There appear to be two possible outliers. The first is West Gore, which had a much higher population in 2001 than the other areas. Gore is the largest town in this area (as Invercargill is not included) and it makes sense that it has a higher population. This is unlikely to actually be an outlier as it looks to be close to the pattern of the other data. The other is Te Anau, which is a popular tourist destination. It shows the largest growth in population. The number of tourists has grown over the years and more of the people counted in Te Anau in 2001 are likely to be visitors rather than residents. A linear regression line looks to be the best option as the shape of the data is not really curved.






Population counts for 1991 and 2001 in the Southland and Gore territorial authorities.
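
The rule in answer 1 (leave out suppressed cells, keep genuine zeros) can be sketched as a small cleaning step before plotting. This is only an illustration: the area unit names and counts below are made up, and the helper names `parse_count` and `plottable_pairs` are hypothetical, not part of the supplied spreadsheet.

```python
# Hypothetical rows: (area unit, 1991 count, 2001 count) as read from a spreadsheet.
# The names and numbers are invented for illustration; they are not the census data.
rows = [
    ("Area A", "120", "75"),
    ("Area B", "C", "30"),    # suppressed 1991 count
    ("Area C", "0", "0"),     # a genuine zero count, which is kept
    ("Area D", "45", "C"),    # suppressed 2001 count
    ("Area E", "210", "130"),
]


def parse_count(cell):
    """Return the count as an int, or None if the cell is suppressed ('C')."""
    return None if cell.strip().upper() == "C" else int(cell)


def plottable_pairs(rows):
    """Keep only area units where both counts are known (never turn 'C' into 0)."""
    pairs = []
    for name, c1991, c2001 in rows:
        x, y = parse_count(c1991), parse_count(c2001)
        if x is not None and y is not None:
            pairs.append((name, x, y))
    return pairs


pairs = plottable_pairs(rows)
for name, x, y in pairs:
    print(name, x, y)
```

Treating a suppressed cell as missing rather than zero keeps an artificial point at the origin (or on an axis) from pulling the regression line.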



3.  The slope of 0.58 shows that the population is declining (because it is less than 1). For every 15 to 19 year old in the area unit in 1991, there is only 0.58 of a 25 to 29 year old in 2001. More people have moved away or died than have moved into, or been born in, the area.

4.  The residuals show the two clusters in the data mentioned before. There is no other obvious pattern in the data. The outliers in the data are also shown in the plot.

5.  The outlier for Gore is quite close to the model and should not be removed. It shows the same decline in population as the other areas do. Te Anau could be removed as it is difficult to be sure how many of the population shown are permanent residents. It is in a different category from the other area units, by being a popular tourist destination. Numbers may include overseas visitors as well as casual workers. If the population count was replaced by the numbers of permanent residents it would give a clearer picture, and then it would be hard to justify its removal. Some of the other area units which show increases are also tourist areas (eg Milford).

6.  The coefficient of determination (R²) is 0.4876. This shows that only about 49 percent of the variation in the area unit population in 2001 is explained by the regression model. The correlation coefficient is 0.70. This shows there is a reasonable linear relationship between the numbers in 1991 and the numbers in 2001.

7.  The removal of the outlier improves the model (see below). It now explains 75 percent of the variation. This means that predictions are likely to be more accurate.



Population counts for 1991 and 2001 in the Southland and Gore territorial authorities.
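
The steps in answers 3, 6 and 7 (reading the slope, relating R² to the correlation coefficient, and refitting without an outlier) can be sketched in a few lines. The points below are invented to mimic a declining pattern with one unusual area unit; they are not the Southland and Gore counts, and `fit_line` is a hypothetical helper rather than the method used to produce the graphs.

```python
from math import sqrt


def fit_line(points):
    """Least squares line y = intercept + slope * x, plus r and R squared."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    # For a straight-line fit with one explanatory variable, R squared equals r squared.
    return {"slope": slope, "intercept": intercept, "r": r, "r_squared": r * r}


# Invented (1991 count, 2001 count) pairs; the last point plays the role of an outlier.
typical = [(10, 7), (20, 11), (30, 19), (40, 23), (50, 31), (60, 35)]
outlier = (25, 60)

with_outlier = fit_line(typical + [outlier])
without_outlier = fit_line(typical)

for label, fit in (("with outlier", with_outlier), ("without outlier", without_outlier)):
    print(f"{label}: slope={fit['slope']:.2f}, r={fit['r']:.2f}, R^2={fit['r_squared']:.2f}")

# Check on the figures quoted in answer 6: the square root of 0.4876 is about 0.70.
r_part1 = sqrt(0.4876)
print(f"r from R^2 = 0.4876: {r_part1:.2f}")
```

In this made-up example the fit without the unusual point explains far more of the variation, which is the same effect answer 7 describes for the real data.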



Part 2: Changes in the north




Population counts for 1991 and 2001 in the Waitakere territorial authority.



1.  The data are shown in a scatter plot with a linear model (above).

2.  Both datasets show that the linear model is a reasonable one which explains over 70 percent of the variation. There is very good correlation (more than 0.85) between the data for 1991 and for 2001 in both datasets. The slope of the regression line for Waitakere shows a slight increase in population over the 10 years as the slope is slightly more than 1.  For Southland and Gore a large decrease is shown.

4.  The numbers in the area units are larger overall in Waitakere.

 (a)  Because we are looking at movements in the population, it makes sense to try to use the same cohort to see if those people are still there after the 10 years. However, there are lots of other things which make this less valid. Many people leave school between age 16 and 19 and they often go elsewhere to study. Many will not go back afterwards. In addition, it is unlikely that the 15 to 19 year olds will still be living in the same area unit by the next census, even if they stayed in the area. The 25 to 29 year olds are likely to be largely a different group who have moved into the area. So the changes may be just for that age group not the population in general. We are comparing two completely different things and this makes our analysis somewhat meaningless.

 (b)  If we used the same age range in each census they would be different people and we would have to assume that the percentage of that age group in the population stayed the same. However, this would probably show more realistic results for this age group.

5. In Southland there are very few areas above the y = x line. This means that few places increased in population over the 10 years. In Waitakere there are almost equal numbers of increases and decreases. 

6.  The high correlation shows that there is a strong linear relationship in both datasets. Most area units with a small population in 1991 still had a small one in 2001, and units with a large population in 1991 were still large in 2001. So this is likely to contribute to the number in the later census.  However, the reason for the number of people in the area unit is likely to be related to its location. For example, city locations are likely to have more people than rural locations, where the population is more spread out. 

7.  More 15 to 19 year olds died or left Southland and Gore than came in. There was a small increase in Waitakere. However, there is no evidence that the people who left Southland went to Waitakere. Possibly more relevant is that we are comparing a rural area with a city area. Many young people go to the city after they leave school and this is likely to be more significant than the direction of the city.

It would make more sense to compare like with like: a city in the south with one in the north, for example, or two rural areas. Choosing suburbs with the same characteristics could also be useful, for example two that are close to a university area.

Choosing a different age group might be more useful. People with families are often less likely to move around. The same cohort is then more likely to have more of the same people in it.

Looking at the total population rather than a subset would also be useful.
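
The comparison with the y = x line in answer 5 can also be done by counting rather than by eye. A minimal sketch, assuming each area unit is a (1991 count, 2001 count) pair; the two small datasets are invented for illustration and `compare_to_identity` is a hypothetical helper name.

```python
def compare_to_identity(pairs):
    """Count area units above, below and on the y = x line."""
    return {
        "increase": sum(1 for x, y in pairs if y > x),   # above y = x
        "decrease": sum(1 for x, y in pairs if y < x),   # below y = x
        "no change": sum(1 for x, y in pairs if y == x), # on y = x
    }


# Invented (1991 count, 2001 count) pairs, not the census data.
south = [(100, 60), (80, 45), (50, 55), (120, 70), (30, 30)]
north = [(200, 210), (150, 140), (300, 320), (180, 175), (250, 250), (220, 230)]

south_summary = compare_to_identity(south)
north_summary = compare_to_identity(north)

print("south:", south_summary)
print("north:", north_summary)
```

A count like this only shows how many area units changed in each direction. As answer 7 notes, it says nothing about where the people who left actually went.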
 

Related links

Similar data can be obtained from Table Builder using this link.

A copy of the raw data used in this activity is available here: Drift to the north.xls (37 KB)
