About the birth cohort SURF
The 2006 Census birth cohort SURF is a synthetic unit-record file (SURF) based on data for newborns (defined as those aged 0 years) and their parents from the 2006 Census of Population and Dwellings. The file contains 10,000 records on the newborn child, their household, their mother (if applicable), and their father (if applicable).
Synthetic data should not be used as a source of accurate statistical information, but it is a largely realistic representation of a portion of the New Zealand population and can be used for teaching and learning purposes. It can also be used as a source of unit-record data for developing analytical methods or statistical processes.
This work was undertaken by the Centre of Methods and Policy Application in the Social Sciences (COMPASS) at the University of Auckland, supported by a grant from the Ministry of Business, Innovation and Employment.
Dataset and data dictionary
Download the 2006 Census birth cohort SURF dataset (which can be used in any data analysis software package) and data dictionary (with information about the variables in the SURF) from 'Available files' above. If you have problems viewing the files, see Opening files and PDFs.
About the 2006 Census of Population and Dwellings
The census is the official count of population and dwellings in New Zealand. The census provides a unique source of detailed demographic, social, and economic data relating to the entire population at a single point in time. The census is taken every five years.
The 2006 Census covers all dwellings in New Zealand on Tuesday 7 March 2006 and every man, woman, child, and baby alive in New Zealand on 7 March 2006 who was:
- on New Zealand soil
- on a vessel in New Zealand waters, or
- on a passage between New Zealand ports.
Using the birth cohort SURF
The 2006 Census birth cohort SURF is a source of unit-record data (n=10,000) that can be used for teaching and learning purposes, or for developing analytical methods or statistical processes.
Properties of the birth cohort SURF
All users should note the following properties of the birth cohort SURF.
The birth cohort SURF contains 10,000 records based on 10,000 randomly selected records from 2006 Census data on newborns (0-year-olds) and their families. This represents approximately 19 percent of all families with newborns in the 2006 Census.
A sample of 10,000 was chosen to ensure that each major ethnic and socio-economic group is adequately represented. This sample size is also small enough to allow memory-intensive processes (eg, simulation) to be undertaken without drastically affecting processing speed.
The SURF was created by determining the statistical distance between each of the 10,000 randomly selected records from 2006 Census data, using key demographic and family characteristics. Each record was then ‘matched’ with the two records found to be most similar to it (ie, the records with the least statistical distance) to form a sample of ‘clusters’, each containing three records. The SURF (n=10,000) was then derived by forming a ‘composite’ record from each cluster, with each characteristic for each composite record randomly chosen from the characteristics of the three records in the cluster.
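The clustering and compositing steps described above can be illustrated with a minimal Python sketch. It is not the method actually used to build the SURF: the distance measure here (a simple count of differing characteristics) and the names `distance` and `make_composites` are assumptions made for illustration, since the exact statistical distance is not specified on this page.

```python
import random


def distance(a, b, keys):
    """Assumed distance: number of key characteristics on which two records differ."""
    return sum(a[k] != b[k] for k in keys)


def make_composites(records, keys, seed=0):
    """Build one composite record per input record.

    Each record is grouped with its two nearest neighbours to form a
    cluster of three, and every field of the composite is drawn at random
    from one of the three records in that cluster.
    """
    rng = random.Random(seed)
    composites = []
    for i, rec in enumerate(records):
        others = [r for j, r in enumerate(records) if j != i]
        nearest = sorted(others, key=lambda r: distance(rec, r, keys))[:2]
        cluster = [rec] + nearest
        composites.append({field: rng.choice(cluster)[field] for field in rec})
    return composites


# Toy input using variable names from the SURF; the values are made up.
toy = [
    {"M_Age": 25, "NZDep2006": 3, "Ch_Sex": 1},
    {"M_Age": 25, "NZDep2006": 4, "Ch_Sex": 1},
    {"M_Age": 31, "NZDep2006": 3, "Ch_Sex": 2},
    {"M_Age": 40, "NZDep2006": 9, "Ch_Sex": 2},
]
synthetic = make_composites(toy, keys=["M_Age", "NZDep2006", "Ch_Sex"])
```

This brute-force neighbour search compares every pair of records, so it is only practical for small inputs; it is intended to show the idea, not to scale to 10,000 records efficiently.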
While the data are not real, key statistical measures such as mean and variance are similar to those in the real data. Relationships between the variables and the distributions of each variable also imitate the real data.
Interpreting the variables
NZDep2006 is an index of deprivation calculated by researchers from the Department of Public Health, University of Otago, Wellington (UOW). This index is not calculated by Statistics NZ. The index scores meshblocks (the smallest geographical areas used for statistical purposes) by combining census variables associated with deprivation (eg, being unemployed, not living in own home, having no access to a telephone).
In this SURF, NZDep2006 decile scores (a number 1–10 on an ordinal scale) are used with 1 for the areas with the least deprived scores and 10 for the areas with the most deprived scores.
People can identify with multiple ethnicities in the census. Up to six are recorded. In the SURF, each response is output as a top level (level 1) response. For example, Chinese and Indian are output as Asian, and Tongan or Fijian are output as Pacific peoples. This is done for the child and their parent or parents. A newborn's ethnicity response is decided by whoever completes their census form.
Ethnicity is different from nationality or race. See the ethnicity standard for more information.
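As a rough illustration of how detailed ethnicity responses roll up into the level 1 indicator variables, the following Python sketch uses only the four example ethnicities mentioned above. The lookup table is deliberately incomplete, and the 0/1 coding of the indicators is an assumption; the actual codes are given in the data dictionary.

```python
# Partial lookup built from the examples on this page only.
LEVEL1 = {
    "Chinese": "Asian",
    "Indian": "Asian",
    "Tongan": "Pacific",
    "Fijian": "Pacific",
}

# Suffixes of the level 1 indicator variables in the SURF (eg, Ch_Asian).
GROUPS = ["Asian", "Euro", "Maori", "Melaa", "Other", "Pacific"]


def level1_flags(responses, prefix="Ch_"):
    """Return one indicator per level 1 group (assumed coding: 1 = identified, 0 = not)."""
    # Up to six ethnicity responses are recorded per person.
    groups = {LEVEL1[r] for r in responses[:6]}
    return {prefix + g: int(g in groups) for g in GROUPS}


flags = level1_flags(["Chinese", "Tongan"])
```

Because a person can identify with more than one ethnicity, several indicators can be set for the same record, as happens for `Ch_Asian` and `Ch_Pacific` in this example.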
Income, benefits, and labour force status
The labour force status variable (Wklfs_Code) is based on official classifications, and may not align with rules for benefits or typical expectations. The census counts someone of working age (15 years and older) as employed if they worked for at least one hour in the reference week (ending 5 March 2006). This work could have been for pay or profit, or without pay if for a family-owned farm or business.
This is the same official definition used in the Household Labour Force Survey (HLFS). For example, an unemployment beneficiary could be working a few hours a week and would therefore be classified as employed by the census. See the census definitions for more information.
See Key differences between the officially unemployed, registered job seekers and recipients of Unemployment Benefits for a comparison of our official definitions and criteria used by the Ministry of Social Development.
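The one-hour rule for being counted as employed can be written as a small Python check. This is a simplified sketch: the function name and parameters are hypothetical, and it ignores every other detail of the official classification (including how the remaining labour force categories are assigned).

```python
WORKING_AGE = 15  # years; the census treats people aged 15 and older as working age


def counted_as_employed(age_years, hours_worked_in_reference_week):
    """True if a person of working age worked at least one hour in the reference week."""
    return age_years >= WORKING_AGE and hours_worked_in_reference_week >= 1


# A beneficiary who worked five hours in the reference week still counts as employed.
beneficiary_employed = counted_as_employed(age_years=27, hours_worked_in_reference_week=5)
```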
The dataset contains information about the parents of each newborn. If there is only one parent, information on either the mother or the father will be missing in the dataset. The person recorded as a parent may not be the biological parent, but may be a grandparent, foster parent, or other person. This is recorded in the Family_Role_Code variables (M_Family_Role_Code and D_Family_Role_Code).
The following variables are included in the birth cohort SURF, and more detail is included in the data dictionary file (accessible as an available file at the top of this page). The categorical variables in the birth cohort SURF file are stored as numeric codes rather than character text, so you’ll need to consult the data dictionary file in order to fully interpret the data.
- Random_ID: A random 5-digit ID number, unique for each record
- NZDep2006: Scores on the NZDep2006 Scale (1–10)
- Ch_Asian: Ethnicity of the child: Asian
- Ch_Euro: Ethnicity of the child: European
- Ch_Maori: Ethnicity of the child: Māori
- Ch_Melaa: Ethnicity of the child: Middle Eastern, Latin American and African
- Ch_Other: Ethnicity of the child: Other
- Ch_Pacific: Ethnicity of the child: Pacific
- M_Age: Mother’s age (in years)
- M_Asian: Ethnicity of the mother: Asian
- M_Euro: Ethnicity of the mother: European
- M_Maori: Ethnicity of the mother: Māori
- M_Melaa: Ethnicity of the mother: Middle Eastern, Latin American and African
- M_Other: Ethnicity of the mother: Other
- M_Pacific: Ethnicity of the mother: Pacific
- M_Wklfs_Code: Work and labour force status of the mother (4 categories)
- D_Age: Father’s age (in years)
- D_Asian: Ethnicity of the father: Asian
- D_Euro: Ethnicity of the father: European
- D_Maori: Ethnicity of the father: Māori
- D_Melaa: Ethnicity of the father: Middle Eastern, Latin American and African
- D_Other: Ethnicity of the father: Other
- D_Pacific: Ethnicity of the father: Pacific
- D_Wklfs_Code: Work and labour force status of the father (4 categories)
- Bedroom_Count_Code: Number of bedrooms in the dwelling
- M_Years_At_Addr_Code: Number of years lived at current address: Mother
- M_Tenure_Holder_Code: Tenure status of the mother (dwelling owned or not owned)
- M_Education: Mother’s education: 15 categories
- M_Work_Hours: Number of hours worked per week by the mother
- M_Smoke: Mother’s smoking (current, ex, never)
- M_Income_Srce7: Mother receives Unemployment benefit
- M_Income_Srce8: Mother receives Sickness benefit
- M_Income_Srce9: Mother receives Domestic Purposes benefit
- M_Income_Srce10: Mother receives Invalid’s benefit
- M_Income_Srce11: Mother receives Student Allowance
- M_Income_Srce12: Mother receives other government benefits
- D_Years_At_Addr_Code: Number of years lived at current address: Father
- D_Tenure_Holder_Code: Tenure status of the father (dwelling owned or not owned)
- D_Education: Father’s education: 15 categories
- D_Work_Hours: Number of hours worked per week by the father
- D_Smoke: Father’s smoking (current, ex, never)
- D_Income_Srce7: Father receives Unemployment benefit
- D_Income_Srce8: Father receives Sickness benefit
- D_Income_Srce9: Father receives Domestic Purposes benefit
- D_Income_Srce10: Father receives Invalid’s benefit
- D_Income_Srce11: Father receives Student Allowance
- D_Income_Srce12: Father receives other government benefits
- Ch_Sex: Male or female
- Child_Depend_Family_Type_Code: Structure of family (8 categories)
- D_Family_Role_Code: Relationship to the child of person in father role
- Heat_Fuel: Heating used in the dwelling (some or none)
- M_Family_Role_Code: Relationship to the child of person in mother role
- Telecomm1_Code: Access to cellphone/mobile phone
- Telecomm2_Code: Access to telephone
- Telecomm4_Code: Access to internet
- Total_Income_Hhld_Code: Household income (14 categories from loss to >$100,000)
- Usual_Resdnt_Count_Code: Number usually resident in the household
- Twin: Singleton or twin
- Singparstat: Single parent status (two-parent family, single mother family, single father family)
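Because the categorical variables are stored as numeric codes, a typical first step is to translate codes into labels using the data dictionary. The Python sketch below shows one way to do this with the standard library. The code-to-label mappings in `CODEBOOK` and the sample rows are hypothetical placeholders, not the real coding; the actual values must be taken from the data dictionary file.

```python
import csv
import io

# Hypothetical code-to-label mappings for two variables.
# Replace these with the real codes from the data dictionary.
CODEBOOK = {
    "Ch_Sex": {"1": "Male", "2": "Female"},
    "M_Smoke": {"1": "Current", "2": "Ex", "3": "Never"},
}


def decode_rows(csv_file, codebook):
    """Yield each row with coded values replaced by labels where a mapping exists."""
    for row in csv.DictReader(csv_file):
        yield {col: codebook.get(col, {}).get(val, val) for col, val in row.items()}


# A made-up two-row extract standing in for the downloaded dataset.
sample = io.StringIO(
    "Random_ID,Ch_Sex,M_Smoke,M_Age\n"
    "10001,1,3,29\n"
    "10002,2,1,34\n"
)
decoded = list(decode_rows(sample, CODEBOOK))
```

Columns without an entry in the codebook (such as `M_Age`) and codes without a matching label are passed through unchanged, so missing or unexpected values remain visible rather than being silently dropped.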
Page updated 4 November 2013