Statistics New Zealand has extensively reviewed the 2001 Census confidentiality rules. As a consequence of this review, new confidentiality rules have been introduced for the 2006 Census. Consultation with a range of users helped Statistics New Zealand refine the rules. Following the release of 2006 Census data, the rules have been further refined to better meet user needs.
The 2006 Census confidentiality rules apply to all new requests for census data, including requests for census data from 2001 and earlier.
There are five confidentiality rules, supported by an overarching confidentiality principle. Each rule, and the confidentiality principle, is explained in this document. The explanations include comparisons with the rules used in 2001, and examples of situations that would pass or fail each 2006 rule.
The most notable changes to the confidentiality rules from 2001 include:
- the reduction in the number of categories of the income variable when data is displayed at the meshblock level
- the replacement of the 2001 'sparse tables rule' with the 2006 'mean cell size rule', with the required mean cell size moving from 1 to 2
- the introduction of a threshold to determine if certain cell counts are large enough to be released.
For more information about the rules, please contact info@stats.govt.nz.
The five 2006 Census confidentiality rules must be applied in numbered order from 1 to 5. Each rule must be applied to all census data released by Statistics New Zealand. Exemptions to the rules must be documented and approved by the General Manager Census.
‘Geographic area’ is a generic term used to describe the various standard Statistics New Zealand output geographies such as meshblocks and area units.
The 2006 Census confidentiality rules are for use with tables of counts from census data. Access to census microdata (unit record data) can be granted to researchers through the Statistics NZ Data Laboratory. Data Laboratory processes protect the microdata, and regulate the output in ways that are equivalent to these rules.
Meshblock data may only be disaggregated (broken down) by one variable. The categories for the selected variable must be at the highest level of aggregation (least detailed) for that classification.
Exemptions to rule 1
Standard groupings of meshblocks are exempt from rule 1; that is, they can be released with more than one variable and at all levels of the classification. They include:
- police stations, police districts, police areas
- district health boards
- wards, electoral boundaries
- area units, regional councils, regional council constituencies, community boards, territorial authorities, urban areas
- statistical areas or other legal or 'government' boundaries, as approved by the General Manager Census.
Customised groupings of more than 10 meshblocks are exempt from rule 1; that is, they can be released with more than one variable and at all levels of the classification.
Is this a new rule?
No, this was one of the 2001 Census confidentiality rules.
Explanation
To determine the highest level of the classification, check the Census Data Dictionary. For hierarchical (multiple level) variables, the highest level of the classification (ie least detailed) is referred to as level 1 and more detailed levels of the classification are referred to as levels 2, 3 or 4. Some variables have a flat classification, which means they do not have levels associated with them.
Furthermore, if the classification of the selected variable has more than one level of detail, then the least detailed level must be used for meshblocks. For example, if occupation was selected, the least detailed level has only nine categories, but the most detailed level has hundreds of categories. The least detailed level of the classification must be used.
Applying this rule, table 1 below is allowed but tables 2 and 3 are not allowed. Occupation is a classification with five levels of detail. Table 1 shows the highest level of the occupation classification at a meshblock level.
Table 2 fails rule 1 and cannot be released because it is not the highest level of the classification. For tables at meshblock level, the least detailed level of the classification must be used.
Table 3 fails rule 1 and cannot be released because the breakdown is by two variables. A table could be produced for sex or occupation by meshblock but NOT for sex and occupation by meshblock.
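The checks described for tables 1 to 3 can be sketched in code. This is a minimal illustration only, not a Statistics New Zealand tool: the `TableRequest` structure, its field names and the `passes_rule_1` function are hypothetical, and a flat classification is assumed to be recorded as level 1.

```python
from dataclasses import dataclass

@dataclass
class TableRequest:
    """Hypothetical description of a requested table (not an official format)."""
    n_meshblocks: int        # meshblocks making up each geographic unit in the table
    standard_grouping: bool  # True for area units, wards and the other listed groupings
    variable_levels: list    # classification level used for each variable (1 = least detailed)

def passes_rule_1(req: TableRequest) -> bool:
    # Exemptions: standard groupings, and customised groupings of more than 10 meshblocks.
    if req.standard_grouping or req.n_meshblocks > 10:
        return True
    # Otherwise only one variable is allowed...
    if len(req.variable_levels) > 1:
        return False
    # ...and it must use the least detailed level of its classification.
    return all(level == 1 for level in req.variable_levels)

# Table 1: occupation at level 1 for a single meshblock
print(passes_rule_1(TableRequest(1, False, [1])))
# Table 2: occupation at a more detailed level
print(passes_rule_1(TableRequest(1, False, [2])))
# Table 3: sex and occupation together
print(passes_rule_1(TableRequest(1, False, [1, 1])))
```

The sketch treats the exemptions first because an exempt grouping is not subject to either restriction.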
Rule 1: Meshblock data example tables
Part 1
No income data can be released for any geographic area if the total unrounded subject population* is either:
- forty or fewer individuals for personal income, or
- twenty or fewer families, dwellings or households for family and household income.
* The subject population for:
- individuals refers to individuals aged 15 years or over
- households and families refers to all households and families in private occupied dwellings.
Part 2
The standard 15-category income classification cannot be used for meshblocks or user-defined combinations of these (see exemptions). Instead, grouped income categories must be used. The five new grouped income classifications are: personal, household, family, extended family and parental income.
Exemptions to Part 2
Standard groupings of meshblocks are exempt from rule 2, part 2, and may use the standard 15-category income classification. They include:
- police stations, police districts, police areas
- district health boards
- wards, electoral boundaries
- area units, regional councils, regional council constituencies, community boards, territorial authorities, urban areas
- statistical areas or other legal or 'government' boundaries, as approved by the General Manager Census.
Customised groupings of more than 10 meshblocks can use the standard 15-category income classification.
Means, medians and percentiles on meshblock data can be calculated using the standard 15-category income classification.
| Grouped personal income | Grouped household income and grouped family income | Grouped extended family income and combined parental income for couples with children |
|---|---|---|
| 1. $5,000 or less | 1. $20,000 or less | 1. $30,000 or less |
| 2. $5,001–$10,000 | 2. $20,001–$30,000 | 2. $30,001–$50,000 |
| 3. $10,001–$20,000 | 3. $30,001–$50,000 | 3. $50,001–$70,000 |
| 4. $20,001–$30,000 | 4. $50,001–$70,000 | 4. $70,001–$100,000 |
| 5. $30,001–$50,000 | 5. $70,001–$100,000 | 5. $100,001 or more |
| 6. $50,001 or more | 6. $100,001 or more | 9. Not stated |
| 9. Not stated | 9. Not stated | |
Is this a new rule?
Yes, this is a new rule. This rule specifically focuses on income because of the distribution of income; there are relatively few people in the highest and lowest categories of the standard income variable. The categories of the new grouped income variables were determined by aggregating the standard categories until approximately 10 percent or more of the New Zealand subject population fell into each category.
Explanation
Table 4 is an example of how the thresholds in part 1 of rule 2 are applied. Rule 2 stipulates there must be more than 40 individuals in the total unrounded subject population for any income information being released. For area unit 1, there are only 30 people and therefore the income data must be suppressed. This is indicated in the shaded area of table 4.
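The part 1 thresholds can be expressed as a simple comparison. The following is a minimal sketch; the dictionary keys and the function name `income_can_be_released` are illustrative choices, not official terminology.

```python
# Rule 2, part 1: income data is released only when the unrounded
# subject population is strictly greater than these values.
INCOME_THRESHOLDS = {
    "individuals": 40,
    "families": 20,
    "dwellings": 20,
    "households": 20,
}

def income_can_be_released(unrounded_population: int, unit: str) -> bool:
    """Return True if income data for this geographic area passes part 1."""
    return unrounded_population > INCOME_THRESHOLDS[unit]

# Area unit 1 in table 4 has 30 individuals, so its income data is suppressed.
print(income_can_be_released(30, "individuals"))  # False
```

The comparison is strict: an area with exactly 40 individuals, or exactly 20 households, still fails.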
Table 5 passes rule 2, part 2, because it uses the new grouped personal income classification for meshblock information (and, it can be assumed, has passed rule 2, part 1, where the unrounded subject population is greater than 40 for individuals).
Table 6 fails rule 2, part 2, because it uses the standard income classification for meshblock information. The rules stipulate that the standard classification cannot be used for meshblocks, and the grouped income categories must be used instead.
Table 7 passes rule 2, part 2, because ward is one of the areas for which the standard income classification is allowed. (It can be assumed that table 7 has passed rule 2, part 1 where the unrounded subject population is greater than 40). If preferred, the grouped income classification could be used instead.
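The outcomes for tables 5 to 7 follow from one decision: whether the geography is exempt. A minimal sketch is shown below; the labels `"standard_15"` and `"grouped"` and the function name are hypothetical, and the separate allowance for means, medians and percentiles is not modelled.

```python
def passes_rule_2_part_2(classification: str,
                         n_meshblocks: int,
                         standard_grouping: bool) -> bool:
    """Check which income classification a table may use.

    classification: "standard_15" or "grouped" (hypothetical labels).
    n_meshblocks: meshblocks making up each geographic unit.
    standard_grouping: True for wards, area units and other listed groupings.
    """
    if classification == "grouped":
        return True  # grouped categories are always acceptable
    # The standard 15-category classification needs an exemption.
    return standard_grouping or n_meshblocks > 10

print(passes_rule_2_part_2("grouped", 1, False))      # table 5: True
print(passes_rule_2_part_2("standard_15", 1, False))  # table 6: False
print(passes_rule_2_part_2("standard_15", 1, True))   # table 7 (ward): True
```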
Rule 2: Income data example tables
The mean cell size for individual geographic areas must be greater than two. Income information is still subject to rule 2.
Mean cell size = (total unrounded subject population in a geographic area) / (number of categories (excluding totals) in that same geographic area)
The number of categories in the denominator is determined by the number of categories in the standard census variables.
Any table will contain marginal tables*, which will appear as groups of totals embedded in the main table. Each of these tables will need to be assessed against this rule as a table in its own right. Often the main table will fail the rule but the marginal tables will pass it. A separate mean cell size calculation is required for each marginal table. This additional calculation ensures that any table is treated consistently, whether it appears in a larger table or on its own.
* A marginal table uses combinations of variables from the main table. For example, in a three-dimensional table for one geographic area and the variables age, sex and ethnicity, there could be two-dimensional tables with that geographic area and each of: age and sex, sex and ethnicity, and age and ethnicity. There could also be one-dimensional tables for that geographic area and each of: age, sex, and ethnicity. Each of these two-dimensional and one-dimensional tables is a marginal table of the three-dimensional table.
Some tables which contain only a geography and the population total for that geography are excluded from the mean cell size rule. These are tables of census usually resident population count, census night population count, and family, dwelling and household counts.
New Zealand is the default geography for tables which do not specify another geography. In these tables the total unrounded subject population is divided by the number of internal cells in the table. If a table fails, the threshold is applied (see exemption for large cells).
Exemption to rule 3 for large cells
When the mean cell size rule suppresses large cells, or the rule cannot be applied to output, a threshold may be applied instead. Cells at or below the threshold value of 5 are suppressed, and higher counts are released.
Is this a new rule?
This rule is similar to one used in 2001, but is applied differently and produces a different result. In 2001, this rule was applied to the overall table, but in 2006 this rule is applied at the finest geographic level of the table. The result of this change is that some information that may have passed the 2001 rules will fail the 2006 rules. However, the threshold will release the counts greater than five in these tables.
Explanation
The essence of rule 3 is to check that there are more than twice as many people as categories in each geographic area of the table.
Table 8 has six categories (excluding the total) for each area unit, so there must be more than 12 people to release the information for each area unit. The mean cell size is calculated as follows:
Mean cell size = (total unrounded subject population in a geographic area) / (number of categories (excluding totals) in that same geographic area)
Mean cell size for area unit 1 = 1,749 / 6 = 291.50
291.50 is greater than 2, and therefore the information for area unit 1 can be released.
Mean cell size for area unit 5 = 9 / 6 = 1.50
1.50 is less than 2, and therefore the information for area unit 5 must be subjected to the threshold process, and counts over 5 can be released rounded. Similar results occur for area units 8 and 12.
The mean cell size and threshold operate on raw counts, before random rounding is applied.
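The table 8 calculation and the threshold fallback can be sketched as follows. This is an illustration only: the function names are hypothetical, and the individual cell counts used for area unit 5 are invented for the example (the source gives only its total of 9).

```python
THRESHOLD = 5  # cells at or below this raw count are suppressed when the rule fails

def passes_mean_cell_size(population: int, n_categories: int) -> bool:
    """Rule 3: the mean cell size must be strictly greater than 2."""
    return population / n_categories > 2

def releasable_cells(raw_counts, population):
    """Return raw counts that may go on to random rounding; None marks suppression."""
    if passes_mean_cell_size(population, len(raw_counts)):
        return list(raw_counts)
    # Rule failed: fall back to the threshold, keeping only counts above 5.
    return [c if c > THRESHOLD else None for c in raw_counts]

print(1749 / 6)                      # 291.5, area unit 1 passes
print(passes_mean_cell_size(9, 6))   # False, area unit 5 fails (9 / 6 = 1.5)

# Invented cell counts summing to 9, for illustration only.
print(releasable_cells([6, 1, 1, 0, 1, 0], population=9))
# [6, None, None, None, None, None]
```

The population is passed in separately rather than summed from the cells, since the rule is defined on the total unrounded subject population. Any count that survives would still be randomly rounded under rule 4 before release.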
Table 9 below shows how the mean cell size rule is applied when a second variable is added to the table. The formula remains the same as that used in table 8. It is important to note that the calculation of the number of categories in the table, ie the denominator, excludes totals. In table 9, legal marital status has six categories and sex has two categories, so the calculation for the number of cells is 6 x 2 = 12. Rule 3 stipulates that you must have more than twice as many people as categories. Therefore, there needs to be more than 24 people in the subject population to release the small counts for legal marital status and sex.
Eight of the 12 area units have enough people (over 24) for the two-way table for sex by legal marital status to pass the rule.
Area unit 3 has enough people (21, which is over 12) for both marginal tables, that is, for sex and for legal marital status taken separately, to pass the rule.
Area units 5 and 12 have enough people (9 and 12 respectively, which are both over 4) for the marginal table for sex to pass the rule.
Area unit 8 has three people, so both marginal tables fail the rule.
The threshold process means a 'c' that represents counts above 5, in any parts of the example, can be replaced by rounded counts.
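The marginal table checks for table 9 can be enumerated mechanically. The sketch below is a hedged illustration: the function name `mean_cell_size_results` is hypothetical, and it assumes each marginal table's cell count is the product of the category counts of the variables it contains, as in the 6 x 2 = 12 calculation above.

```python
from itertools import combinations
from math import prod

def mean_cell_size_results(population: int, categories: dict) -> dict:
    """Check the main table and every marginal table for one geographic area.

    categories maps each variable name to its number of categories
    (excluding totals). Returns {variable combination: passes rule 3}.
    """
    results = {}
    names = list(categories)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            n_cells = prod(categories[name] for name in combo)
            results[combo] = population / n_cells > 2
    return results

table_9 = {"sex": 2, "legal marital status": 6}

# Populations quoted in the text for area units 3, 5, 12 and 8.
for area_unit, population in [(3, 21), (5, 9), (12, 12), (8, 3)]:
    print(area_unit, mean_cell_size_results(population, table_9))
```

For area unit 3 (21 people) both one-variable tables pass and the two-way table fails. For area units 5 and 12 only the sex table passes, and for area unit 8 nothing passes.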
Rule 3: Mean cell size example tables
All counts for individuals, families, households and dwellings must be randomly rounded to base 3. After each count has been rounded, all totals must then be separately randomly rounded to base 3.
Is this a new rule?
No, this was one of the 2001 Census confidentiality rules.
Explanation
Rounding to base 3 means that all numbers must be divisible by three; base 3 numbers include 0, 3, 6, 9, 12 and so on. Numbers that are not divisible by three are randomly allocated to either the higher or the lower base 3 number. For example, the number 4 could become either a 3 or a 6 in the table. A number appearing as a 3 could originally have been a 1, 2, 3, 4 or 5.
The totals in the table are not additive, meaning that if all the columns or rows in the table are added, they may not equal the total given. This is because each total is rounded independently of the other numbers in the table, so if the total was originally 100, it could appear as either 99 or 102. A rounded value never differs from the original number by more than two.
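A minimal sketch of random rounding to base 3 follows. The text above says only that a number is randomly allocated to the higher or lower multiple of three; the two-thirds weighting towards the nearer multiple used here is an assumption made for illustration, and the function name is hypothetical.

```python
import random

def random_round_base3(count: int, rng: random.Random) -> int:
    """Randomly round a non-negative count to a multiple of 3."""
    remainder = count % 3
    if remainder == 0:
        return count  # already a base 3 number
    lower = count - remainder
    upper = lower + 3
    # ASSUMPTION: the nearer multiple is chosen with probability 2/3.
    nearer, farther = (lower, upper) if remainder == 1 else (upper, lower)
    return nearer if rng.random() < 2 / 3 else farther

rng = random.Random(2006)  # seeded only so the example is repeatable

raw_cells = [4, 17, 79]    # invented raw counts summing to 100
rounded_cells = [random_round_base3(c, rng) for c in raw_cells]

# The total is rounded independently from its own raw value,
# not computed by adding the rounded cells.
rounded_total = random_round_base3(sum(raw_cells), rng)

print(rounded_cells, rounded_total)
```

Because the total is rounded from the raw sum of 100, it can only be 99 or 102, whatever the rounded cells add up to.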
All derivations from counts (percentages and ratios, for example) must be derived from the randomly rounded counts. This excludes totals and subtotals, as these are independently randomly rounded. Once a derivation has been calculated, there will be no further random rounding applied to the derived data.
Is this a new rule?
No, this was one of the 2001 Census confidentiality rules.
Explanation
When calculating the percentage of people in New Zealand who were under 15 years of age on census night, for example, the numbers (ie numerator and denominator) must first be rounded to base 3 before they can be used to calculate the percentage.
Once the calculation has been completed, that percentage is not rounded to base 3 but it may be rounded for clarity. For example, 4.6667 percent may be rounded to 4.7 or 5 percent, but it need not be rounded to 3 or 6 percent.
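The order of operations can be sketched as below: the inputs are counts that have already been randomly rounded, and the result is rounded only for presentation. The function name, the multiple-of-three check and the choice of one decimal place are illustrative assumptions.

```python
def percentage_from_rounded(rounded_numerator: int,
                            rounded_denominator: int,
                            decimals: int = 1):
    """Derive a percentage from counts already randomly rounded to base 3."""
    for value in (rounded_numerator, rounded_denominator):
        if value % 3 != 0:
            raise ValueError("inputs must already be rounded to base 3")
    if rounded_denominator == 0:
        return None  # no percentage can be derived
    # Ordinary rounding for clarity only; no further random rounding is applied.
    return round(100 * rounded_numerator / rounded_denominator, decimals)

print(percentage_from_rounded(3, 12))  # 25.0
```

Rejecting inputs that are not multiples of three is one way to catch a percentage being derived from raw counts by mistake.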
Under the Statistics Act 1975, Statistics New Zealand employees are required to withhold any output that may identify the characteristics of a particular person or undertaking. If, despite Rules 1–5, there is any reason to suspect that an output may identify the characteristics of a particular person or undertaking, the data should not be released and advice must be sought from the General Manager Census.