# Accurate calculation of a Gini index using SAS and R

## Statistics New Zealand Working Paper No 17–02

### Vic Duoba and Nairn MacGibbon

#### Abstract

##### Motivation and objectives

The objective is to identify a suitable algorithm for calculating a Gini (1912) index (also called the Gini coefficient) for a variable in a dataset. No attempt is made to review the very extensive literature on the Gini index broadly. Nor does the paper discuss the statistical estimation issues that arise when a sample is used to estimate a population-defined Gini value. Here, the Gini index is calculated conditionally on the available dataset. For recent work on a range of statistical issues, see Creedy (2015).

There are a number of commonly used formulae for calculating the index, and it is not immediately apparent which one to use. If a correct algorithm is not implemented carefully, the bias in the calculated index may be material, especially for small to medium-sized datasets. Income inequality is currently discussed often in economically developed countries. It is therefore important to clarify how Stats NZ should calculate Gini indexes from its data.
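The following Python sketch shows how large such an error can be. It is not the paper's SAS or R code. It compares two ways of computing the area under the Lorenz curve for equally weighted units:

- the exact trapezoid-rule area;
- a hypothetical careless variant that sums right-endpoint rectangles.

The function names are illustrative assumptions, and the rectangle variant is not claimed to be any algorithm Stats NZ has used. Exact fractions keep rounding out of the comparison.

```python
from fractions import Fraction


def lorenz_shares(values):
    """Return [L_0, L_1, ..., L_n]: cumulative shares of the sorted values,
    with L_0 = 0 and L_n = 1 (equal weights assumed)."""
    xs = sorted(Fraction(v) for v in values)
    total = sum(xs)
    if not xs or total <= 0:
        raise ValueError("need at least one value and a positive total")
    shares, running = [Fraction(0)], Fraction(0)
    for x in xs:
        running += x
        shares.append(running / total)
    return shares


def gini_trapezoid(values):
    """Gini = 1 - 2 * area under the piecewise-linear Lorenz curve,
    with that area computed exactly by the trapezoid rule."""
    L = lorenz_shares(values)
    n = len(L) - 1
    area = sum((L[i - 1] + L[i]) / 2 for i in range(1, n + 1)) / n
    return 1 - 2 * area


def gini_right_rectangles(values):
    """Hypothetical careless variant: approximates the same area with
    right-endpoint rectangles, which overstates it because the Lorenz
    curve is non-decreasing."""
    L = lorenz_shares(values)
    n = len(L) - 1
    area = sum(L[1:]) / n
    return 1 - 2 * area


if __name__ == "__main__":
    for n in (3, 10, 100):
        data = list(range(1, n + 1))
        exact, careless = gini_trapezoid(data), gini_right_rectangles(data)
        print(n, float(exact), float(careless), exact - careless)
```

For equally weighted units the two results always differ by exactly 1/n. Every interior share counts once in each sum, so the rectangle sum exceeds the trapezoid sum by (L_n - L_0)/2 = 1/2, i.e. its area exceeds the trapezoid area by 1/(2n). That is an error of about 0.33 for n = 3, 0.1 for n = 10 and 0.01 for n = 100. It is material for small datasets and shrinks as n grows.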

##### Methods

Definitions of the Gini index as they apply to a finite, discrete population were obtained from various sources. The source code of the Stats NZ and Ministry of Justice computational procedures was also obtained.

The fundamental definition of the Gini index is an 'area distance' measure. This approach, based on the Lorenz (1905) curve, appears to be the most common definition. Gini himself also proposed a set-based, pair-comparison formula for discrete, finite data. Both definitions are examined in this paper. There are other approaches to deriving the Gini index (eg see Yitzhaki, 1998), but these are not pursued here. (See the appendix for a graphical illustration of the Lorenz curve-based definition that we used.)
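Below is a minimal Python sketch of the two definitions for equally weighted units. It illustrates the definitions under stated assumptions and is not the SAS or R code from the paper.

- `gini_lorenz` computes G = 1 - 2B. Here B is the area under the piecewise-linear Lorenz curve through the points (i/n, L_i), and L_i is the cumulative share of the i smallest values.
- `gini_pairs` assumes the pair-comparison form that averages |x_i - x_j| over all n² ordered pairs and divides by twice the mean.

Under these assumptions the two agree exactly. A variant that divides by n(n - 1) instead of n² gives a value larger by a factor of n/(n - 1).

```python
from fractions import Fraction
from itertools import product


def gini_lorenz(values):
    """Area-distance definition: G = 1 - 2B, where B is the area under the
    piecewise-linear Lorenz curve through (i/n, L_i), i = 0..n."""
    xs = sorted(Fraction(v) for v in values)
    n, total = len(xs), sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive total")
    area, prev, running = Fraction(0), Fraction(0), Fraction(0)
    for x in xs:
        running += x
        share = running / total
        area += (prev + share) / (2 * n)  # trapezoid of width 1/n
        prev = share
    return 1 - 2 * area


def gini_pairs(values):
    """Pair-comparison definition (assumed n^2 form): mean absolute
    difference over all ordered pairs, divided by twice the mean."""
    xs = [Fraction(v) for v in values]
    n = len(xs)
    if n == 0 or sum(xs) <= 0:
        raise ValueError("need at least one value and a positive total")
    mean = sum(xs) / n
    mad = sum(abs(a - b) for a, b in product(xs, repeat=2)) / (n * n)
    return mad / (2 * mean)


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9, 2, 6]
    print(gini_lorenz(data), gini_pairs(data))
```

The pair-comparison version takes O(n²) time and is shown only as a cross-check. The sorted, cumulative-share form is the one that scales to large datasets.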

##### Conclusion

Stats NZ's calculation of the Gini index using the algorithm current at 1 March 2016 should be discontinued. A more precise method exists that agrees with theoretical definitions and with usage by other government departments. Preferred SAS code and R programs have been identified, are included in this paper, and should replace the existing code.

#### Key words

Gini index; Lorenz curve; Cumulative distribution function (CDF)


#### Citation

Stats NZ (2017). Accurate calculation of a Gini index using SAS and R. Retrieved from www.stats.govt.nz.

ISBN 978-1-98-852809-0 (online)
ISSN 1179-934X (online)

Published 16 June 2017
