Bottom incomes and the measurement of poverty and inequality

In a nutshell

Understanding the bottom tail of an income distribution is important since it includes the income and wealth poor, the groups most in need of assistance and the primary target of social protection policies.

The poverty identification and inequality measurement problems posed by negative and especially zero incomes reported in household surveys are not trivial: they deserve attention and careful modelling by academics and practitioners.

Since uprisings in MENA have been linked to poverty and unequal economic opportunities, a better understanding of the scale of these problems can give policy-makers the tools to reduce social discontent and even fix some obstacles to growth.

Incomes in surveys suffer from various measurement problems, most notably in the lower and upper tails. This is known to bias the measurement of inequality, an issue that has generated a significant body of research covering high and low-income countries alike.

Statistical agencies and researchers working on poverty or inequality tend to ‘bottom-code’ or censor incomes at some low value such as zero, but this approach produces its own statistical problems. Some scholars have acknowledged the shortcomings of uncorrected statistics and of those based on naïve corrections, and have studied the sensitivity of inequality indices to changes in bottom values. Others have proposed parametric modelling similar to what has become accepted for top incomes.

Understanding the bottom tail of an income distribution is important from a policy perspective, arguably even more important than understanding the top tail of the distribution. The bottom tail includes the income and wealth poor, the groups most in need of assistance and the primary target of social protection policies.

Miscounting the poor affects the measurement of poverty and inequality, and contributes to bias in targeting exercises such as proxy means testing. This results in larger inclusion and exclusion errors, which have direct negative consequences for the livelihoods of the poor and vulnerable.

The presence of negative incomes may contribute to mismeasuring households’ wellbeing, as well as poverty and inequality. Evidence from comparing the distribution of self-employment income in survey and tax data in Latin America suggests that this income tends to be under-reported in surveys across all distribution quantiles.

Both tax evasion and poor recall can be at play. Hence, negative incomes may come from under-reporting of self-employment or capital incomes, implying that some of the households reporting negative incomes may be non-poor or even wealthy households.

Even when negative or zero incomes are accurate, including them in the distribution of incomes can be problematic for the purpose of distributional analysis, because these values may not reflect the households’ short-term or long-term capabilities, consumption or welfare.

Households’ overconsumption is a case in point. Accruing debt may be a survival strategy for the poor, an investment strategy for the middle class or a tax evasion strategy for the rich.

We may expect small negative incomes to be prevalent among chronically poor people who are temporarily in trouble, and large negative values among chronically rich people under-reporting, or writing off capital losses from past years. Large tax adjustments and social contribution withholdings may not reflect what was due on current income. Private transfers may not mean net outflows of resources or welfare.

These are not merely theoretical concerns: negative incomes are quite prevalent in household surveys. For example, in the sample of 354 household surveys in the Luxembourg Income Study (LIS) database, 229 surveys contain negative disposable household incomes. In 12 surveys, negative incomes account for over 1% of non-zero incomes, and a single national survey contains as many as 584 such observations.

Moreover, these negative incomes are not trivial in size. In absolute value, mean negative income reaches as much as 754% of mean nationwide positive income, and exceeds 200% of mean nationwide income in 15 surveys.

Zero incomes are also prevalent in household surveys. Among the 354 LIS surveys, 270 surveys contain zero incomes. In 22 surveys, zero incomes account for over 1% of non-negative incomes, and number up to 1,213 households. The inclusion of these incomes in poverty and inequality measures presents its own challenges.
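The prevalence and magnitude figures quoted above can be computed for any single survey along the following lines. This is a minimal sketch in Python, not the study's code: the function name `bottom_income_diagnostics` and the sample incomes are hypothetical, survey weights are ignored, and the mean negative income is expressed relative to mean positive income.

```python
def bottom_income_diagnostics(incomes):
    """Summarise non-positive incomes in one survey (unweighted sketch)."""
    negatives = [x for x in incomes if x < 0]
    zeros = [x for x in incomes if x == 0]
    positives = [x for x in incomes if x > 0]
    if not positives:
        raise ValueError("need at least one positive income")

    mean_positive = sum(positives) / len(positives)
    # Mean negative income in absolute value, as a percentage of mean positive income.
    if negatives:
        mean_negative = sum(negatives) / len(negatives)
        relative_size = 100.0 * abs(mean_negative) / mean_positive
    else:
        relative_size = None

    return {
        # Negatives as a share of all non-zero incomes.
        "negative_share_of_nonzero": len(negatives) / (len(negatives) + len(positives)),
        # Zeros as a share of all non-negative incomes.
        "zero_share_of_nonnegative": len(zeros) / (len(zeros) + len(positives)),
        "mean_negative_pct_of_mean_positive": relative_size,
    }


# Hypothetical disposable household incomes, for illustration only.
sample_incomes = [-100, -300, 0, 0, 0, 50, 150, 200, 400]
diagnostics = bottom_income_diagnostics(sample_incomes)
```

In practice the same calculation would be run with household weights and repeated across surveys to produce counts such as those reported for the LIS database.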

In sum, understanding who is who among households reporting negative and zero incomes is essential for generating a consistent ordering among households, and for measuring poverty and inequality correctly.

In our new study (Hlasny et al, 2020), we investigate the prevalence and consequences of non-positive incomes using 57 harmonised surveys covering 12 Mediterranean countries over the period 1995-2016. We find that the main source of negative disposable incomes in three-quarters of all surveys is negative self-employment income. High tax and social security withholdings and high self-paid social security contributions also account for negative incomes in some countries.

Interestingly, when surveys are sorted by the frequency of negative disposable household incomes, negative self-employment income shows up as the top source of their prevalence. When surveys are sorted by the relative magnitude of negative incomes, high inter-household transfers and undue social security and other burdens dominate as sources of the high relative magnitude of negative incomes.

The prevalence of negative incomes is thus primarily due to negative self-employment incomes, while the extreme values of negative incomes are typically due to extremely high social security contributions, non-income taxes and paid remittances.

We also find that households with negative incomes are typically as well off as (or even better off than) other households in terms of material wellbeing (see Figure 1). They appear to be healthier and as highly educated. By contrast, zero-income households are materially deprived, even though their human capital (in terms of health and educational attainment) is not clearly lower than that of their compatriots.

Adjusting poverty and inequality measures for these issues can alter these measures significantly. Applying standard statistical adjustments for negative and zero incomes – data trimming or bottom-coding – we obtain corrections of up to 2.3 percentage points of the Gini coefficient, and 1.5 points of the poverty headcount ratio.
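To see why trimming and bottom-coding move these measures, consider the following minimal Python sketch. It is an illustration under stated assumptions, not the study's procedure: the incomes and the poverty line of 500 are made up, observations are unweighted, trimming is taken to mean dropping all non-positive incomes, and bottom-coding is taken to mean raising negative incomes to zero.

```python
def gini(incomes):
    """Gini coefficient from the sorted-rank formula (requires a positive total)."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need a non-empty sample with positive total income")
    rank_weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * rank_weighted / (n * total) - (n + 1.0) / n


def headcount_ratio(incomes, poverty_line):
    """Share of observations with income strictly below the poverty line."""
    return sum(1 for x in incomes if x < poverty_line) / len(incomes)


def trim(incomes):
    """Data trimming: drop all non-positive incomes."""
    return [x for x in incomes if x > 0]


def bottom_code(incomes, floor=0.0):
    """Bottom-coding: raise every income below the floor up to the floor."""
    return [max(x, floor) for x in incomes]


# Hypothetical incomes with one negative and two zero values.
sample = [-500.0, 0.0, 0.0, 200.0, 400.0, 800.0, 1000.0, 1500.0, 2500.0, 4000.0]
poverty_line = 500.0  # hypothetical threshold

for label, data in [("raw", sample),
                    ("bottom-coded", bottom_code(sample)),
                    ("trimmed", trim(sample))]:
    print(label, round(gini(data), 3), round(headcount_ratio(data, poverty_line), 3))
```

On this toy sample both adjustments lower the Gini relative to the raw data, and trimming also lowers the headcount ratio because the dropped households all sat below the line. The size and even the direction of the correction in real surveys depend on how many non-positive incomes there are and how large they are.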

But even these approaches have limitations. They do not use all the information available in surveys, they do not replace unreliable zero or negative incomes with more realistic values, and they may produce truncated income distributions with a point mass at zero.

One alternative approach proposed in previous research to address the question of truncation has been modelling non-positive incomes using a smooth parametric distribution function such as the Pareto function (Van Kerm, 2007). We test this approach but find that Pareto distributions do not fit the observed negative incomes well, sometimes predicting unrealistically large income dispersion among negatives.
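One way to see how a Pareto fit can imply unrealistic dispersion is to estimate the shape parameter on the absolute values of negative incomes. The Python sketch below assumes a Pareto type I distribution with a known lower bound `x_min` and uses the standard maximum-likelihood estimator; the loss values and the choice of `x_min` are hypothetical, and this is not necessarily the specification used in the study or in Van Kerm (2007).

```python
import math


def pareto_alpha_mle(magnitudes, x_min):
    """Maximum-likelihood shape estimate for a Pareto type I tail above x_min."""
    tail = [x for x in magnitudes if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum <= 0:
        raise ValueError("need at least one observation strictly above x_min")
    return len(tail) / log_sum


def pareto_mean(alpha, x_min):
    """Implied mean of a Pareto type I distribution; infinite when alpha <= 1."""
    if alpha <= 1:
        return math.inf
    return alpha * x_min / (alpha - 1)


# Hypothetical absolute values of negative incomes with a few extreme losses.
losses = [120.0, 150.0, 300.0, 900.0, 5000.0]
x_min = 100.0  # hypothetical lower bound of the modelled tail

alpha = pareto_alpha_mle(losses, x_min)
implied_mean = pareto_mean(alpha, x_min)
```

With a handful of very large negatives the estimated shape parameter falls below 1, so the fitted distribution has an infinite mean (and an infinite variance whenever the parameter is at most 2). That is one form the unrealistically large dispersion can take.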

A further alternative is to replace non-positive incomes using the best-matching positives by means of classic matching methods or machine learning algorithms. We test this approach using a random forest classifier based on households’ characteristics and find it to be more viable than the previous methods as a correction approach.

It produces a continuous distribution of overall incomes without a point-density at zero, and converts non-positive incomes into realistic positive values based on households’ observed characteristics. This imputation shows sensible results across multiple countries and across several model specifications that we try. It also lowers the estimated Gini by up to 1.7 points, and the estimated poverty headcount ratio (incidentally) also by up to 1.7 points.
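The idea of replacing non-positive incomes with best-matching positives can be illustrated with the simpler of the two options mentioned above, classic nearest-neighbour matching, rather than the random forest classifier used in the study. In this Python sketch the household characteristics (household size, years of schooling, an asset index) and all values are hypothetical, and each non-positive income is replaced by the income of the closest positive-income household in standardised characteristics.

```python
import math


def column_scales(features):
    """Standard deviation of each characteristic (1.0 if it does not vary)."""
    scales = []
    for column in zip(*features):
        mean = sum(column) / len(column)
        sd = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
        scales.append(sd or 1.0)
    return scales


def impute_by_matching(features, incomes):
    """Replace each non-positive income with its nearest positive-income donor's income."""
    scales = column_scales(features)
    donors = [(f, y) for f, y in zip(features, incomes) if y > 0]
    if not donors:
        raise ValueError("need at least one positive-income household as donor")

    def distance(a, b):
        # Squared Euclidean distance on standardised characteristics.
        return sum(((u - v) / s) ** 2 for u, v, s in zip(a, b, scales))

    imputed = []
    for f, y in zip(features, incomes):
        if y > 0:
            imputed.append(y)  # positive incomes are left unchanged
        else:
            _, donor_income = min(donors, key=lambda d: distance(f, d[0]))
            imputed.append(donor_income)
    return imputed


# Hypothetical households: (household size, years of schooling, asset index).
features = [(2, 12, 0.8), (5, 6, 0.1), (2, 12, 0.9), (5, 6, 0.2)]
incomes = [-300, 0, 2500, 400]

corrected = impute_by_matching(features, incomes)
```

In this toy example the negative-income household resembles a well-off donor and receives a high income, while the zero-income household resembles a deprived donor and receives a low one. A model-based approach such as a random forest would use the same characteristics but learn the mapping from the positive-income households rather than copying a single donor.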

These preliminary estimations, conducted under rather conservative assumptions and modelling specifications, suggest that the poverty identification and inequality measurement problems posed by negative and especially zero incomes are not trivial, and deserve attention and careful modelling by academics and practitioners.

In relation to the ‘static’ problem of non-positive incomes, our corrections produce more accurate inequality and poverty indexes for the majority of countries. But in relation to the ‘dynamic’ problem of non-positive incomes for measuring the evolution of inequality and poverty over time, we find only limited evidence that our corrections reduce the volatility of inequality and poverty indexes across survey waves, as would be desired from correction methods.

Where do we go from here? Moving beyond truncation or Pareto estimations and adopting more modern techniques such as matching and machine learning classifiers should allow us to model negative as well as zero incomes more sensibly. Imputation using random forest classification shows some promise in this respect.

More importantly, extending the analysis to a greater range of bottom incomes – say the extreme 5-10% as research on top incomes has been doing, or all incomes falling short of households’ consumption – promises to yield more determinate corrections. We should find more clearly that the corrections provide a dynamic benefit in the form of reduced volatility of inequality and poverty indexes over time. With the corrected bottom incomes, we should be able to re-evaluate their impact on multidimensional deprivation and poverty.

The policy implications of this research in progress are clear. Our results are relevant for fiscal redistribution, aid targeting and – in the Middle East and North Africa (MENA) – the use of revenues from natural resources. Since uprisings in the MENA region have been linked to the problems of poverty and unequal economic opportunities, a better understanding of the scale of these problems can give policy-makers the tools to bring social discontent down, and even fix some traps and obstacles to economic growth.

Figure 1: Socio-economic characteristics of households with non-positive disposable income

Source: Authors’ calculations based on LIS-ERF database surveys.

The Forum ERF
