Chapter 6 Dealing with non-linear indicators: the linearisation technique

So far, the statistics we’ve been dealing with are linear statistics, that is, population totals, means or proportions. However, many indicators, particularly those used in the field of social statistics, are non-linear. For example, a mean or a proportion is non-linear when its denominator is unknown, and must therefore be regarded as a ratio between two linear indicators. Furthermore, a reference indicator of income inequality is the Gini coefficient, whose definition uses rank statistics. Distributional aspects can also be measured by calculating percentiles such as the median, quartiles, quintiles or deciles. All these indicators are complex ones, for which variance calculation requires specific techniques.

6.1 The linearisation technique

Let us assume that an indicator $\theta$ can be expressed as a function of $p$ totals $Y_1, Y_2, \ldots, Y_p$:

$$\theta = f(Y_1, Y_2, \ldots, Y_p) \tag{6.1}$$

where $Y_i$ is the total of the variable $(y_{ik})$ over $U$: $Y_i = \sum_{k \in U} y_{ik}$.

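For example, an unemployment rate can be regarded as a ratio between the total number of unemployed persons in the labour force population, $Y = \sum_{i \in U} 1_{UNEMP_i}$, and the total number of individuals in the labour force, $X = \sum_{i \in U} 1_{LF_i}$, where $1_{UNEMP_i}$ (respectively $1_{LF_i}$) equals 1 if person $i$ is unemployed (respectively in the labour force) and 0 otherwise: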

$$R_{UNEMP} = \frac{\sum_{i \in U} 1_{UNEMP_i}}{\sum_{i \in U} 1_{LF_i}} = \frac{Y}{X} = f(Y, X)$$

A complex parameter such as (6.1) is traditionally estimated by substituting an estimator $\hat Y_i$ for each of the $p$ totals $Y_1, Y_2, \ldots, Y_p$:

$$\hat\theta = f(\hat Y_1, \hat Y_2, \ldots, \hat Y_p) \tag{6.3}$$

Thus, the unemployment rate can be estimated by taking the ratio between the Horvitz-Thompson estimators for the numerator and the denominator:

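$$\hat R_{UNEMP} = f(\hat Y, \hat X) = \frac{\sum_{i \in s} 1_{UNEMP_i}/\pi_i}{\sum_{i \in s} 1_{LF_i}/\pi_i}$$

where $s$ is the sample and $\pi_i$ is the inclusion probability of person $i$.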
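As an illustration, the ratio of Horvitz-Thompson estimators can be sketched in a few lines of Python. This is a minimal sketch assuming each sampled person is described by two 0/1 indicators (unemployed, in the labour force) and a known inclusion probability $\pi_i$; the function names `ht_total` and `unemployment_rate` and the six-person sample are hypothetical.

```python
# Minimal sketch (hypothetical data): Horvitz-Thompson ratio estimator
# of an unemployment rate.

def ht_total(values, pis):
    """Horvitz-Thompson estimator of a total: sum over the sample of y_i / pi_i."""
    return sum(v / p for v, p in zip(values, pis))


def unemployment_rate(unemp, lf, pis):
    """Ratio of the HT estimators of the numerator Y and of the denominator X."""
    return ht_total(unemp, pis) / ht_total(lf, pis)


# Hypothetical sample of six persons: 1 = yes, 0 = no
unemp = [1, 0, 0, 1, 0, 0]                   # 1_UNEMP_i
lf = [1, 1, 1, 1, 0, 1]                      # 1_LF_i
pis = [0.01, 0.02, 0.01, 0.02, 0.01, 0.02]   # inclusion probabilities pi_i

r_hat = unemployment_rate(unemp, lf, pis)
print(round(r_hat, 4))  # 0.4286
```

Here $\hat Y = 150$ and $\hat X = 350$, so $\hat R_{UNEMP} = 3/7$. Note that $\hat R_{UNEMP}$ is a non-linear function of the sample data, which is why its variance cannot be obtained from the variance formula of a total directly.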

Assuming the function $f$ is “regular” (of class $C^1$, i.e. differentiable with continuous partial derivatives), the linearisation technique consists of approximating the complex estimator (6.3) by a linear estimator through a first-order Taylor expansion:

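$$\begin{aligned}
\hat\theta = f(\hat Y_1, \hat Y_2, \ldots, \hat Y_p) &= f(Y_1, Y_2, \ldots, Y_p) + \sum_{i=1}^{p} \frac{\partial f}{\partial v_i}(Y_1, Y_2, \ldots, Y_p) \times (\hat Y_i - Y_i) + K_n \\
&= f(Y_1, Y_2, \ldots, Y_p) + \sum_{i=1}^{p} d_i \times (\hat Y_i - Y_i) + K_n \\
&= C + \sum_{i=1}^{p} d_i \hat Y_i + K_n
\end{aligned}$$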

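where $C = f(Y_1, Y_2, \ldots, Y_p) - \sum_{i=1}^{p} d_i Y_i$ is a constant and $K_n$ is a random variable satisfying $K_n = O_P\left(\frac{1}{n}\right)$.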
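The first-order expansion can be checked numerically on the ratio $f(Y, X) = Y/X$. In this minimal sketch the population totals and their estimates are hypothetical numbers chosen for illustration; the derivatives $d_i$ and the constant $C$ are computed at the true totals, and the gap between $f(\hat Y, \hat X)$ and its linear part plays the role of $K_n$.

```python
# Minimal sketch (hypothetical totals): first-order Taylor expansion of f(Y, X) = Y / X.

def f(y, x):
    return y / x

# Hypothetical population totals and hypothetical estimates of them
Y, X = 50.0, 200.0
Y_hat, X_hat = 52.0, 195.0

# Partial derivatives evaluated at the true totals
d1 = 1.0 / X        # df/dy
d2 = -Y / X ** 2    # df/dx

# Constant C = f(Y, X) - d1*Y - d2*X, so that the linear part is C + d1*Y_hat + d2*X_hat
C = f(Y, X) - d1 * Y - d2 * X
linear_part = C + d1 * Y_hat + d2 * X_hat

exact = f(Y_hat, X_hat)
remainder = exact - linear_part   # plays the role of K_n
print(exact, linear_part, remainder)
```

The linear part (0.26625) differs from $f(\hat Y, \hat X) = 52/195$ by about $4 \times 10^{-4}$, which is small compared with the deviation $f(\hat Y, \hat X) - f(Y, X)$ of about $1.7 \times 10^{-2}$.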

Finally, based on this first-order expansion, one can prove that the variance of the complex estimator $\hat\theta$ is equal to the variance of the linear part $\sum_{i=1}^{p} d_i \hat Y_i$ plus a remainder term of order $1/n^{3/2}$:

$$V(\hat\theta) = V\left(\sum_{i=1}^{p} d_i \hat Y_i\right) + O\left(\frac{1}{n^{3/2}}\right)$$

Thus, provided the sample size is “large” enough, the variance of $\hat\theta$ is asymptotically equal to that of its linear part:

$$V(\hat\theta) \approx V\left(\sum_{i=1}^{p} d_i \hat Y_i\right) = V(\hat Z)$$

where $\hat Z = \sum_{i=1}^{p} d_i \hat Y_i$ is a (linear) estimator of the total $Z$ of the linearised variable $z_k = \sum_{i=1}^{p} d_i y_{ik}$.
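The asymptotic equality $V(\hat\theta) \approx V(\hat Z)$ can be illustrated by simulation. The sketch below assumes simple random sampling without replacement from a hypothetical, randomly generated population, takes $\hat\theta$ to be the ratio $\hat Y / \hat X$, and compares the Monte Carlo variance of $\hat\theta$ with the exact design variance of $\hat Z$, the Horvitz-Thompson estimator of the total of $z_k = (y_k - R x_k)/X$ (derivatives evaluated at the true totals). The population model, sizes, seed and number of replications are arbitrary choices.

```python
import random
import statistics

random.seed(12345)

# Hypothetical population of N units with two correlated variables y and x
N, n = 1000, 200
x = [random.uniform(10, 50) for _ in range(N)]
y = [0.3 * xk + random.gauss(0, 2) for xk in x]

Y, X = sum(y), sum(x)
R = Y / X

# Linearised variable with derivatives taken at the true totals: z_k = (y_k - R x_k) / X
z = [(yk - R * xk) / X for yk, xk in zip(y, x)]

# Exact SRSWOR variance of the HT estimator of the total of z:
# V(Z_hat) = N^2 (1 - n/N) S_z^2 / n, where S_z^2 uses the divisor N - 1
v_linear = N ** 2 * (1 - n / N) * statistics.variance(z) / n

# Monte Carlo variance of R_hat = Y_hat / X_hat (the weights N/n cancel in the ratio)
reps = 2000
r_hats = []
for _ in range(reps):
    s = random.sample(range(N), n)
    r_hats.append(sum(y[k] for k in s) / sum(x[k] for k in s))
v_mc = statistics.variance(r_hats)

ratio = v_mc / v_linear
print(f"V_MC(R_hat) / V(Z_hat) = {ratio:.3f}")
```

With $n = 200$ the printed ratio should be close to 1; the remaining gap mixes Monte Carlo error with the neglected higher-order terms.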

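As the partial derivatives $d_i = \frac{\partial f}{\partial v_i}(Y_1, Y_2, \ldots, Y_p)$ depend on the unknown population totals, they are evaluated at the estimated totals, and the variance of $\hat\theta$ is estimated by: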

$$\hat V_L(\hat\theta) = \hat V\left(\sum_{i=1}^{p} \tilde d_i \hat Y_i\right) = \hat V\left(\hat{\tilde Z}\right)$$

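where $\tilde d_i = \frac{\partial f}{\partial v_i}(\hat Y_1, \hat Y_2, \ldots, \hat Y_p)$ and $\tilde z_k = \sum_{i=1}^{p} \tilde d_i y_{ik}$. In other words, $\hat V_L(\hat\theta)$ is obtained by applying the variance estimator of a total, appropriate to the sampling design, to the estimated linearised variable $\tilde z_k$.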
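The recipe (evaluate the partial derivatives at the estimated totals, then form $\tilde z_k$) can be written generically. This minimal sketch replaces the analytic derivatives $\tilde d_i$ with central finite differences, an assumption made here for generality rather than part of the method as described; analytic derivatives should be preferred when they are available. The function name `linearised_variable` and its arguments are hypothetical.

```python
from typing import Callable, List, Sequence


def linearised_variable(
    f: Callable[..., float],
    totals_hat: Sequence[float],
    y: Sequence[Sequence[float]],
    rel_step: float = 1e-6,
) -> List[float]:
    """Return z~_k = sum_i d~_i * y_ik for each sampled unit k.

    f          -- function of the p totals, called as f(Y_1, ..., Y_p)
    totals_hat -- estimated totals (Y_1_hat, ..., Y_p_hat)
    y          -- y[k][i] is the value of the i-th variable for unit k
    The derivatives d~_i are approximated by central finite differences.
    """
    p = len(totals_hat)
    d = []
    for i in range(p):
        h = rel_step * max(abs(totals_hat[i]), 1.0)   # step scaled to the total
        up = list(totals_hat)
        down = list(totals_hat)
        up[i] += h
        down[i] -= h
        d.append((f(*up) - f(*down)) / (2 * h))
    return [sum(d[i] * unit[i] for i in range(p)) for unit in y]


# Usage with the ratio f(Y, X) = Y / X and a hypothetical SRS of n = 4 from N = 100
ys = [3.0, 5.0, 2.0, 6.0]
xs = [10.0, 12.0, 8.0, 15.0]
w = 100 / 4                                   # design weight N / n
Y_hat, X_hat = w * sum(ys), w * sum(xs)
z_tilde = linearised_variable(lambda a, b: a / b, [Y_hat, X_hat], list(zip(ys, xs)))
print(z_tilde)
```

For the ratio, the numerical result agrees with the closed form $\tilde z_k = (y_k - \hat R x_k)/\hat X$ up to rounding error.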

6.2 Examples

  • Case of a ratio between two totals

$$\hat R = f(\hat Y, \hat X) = \frac{\hat Y}{\hat X}$$

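The linearised variable is given by:

$$\tilde z_k = \frac{\partial f}{\partial y}(\hat Y, \hat X)\, y_k + \frac{\partial f}{\partial x}(\hat Y, \hat X)\, x_k = \frac{1}{\hat X} y_k - \frac{\hat Y}{\hat X^2} x_k = \frac{1}{\hat X}\left(y_k - \hat R x_k\right)$$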

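Then, assuming simple random sampling without replacement of $n$ units from a population of $N$ units, the estimator of the variance of $\hat R$ is given by:

$$\hat V_L(\hat R) = N^2\left(1 - \frac{n}{N}\right)\frac{s_z^2}{n}$$

where $s_z^2 = \frac{1}{n-1}\sum_{k \in s}\left(\tilde z_k - \bar{\tilde z}\right)^2$ is the sample variance of the linearised variable $\tilde z_k$ and $\bar{\tilde z}$ is its sample mean.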
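Under simple random sampling without replacement, this estimator can be computed directly from the sample values. In this minimal sketch the function name `ratio_variance_srs` and the sample values are hypothetical; it estimates the totals with the weight $N/n$, forms $\tilde z_k = (y_k - \hat R x_k)/\hat X$, and plugs its sample variance into the formula above.

```python
import statistics


def ratio_variance_srs(ys, xs, N):
    """Return (R_hat, V_L_hat) for R_hat = Y_hat / X_hat under SRSWOR of size n = len(ys)."""
    n = len(ys)
    w = N / n                                   # design weight N / n
    Y_hat, X_hat = w * sum(ys), w * sum(xs)
    R_hat = Y_hat / X_hat
    # Estimated linearised variable z~_k = (y_k - R_hat * x_k) / X_hat
    z = [(yk - R_hat * xk) / X_hat for yk, xk in zip(ys, xs)]
    s2_z = statistics.variance(z)               # sample variance, divisor n - 1
    return R_hat, N ** 2 * (1 - n / N) * s2_z / n


# Hypothetical sample of n = 4 units drawn from N = 100
r_hat, v_hat = ratio_variance_srs([3.0, 5.0, 2.0, 6.0], [10.0, 12.0, 8.0, 15.0], N=100)
print(r_hat, v_hat)
```

Since $\hat X = N \bar x$, the formula reduces to the familiar $(1 - n/N)\, s_e^2 / (n \bar x^2)$ with $e_k = y_k - \hat R x_k$; when $y_k$ is exactly proportional to $x_k$, all $\tilde z_k$ are zero and the estimated variance vanishes.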

  • Case of the dispersion of a variable

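$$S^2 = \frac{1}{N}\sum_{i \in U}\left(y_i - \bar Y\right)^2 = \frac{1}{N}\sum_{i \in U} y_i^2 - \frac{\left(\sum_{i \in U} y_i\right)^2}{N^2} = f\left(N, \sum_{i \in U} y_i, \sum_{i \in U} y_i^2\right)$$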

where $f(x, y, z) = \frac{z}{x} - \frac{y^2}{x^2}$.

Thus, we have

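$$d_1 = \frac{\partial f}{\partial x}\left(N, \sum_{i \in U} y_i, \sum_{i \in U} y_i^2\right) = -\frac{\sum_{i \in U} y_i^2}{N^2} + \frac{2\left(\sum_{i \in U} y_i\right)^2}{N^3} = -\frac{1}{N}\left(\frac{\sum_{i \in U} y_i^2}{N} - \frac{2\left(\sum_{i \in U} y_i\right)^2}{N^2}\right) = -\frac{1}{N}\left(S^2 - \bar Y^2\right)$$

$$d_2 = \frac{\partial f}{\partial y}\left(N, \sum_{i \in U} y_i, \sum_{i \in U} y_i^2\right) = -\frac{2\sum_{i \in U} y_i}{N^2} = -\frac{2\bar Y}{N}$$

$$d_3 = \frac{\partial f}{\partial z}\left(N, \sum_{i \in U} y_i, \sum_{i \in U} y_i^2\right) = \frac{1}{N}$$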

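Therefore, since the variable whose total is $N$ is constant and equal to 1, the linearised variable for the dispersion $S^2$ of $y$ is:

$$z_k = d_1 + d_2 y_k + d_3 y_k^2 = -\frac{1}{N}\left(S^2 - \bar Y^2 + 2\bar Y y_k - y_k^2\right) = \frac{1}{N}\left[\left(y_k - \bar Y\right)^2 - S^2\right]$$

In practice, $N$, $\bar Y$ and $S^2$ are replaced by their estimators to obtain the estimated linearised variable $\tilde z_k$.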
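The result can be checked on a small population. In this minimal sketch the function name `dispersion_linearised` and the population values are hypothetical; $z_k$ is computed both from the closed form $\frac{1}{N}[(y_k - \bar Y)^2 - S^2]$ and from $d_1 + d_2 y_k + d_3 y_k^2$. The two must coincide, and the $z_k$ sum to zero over $U$ because $\sum_{k \in U}(y_k - \bar Y)^2 = N S^2$.

```python
def dispersion_linearised(y):
    """Linearised variable of S^2 = (1/N) * sum (y_k - Ybar)^2, computed on a whole population."""
    N = len(y)
    ybar = sum(y) / N
    s2 = sum((v - ybar) ** 2 for v in y) / N          # divisor N, as in the definition of S^2
    closed_form = [((v - ybar) ** 2 - s2) / N for v in y]
    # Same variable built from the partial derivatives d1, d2, d3
    d1 = -(s2 - ybar ** 2) / N
    d2 = -2 * ybar / N
    d3 = 1 / N
    from_derivatives = [d1 + d2 * v + d3 * v ** 2 for v in y]
    return closed_form, from_derivatives


y = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical population, N = 8
z, z_check = dispersion_linearised(y)
print(z[0])  # 0.625
```

Here $\bar Y = 5$ and $S^2 = 4$, so for the first unit $z_1 = \left((2 - 5)^2 - 4\right)/8 = 0.625$.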