reverse-shooting

Matlab scripts for reverse shooting

gini.m (5654B)


% GINI computes the Gini coefficient and the Lorenz curve.
%
% Usage:
%   g = gini(pop,val)
%   [g,l] = gini(pop,val)
%   [g,l,a] = gini(pop,val)
%   ... = gini(pop,val,makeplot)
%
% Input and Output:
%   pop     A vector of population sizes of the different classes.
%   val     A vector of the measurement variable (e.g. income per capita)
%           in the different classes.
%   g       Gini coefficient.
%   l       Lorenz curve: This is a two-column array, with the left
%           column representing cumulative population shares of the
%           different classes, sorted according to val, and the right
%           column representing the cumulative value share that belongs to
%           the population up to the given class. The Lorenz curve is a
%           plot of the right column against the left column.
%   a       Same as l, except that the components are not normalized to
%           range in the unit interval. Thus, the left column of a is the
%           absolute cumulative population sizes of the classes, and the
%           right column is the absolute cumulative value of all classes up
%           to the given one.
%   makeplot  A boolean indicating whether a figure of the Lorenz curve
%           should be produced. Default is false.
%
% Example:
%   x = rand(100,1);
%   y = rand(100,1);
%   gini(x,y,true);             % random populations with random incomes
%   figure;
%   gini(x,ones(100,1),true);   % perfect equality
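%   % Small inputs whose results can be checked by hand:
%   [g,l,a] = gini([2;1],[3;10]);  % g = 7/24, about 0.2917
%   gini([1;1],[0;1])              % one of two equal-sized classes holds everything: 0.5
%   gini(ones(4,1),ones(4,1))      % identical per-capita values: 0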
%
% Explanation:
%
%   The vectors pop and val must be equally long and must contain only
%   nonnegative values (zeros are acceptable). Classes for which pop or val
%   is NaN are ignored. A typical application would be that pop represents
%   population sizes of some subgroups (e.g. different countries or
%   states), and val represents the income per capita in these subgroups.
%   The Gini coefficient is a measure of how unequally income is
%   distributed between these classes. A coefficient of zero means that
%   all subgroups have exactly the same income per capita, so there is no
%   dispersion of income. A coefficient close to one results if all the
%   income accrues to a subgroup that holds only a small share of the
%   population: if that subgroup holds the population share s and all the
%   remaining groups have zero income, the coefficient is 1 - s. In the
%   limit, when the total population size approaches infinity but all the
%   income accrues to only one individual, the Gini coefficient approaches
%   unity.
%
%   The Lorenz curve is a graphical representation of the distribution. If
%   (x,y) is a point on the Lorenz curve, then the poorest x-share of the
%   population has the y-share of total income. By definition, (0,0) and
%   (1,1) are points on the Lorenz curve (the poorest 0% have 0% of total
%   income, and the poorest 100% [i.e., everyone] have 100% of total income).
%   Equal distribution implies that the Lorenz curve is the 45 degree
%   line. Any inequality manifests itself as a deviation of the Lorenz curve
%   below the 45 degree line. By construction, the Lorenz curve is weakly
%   increasing and convex.
%
%   The two concepts are related as follows: The Gini coefficient is twice
%   the area between the 45 degree line and the Lorenz curve.
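%
%   Equivalently, if B denotes the area below the Lorenz curve, then
%   g = 1 - 2*B, because the area below the 45 degree line is 1/2. For
%   example, with pop = [1;1] and val = [0;1], the Lorenz curve runs through
%   (0,0), (0.5,0) and (1,1), so B = 0.5*0.5*1 = 0.25 and g = 1 - 2*0.25 = 0.5.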
%
% Author : Yvan Lengwiler
% Release: $1.0$
% Date   : $2010-06-27$

function [g,l,a] = gini(pop,val,makeplot)

    % check arguments

    assert(nargin >= 2, 'gini expects at least two arguments.')

    if nargin < 3
        makeplot = false;
    end
    assert(numel(pop) == numel(val), ...
        'gini expects two equally long vectors (%d ~= %d).', ...
        numel(pop),numel(val))

    pop = [0;pop(:)]; val = [0;val(:)];     % prepend a zero

    isok = all(~isnan([pop,val]),2);        % keep rows without NaNs
    if sum(isok) < 2
        warning('gini:lacking_data','not enough data');
        g = NaN; l = NaN(1,2); a = NaN(1,2);    % assign every output
        return;
    end
    pop = pop(isok); val = val(isok);

    assert(all(pop>=0) && all(val>=0), ...
        'gini expects nonnegative vectors (neg elements in pop = %d, in val = %d).', ...
        sum(pop<0),sum(val<0))
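
    % process input
    z = val .* pop;                         % total value held by each class
    [~,ord] = sort(val);                    % poorest (lowest val) class first
    pop    = pop(ord);     z    = z(ord);
    pop    = cumsum(pop);  z    = cumsum(z);    % absolute cumulative sums
    relpop = pop/pop(end); relz = z/z(end);     % normalized to shares in [0,1]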
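    % Illustration (hand-computed): pop = [2;1] and val = [3;10] give, after
    % the zero is prepended, relpop = [0; 2/3; 1] and relz = [0; 6/16; 1].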

    % Gini coefficient

    % We compute the area below the Lorenz curve. We do this by
    % computing the average of the left and right Riemann-like sums,
    % which amounts to the trapezoidal rule.
    % (I say Riemann-'like' because we evaluate not on a uniform grid, but
    % on the points given by the pop data.)
    %
    % These are the two Riemann-like sums:
    %    leftsum  = sum(relz(1:end-1) .* diff(relpop));
    %    rightsum = sum(relz(2:end)   .* diff(relpop));
    % The Gini coefficient is one minus twice the average of leftsum and
    % rightsum, i.e. 1 - (leftsum + rightsum). We can put all of this
    % into one line.
    g = 1 - sum((relz(1:end-1)+relz(2:end)) .* diff(relpop));
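    % Equivalent formulation, for reference only and not executed:
    % MATLAB's trapz(X,Y) applies the trapezoidal rule on the (possibly
    % nonuniform) grid X, so the line above computes the same value as
    %    g = 1 - 2*trapz(relpop,relz);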

    % Lorenz curve
    l = [relpop,relz];
    a = [pop,z];
    if makeplot   % ... plot it?
        area(relpop,relz,'FaceColor',[0.5,0.5,1.0]);    % the Lorenz curve
        hold on
        plot([0,1],[0,1],'--k');                        % 45 degree line
        axis tight      % ranges of abscissa and ordinate are by definition exactly [0,1]
        axis square     % both axes should be equally long
        set(gca,'XTick',get(gca,'YTick'))   % ensure equal ticking
        set(gca,'Layer','top');             % grid above the shaded area
        grid on;
        title(['\bfGini coefficient = ',num2str(g)]);
        xlabel('share of population');
        ylabel('share of value');
    end

end
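
As an independent cross-check of the trapezoidal computation in gini.m, the Gini coefficient of grouped data (each class treated as internally equal) can also be computed from the population-weighted mean absolute difference, g = sum_i sum_j pop_i*pop_j*|val_i - val_j| / (2*N^2*mu), where N is the total population and mu the population-weighted mean value. The following is a minimal sketch, not part of the repository. The variable names (N, mu, D, g_meandiff) are hypothetical, NaN handling and input validation are omitted, and the implicit expansion in val - val.' assumes MATLAB R2016b or later (or Octave).

```matlab
% Sketch: Gini coefficient from the weighted mean absolute difference.
% Assumes nonnegative, NaN-free column vectors with a positive total value.
pop = [2; 1];                    % population sizes of two classes (example data)
val = [3; 10];                   % value per capita in each class (example data)

N  = sum(pop);                   % total population
mu = sum(pop .* val) / N;        % population-weighted mean value
D  = abs(val - val.');           % D(i,j) = |val(i) - val(j)|, via implicit expansion
g_meandiff = (pop.' * D * pop) / (2 * N^2 * mu);   % weighted double sum over all pairs

fprintf('Gini (mean-difference form): %.4f\n', g_meandiff);
```

For these example inputs g_meandiff equals 7/24 (about 0.2917), which should agree, up to floating-point rounding, with gini([2;1],[3;10]). The pairwise matrix D needs memory quadratic in the number of classes, so this form suits checking small inputs rather than replacing the sort-and-cumsum approach.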