gini.m (5654B)
% GINI computes the Gini coefficient and the Lorenz curve.
%
% Usage:
%   g = gini(pop,val)
%   [g,l] = gini(pop,val)
%   [g,l,a] = gini(pop,val)
%   ... = gini(pop,val,makeplot)
%
% Input and Output:
%   pop       A vector of population sizes of the different classes.
%   val       A vector of the measurement variable (e.g. income per capita)
%             in the different classes.
%   g         Gini coefficient.
%   l         Lorenz curve: This is a two-column array, with the left
%             column representing cumulative population shares of the
%             different classes, sorted in ascending order of val, and the
%             right column representing the cumulative value share that
%             belongs to the population up to the given class. The Lorenz
%             curve is the graph of the right column plotted against the
%             left column.
%   a         Same as l, except that the components are not normalized to
%             range in the unit interval. Thus, the left column of a is the
%             absolute cumulative population sizes of the classes, and the
%             right column is the absolute cumulative value of all classes
%             up to the given one.
%   makeplot  A boolean indicating whether a figure of the Lorenz curve
%             should be produced. Default is false.
%
% Example:
%   x = rand(100,1);
%   y = rand(100,1);
%   gini(x,y,true);             % random populations with random incomes
%   figure;
%   gini(x,ones(100,1),true);   % perfect equality
%
% Explanation:
%
%   The vectors pop and val must be equally long and must contain only
%   nonnegative values (zeros are acceptable). Classes where either pop or
%   val is NaN are ignored. A typical application would be that pop
%   represents population sizes of some subgroups (e.g. different
%   countries or states), and val represents the income per capita in
%   these subgroups. The Gini coefficient is a measure of how unequally
%   income is distributed between these classes. A coefficient of zero
%   means that all subgroups have exactly the same income per capita, so
%   there is no dispersion of income. A very large coefficient results if
%   all the income accrues to one subgroup and all the remaining groups
%   have zero income; the coefficient then equals one minus the population
%   share of that subgroup. In the limit, when the total population size
%   approaches infinity but all the income accrues to a single individual,
%   the Gini coefficient approaches unity.
%
%   The Lorenz curve is a graphical representation of the distribution. If
%   (x,y) is a point on the Lorenz curve, then the poorest x-share of the
%   population has the y-share of total income. By definition, (0,0) and
%   (1,1) are points on the Lorenz curve (the poorest 0% have 0% of total
%   income, and the poorest 100% [i.e., everyone] have 100% of total
%   income). Equal distribution implies that the Lorenz curve is the 45
%   degree line. Any inequality manifests itself as a deviation of the
%   Lorenz curve below the 45 degree line. By construction, the Lorenz
%   curve is weakly convex and weakly increasing.
%
%   The two concepts are related as follows: The Gini coefficient is twice
%   the area between the 45 degree line and the Lorenz curve.
%
% Author : Yvan Lengwiler
% Release: $1.0$
% Date   : $2010-06-27$

function [g,l,a] = gini(pop,val,makeplot)

    % check arguments

    assert(nargin >= 2, 'gini expects at least two arguments.')

    if nargin < 3
        makeplot = false;
    end
    assert(numel(pop) == numel(val), ...
        'gini expects two equally long vectors (%d ~= %d).', ...
        numel(pop),numel(val))

    pop = [0;pop(:)]; val = [0;val(:)];   % prepend a zero so the curve starts at (0,0)

    isok = ~isnan(pop) & ~isnan(val);     % filter out NaNs
    if sum(isok) < 2
        warning('gini:lacking_data','not enough data');
        g = NaN; l = NaN(1,2); a = NaN(1,2);
        return;
    end
    pop = pop(isok); val = val(isok);

    assert(all(pop>=0) && all(val>=0), ...
        'gini expects nonnegative vectors (neg elements in pop = %d, in val = %d).', ...
        sum(pop<0),sum(val<0))

    % process input
    z = val .* pop;                       % total value per class
    [~,ord] = sort(val);                  % sort classes by ascending val
    pop = pop(ord); z = z(ord);
    pop = cumsum(pop); z = cumsum(z);     % absolute cumulative sums
    relpop = pop/pop(end); relz = z/z(end);   % normalize to [0,1]

    % Gini coefficient

    % We compute the area below the Lorenz curve as the average of the
    % left and right Riemann-like sums. (I say Riemann-'like' because we
    % evaluate not on a uniform grid, but on the points given by the pop
    % data.) Averaging the two sums is the trapezoidal rule, which is exact
    % here because the Lorenz curve is linear between the data points.
    %
    % These are the two Riemann-like sums:
    %   leftsum  = sum(relz(1:end-1) .* diff(relpop));
    %   rightsum = sum(relz(2:end)   .* diff(relpop));
    % The Gini coefficient is one minus twice the average of leftsum and
    % rightsum, i.e. 1 - (leftsum + rightsum). We can put all of this into
    % one line.
    g = 1 - sum((relz(1:end-1)+relz(2:end)) .* diff(relpop));

    % Lorenz curve
    l = [relpop,relz];
    a = [pop,z];
    if makeplot                           % ... plot it?
        area(relpop,relz,'FaceColor',[0.5,0.5,1.0]);   % the Lorenz curve
        hold on
        plot([0,1],[0,1],'--k');          % 45 degree line
        axis tight     % ranges of abscissa and ordinate are by definition exactly [0,1]
        axis square    % both axes should be equally long
        set(gca,'XTick',get(gca,'YTick'))   % ensure equal ticking
        set(gca,'Layer','top');           % grid above the shaded area
        grid on;
        title(['\bfGini coefficient = ',num2str(g)]);
        xlabel('share of population');
        ylabel('share of value');
        hold off                          % do not leave hold on for later plots
    end

end
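The area computation described in the comments of gini.m can be checked by hand on a tiny case. The MATLAB script below is a minimal sketch, not part of the original file: the data (one person with income 8 and three people with no income) are made up for illustration, and the argument checks and NaN filtering of gini.m are omitted. It repeats the preprocessing steps of gini.m, evaluates the left and right Riemann-like sums separately, and compares their average with the built-in trapz on the same points.

```matlab
% Hand check of the trapezoidal Gini computation (illustrative sketch).
% Hypothetical data: 1 person with income 8, 3 people with income 0.
pop = [1; 3];
val = [8; 0];

% Same preprocessing as gini.m (without argument checks or NaN filtering):
% prepend the origin, sort by val, accumulate, normalize to [0,1].
pop = [0; pop];
val = [0; val];
[~, ord] = sort(val);
pop = pop(ord);
z = val(ord) .* pop;
relpop = cumsum(pop) / sum(pop);   % [0; 0.75; 1]
relz = cumsum(z) / sum(z);         % [0; 0; 1]

% Left and right Riemann-like sums on the non-uniform grid relpop.
leftsum = sum(relz(1:end-1) .* diff(relpop));
rightsum = sum(relz(2:end) .* diff(relpop));
areaBelow = (leftsum + rightsum) / 2;   % average = trapezoidal rule

% The same area from MATLAB's trapz on identical points.
areaTrapz = trapz(relpop, relz);

% Gini = 1 - 2*(area below the curve), and the one-line form of gini.m.
g = 1 - 2 * areaBelow;
gOneLine = 1 - sum((relz(1:end-1) + relz(2:end)) .* diff(relpop));

fprintf('leftsum = %.4f, rightsum = %.4f\n', leftsum, rightsum);
% prints: leftsum = 0.0000, rightsum = 0.2500
fprintf('area = %.4f (trapz: %.4f), g = %.4f\n', areaBelow, areaTrapz, g);
% prints: area = 0.1250 (trapz: 0.1250), g = 0.7500
```

Because everyone in a class has the same per-capita value, the Lorenz curve is linear between consecutive points, so the trapezoidal average is exact rather than an approximation. The only earning class holds one quarter of the population, and the result g = 0.75 matches one minus that population share, as expected when a single subgroup receives all income.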