ksr.m (2718B)
function r=ksr(x,y,h,N)
% KSR   Kernel smoothing regression
%
% r=ksr(x,y) returns the Gaussian kernel regression in structure r such that
%   r.f(r.x) = y(x) + e
% The bandwidth and number of samples are also stored in r.h and r.n,
% respectively.
%
% r=ksr(x,y,h) performs the regression using the specified bandwidth, h.
%
% r=ksr(x,y,h,n) calculates the regression at n points (default n=100).
%
% Without an output argument, ksr(x,y) or ksr(x,y,h) displays the
% regression plot.
%
% Algorithm
% Kernel regression is a non-parametric approach to estimating the
% conditional expectation of a random variable:
%
%   E(Y|X) = f(X)
%
% where f is a non-parametric function. Building on kernel density
% estimation, this code implements the Nadaraya-Watson kernel regression
% with a Gaussian kernel:
%
%   f(x) = sum(kerf((x-X)/h).*Y)/sum(kerf((x-X)/h))
%
% See also gkde, ksdensity

% Example 1: smooth curve with noise
%{
x = 1:100;
y = sin(x/10)+(x/50).^2;
yn = y + 0.2*randn(1,100);
r=ksr(x,yn);
plot(x,y,'b-',x,yn,'co',r.x,r.f,'r--','linewidth',2)
legend('true','data','regression','location','northwest');
title('Gaussian kernel regression')
%}
% Example 2: with missing data
%{
x = sort(rand(1,100)*99)+1;
y = sin(x/10)+(x/50).^2;
y(ceil(rand(1,20)*100)) = NaN;   % ceil avoids a possible zero index
yn = y + 0.2*randn(1,100);
r=ksr(x,yn);
plot(x,y,'b-',x,yn,'co',r.x,r.f,'r--','linewidth',2)
legend('true','data','regression','location','northwest');
title('Gaussian kernel regression with 20% missing data')
%}

% By Yi Cao at Cranfield University on 12 March 2008.
%

% Check input and output
narginchk(2,4);     % replaces the deprecated error(nargchk(...)) idiom
nargoutchk(0,1);
if numel(x)~=numel(y)
    error('x and y have different sizes.');
end

x=x(:);
y=y(:);
% remove missing or invalid (NaN) data points
miss=isnan(x)|isnan(y);   % isnan is clearer than (x~=x); avoid shadowing inv
x(miss)=[];
y(miss)=[];

% Default parameters
if nargin<4
    N=100;
elseif ~isscalar(N)
    error('N must be a scalar.')
end
r.n=length(x);
if nargin<3
    % optimal bandwidth suggested by Bowman and Azzalini (1997) p.31
    hx=median(abs(x-median(x)))/0.6745*(4/3/r.n)^0.2;
    hy=median(abs(y-median(y)))/0.6745*(4/3/r.n)^0.2;
    h=sqrt(hy*hx);
    if h<sqrt(eps)*N
        error('There is not enough variation in the data. Regression is meaningless.')
    end
elseif ~isscalar(h)
    error('h must be a scalar.')
end
r.h=h;

% Gaussian kernel function
kerf=@(z)exp(-z.*z/2)/sqrt(2*pi);

r.x=linspace(min(x),max(x),N);
r.f=zeros(1,N);
for k=1:N
    z=kerf((r.x(k)-x)/h);
    r.f(k)=sum(z.*y)/sum(z);
end

% Plot
if ~nargout
    plot(r.x,r.f,'r',x,y,'bo')
    ylabel('f(x)')
    xlabel('x')
    title('Kernel Smoothing Regression');
end
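
% Note: the per-point loop above can also be written in vectorized form.
% This is a sketch, not part of the original code; it assumes MATLAB
% R2016b+ implicit expansion, with x and y the cleaned n-by-1 column
% vectors and r.x the 1-by-N grid. It gives the same Nadaraya-Watson
% estimate at the cost of forming an n-by-N kernel weight matrix:
%{
K = kerf((r.x - x)/h);      % n-by-N matrix of kernel weights
r.f = (y.'*K)./sum(K,1);    % weighted average of y at each grid point
%}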