<!--

@license Apache-2.0

Copyright (c) 2020 The Stdlib Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

-->

# svariancetk

> Calculate the [variance][variance] of a single-precision floating-point strided array using a one-pass textbook algorithm.

<section class="intro">

The population [variance][variance] of a finite population of size `N` is given by

<!-- <equation class="equation" label="eq:population_variance" align="center" raw="\sigma^2 = \frac{1}{N} \sum_{i=0}^{N-1} (x_i - \mu)^2" alt="Equation for the population variance."> -->

<div class="equation" align="center" data-raw-text="\sigma^2 = \frac{1}{N} \sum_{i=0}^{N-1} (x_i - \mu)^2" data-equation="eq:population_variance">
    <img src="https://cdn.jsdelivr.net/gh/stdlib-js/stdlib@6da3e7388e483798f23a9ce30fcb35f454e7e3b4/lib/node_modules/@stdlib/stats/base/svariancetk/docs/img/equation_population_variance.svg" alt="Equation for the population variance.">
    <br>
</div>

<!-- </equation> -->

where the population mean is given by

<!-- <equation class="equation" label="eq:population_mean" align="center" raw="\mu = \frac{1}{N} \sum_{i=0}^{N-1} x_i" alt="Equation for the population mean."> -->

<div class="equation" align="center" data-raw-text="\mu = \frac{1}{N} \sum_{i=0}^{N-1} x_i" data-equation="eq:population_mean">
    <img src="https://cdn.jsdelivr.net/gh/stdlib-js/stdlib@6da3e7388e483798f23a9ce30fcb35f454e7e3b4/lib/node_modules/@stdlib/stats/base/svariancetk/docs/img/equation_population_mean.svg" alt="Equation for the population mean.">
    <br>
</div>

<!-- </equation> -->

After rearranging terms, the population [variance][variance] can be equivalently expressed as

<!-- <equation class="equation" label="eq:population_variance_textbook" align="center" raw="\sigma^2 = \frac{1}{N}\biggl(\ \sum_{i=0}^{N-1} x_i^2 - \frac{1}{N}\biggl(\ \sum_{i=0}^{N-1} x_i \ \biggr)^2\ \biggr)" alt="Equation for the population variance (one-pass textbook formula)."> -->

<div class="equation" align="center" data-raw-text="\sigma^2 = \frac{1}{N}\biggl(\ \sum_{i=0}^{N-1} x_i^2 - \frac{1}{N}\biggl(\ \sum_{i=0}^{N-1} x_i \ \biggr)^2\ \biggr)" data-equation="eq:population_variance_textbook">
    <img src="https://cdn.jsdelivr.net/gh/stdlib-js/stdlib@6da3e7388e483798f23a9ce30fcb35f454e7e3b4/lib/node_modules/@stdlib/stats/base/svariancetk/docs/img/equation_population_variance_textbook.svg" alt="Equation for the population variance (one-pass textbook formula).">
    <br>
</div>

<!-- </equation> -->

Often in the analysis of data, the true population [variance][variance] is not known _a priori_ and must be estimated from a sample drawn from the population distribution. If one attempts to use the formula for the population [variance][variance], the result is a **biased sample variance**. To compute an **unbiased sample variance** for a sample of size `n`,

<!-- <equation class="equation" label="eq:unbiased_sample_variance" align="center" raw="s^2 = \frac{1}{n-1} \sum_{i=0}^{n-1} (x_i - \bar{x})^2" alt="Equation for computing an unbiased sample variance."> -->

<div class="equation" align="center" data-raw-text="s^2 = \frac{1}{n-1} \sum_{i=0}^{n-1} (x_i - \bar{x})^2" data-equation="eq:unbiased_sample_variance">
    <img src="https://cdn.jsdelivr.net/gh/stdlib-js/stdlib@6da3e7388e483798f23a9ce30fcb35f454e7e3b4/lib/node_modules/@stdlib/stats/base/svariancetk/docs/img/equation_unbiased_sample_variance.svg" alt="Equation for computing an unbiased sample variance.">
    <br>
</div>

<!-- </equation> -->

where the sample mean is given by

<!-- <equation class="equation" label="eq:sample_mean" align="center" raw="\bar{x} = \frac{1}{n} \sum_{i=0}^{n-1} x_i" alt="Equation for the sample mean."> -->

<div class="equation" align="center" data-raw-text="\bar{x} = \frac{1}{n} \sum_{i=0}^{n-1} x_i" data-equation="eq:sample_mean">
    <img src="https://cdn.jsdelivr.net/gh/stdlib-js/stdlib@6da3e7388e483798f23a9ce30fcb35f454e7e3b4/lib/node_modules/@stdlib/stats/base/svariancetk/docs/img/equation_sample_mean.svg" alt="Equation for the sample mean.">
    <br>
</div>

<!-- </equation> -->

Similar to the population [variance][variance], after rearranging terms, the **unbiased sample variance** can be equivalently expressed as

<!-- <equation class="equation" label="eq:unbiased_sample_variance_textbook" align="center" raw="s^2 = \frac{1}{n-1}\biggl(\ \sum_{i=0}^{n-1} x_i^2 - \frac{1}{n}\biggl(\ \sum_{i=0}^{n-1} x_i \ \biggr)^2\ \biggr)" alt="Equation for the unbiased sample variance (one-pass textbook formula)."> -->

<div class="equation" align="center" data-raw-text="s^2 = \frac{1}{n-1}\biggl(\ \sum_{i=0}^{n-1} x_i^2 - \frac{1}{n}\biggl(\ \sum_{i=0}^{n-1} x_i \ \biggr)^2\ \biggr)" data-equation="eq:unbiased_sample_variance_textbook">
    <img src="https://cdn.jsdelivr.net/gh/stdlib-js/stdlib@6da3e7388e483798f23a9ce30fcb35f454e7e3b4/lib/node_modules/@stdlib/stats/base/svariancetk/docs/img/equation_unbiased_sample_variance_textbook.svg" alt="Equation for the unbiased sample variance (one-pass textbook formula).">
    <br>
</div>

<!-- </equation> -->

The use of the term `n-1` is commonly referred to as Bessel's correction. Note, however, that applying Bessel's correction can increase the mean squared error between the sample variance and the population variance. Depending on the characteristics of the population distribution, other correction factors (e.g., `n-1.5`, `n+1`, etc.) can yield better estimators.
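The rearranged formula is what makes the algorithm "one-pass": both the sum of the values and the sum of their squares can be accumulated in a single sweep over the data. The following is a minimal sketch of that accumulation in plain (double-precision) JavaScript; `textbookVariance` is a hypothetical helper shown for illustration only and is **not** the package implementation.

```javascript
// Hypothetical helper illustrating the one-pass textbook algorithm:
function textbookVariance( x, correction ) {
    var sum = 0.0; // running sum of the values
    var sos = 0.0; // running sum of the squared values
    var n = x.length;
    var i;
    for ( i = 0; i < n; i++ ) {
        sum += x[ i ];
        sos += x[ i ] * x[ i ];
    }
    // Subtract the two accumulated sums and apply the degrees of freedom adjustment:
    return ( sos - ( (sum*sum)/n ) ) / ( n - correction );
}

var v = textbookVariance( [ 1.0, -2.0, 2.0 ], 1 );
// returns ~4.3333
```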
</section>

<!-- /.intro -->

<section class="usage">

## Usage

```javascript
var svariancetk = require( '@stdlib/stats/base/svariancetk' );
```

#### svariancetk( N, correction, x, stride )

Computes the [variance][variance] of a single-precision floating-point strided array `x` using a one-pass textbook algorithm.

```javascript
var Float32Array = require( '@stdlib/array/float32' );

var x = new Float32Array( [ 1.0, -2.0, 2.0 ] );
var N = x.length;

var v = svariancetk( N, 1, x, 1 );
// returns ~4.3333
```

The function has the following parameters:

- **N**: number of indexed elements.
- **correction**: degrees of freedom adjustment. Setting this parameter to a value other than `0` has the effect of adjusting the divisor during the calculation of the [variance][variance] according to `N-c`, where `c` corresponds to the provided degrees of freedom adjustment. When computing the [variance][variance] of a population, setting this parameter to `0` is the standard choice (i.e., the provided array contains data constituting an entire population). When computing the unbiased sample [variance][variance], setting this parameter to `1` is the standard choice (i.e., the provided array contains data sampled from a larger population; this is commonly referred to as Bessel's correction). See the example following this list.
- **x**: input [`Float32Array`][@stdlib/array/float32].
- **stride**: index increment for `x`.
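As a point of comparison, the same data yields different results for the population variance (`correction = 0`) and the unbiased sample variance (`correction = 1`). The return values shown below are approximate, as assumed from the formulas above; actual single-precision results may differ in the last digits.

```javascript
var Float32Array = require( '@stdlib/array/float32' );

var x = new Float32Array( [ 1.0, -2.0, 2.0 ] );

// Population variance (divide by `N`):
var v = svariancetk( x.length, 0, x, 1 );
// returns ~2.8889

// Unbiased sample variance (divide by `N-1`):
v = svariancetk( x.length, 1, x, 1 );
// returns ~4.3333
```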
The `N` and `stride` parameters determine which elements in `x` are accessed at runtime. For example, to compute the [variance][variance] of every other element in `x`,

```javascript
var Float32Array = require( '@stdlib/array/float32' );
var floor = require( '@stdlib/math/base/special/floor' );

var x = new Float32Array( [ 1.0, 2.0, 2.0, -7.0, -2.0, 3.0, 4.0, 2.0 ] );
var N = floor( x.length / 2 );

var v = svariancetk( N, 1, x, 2 );
// returns 6.25
```

Note that indexing is relative to the first index. To introduce an offset, use [`typed array`][mdn-typed-array] views.

<!-- eslint-disable stdlib/capitalized-comments -->

```javascript
var Float32Array = require( '@stdlib/array/float32' );
var floor = require( '@stdlib/math/base/special/floor' );

var x0 = new Float32Array( [ 2.0, 1.0, 2.0, -2.0, -2.0, 2.0, 3.0, 4.0 ] );
var x1 = new Float32Array( x0.buffer, x0.BYTES_PER_ELEMENT*1 ); // start at 2nd element

var N = floor( x0.length / 2 );

var v = svariancetk( N, 1, x1, 2 );
// returns 6.25
```

#### svariancetk.ndarray( N, correction, x, stride, offset )

Computes the [variance][variance] of a single-precision floating-point strided array using a one-pass textbook algorithm and alternative indexing semantics.

```javascript
var Float32Array = require( '@stdlib/array/float32' );

var x = new Float32Array( [ 1.0, -2.0, 2.0 ] );
var N = x.length;

var v = svariancetk.ndarray( N, 1, x, 1, 0 );
// returns ~4.3333
```

The function has the following additional parameters:

- **offset**: starting index for `x`.

While [`typed array`][mdn-typed-array] views mandate a view offset based on the underlying `buffer`, the `offset` parameter supports indexing semantics based on a starting index. For example, to calculate the [variance][variance] of every other element in `x` starting from the second element,

```javascript
var Float32Array = require( '@stdlib/array/float32' );
var floor = require( '@stdlib/math/base/special/floor' );

var x = new Float32Array( [ 2.0, 1.0, 2.0, -2.0, -2.0, 2.0, 3.0, 4.0 ] );
var N = floor( x.length / 2 );

var v = svariancetk.ndarray( N, 1, x, 2, 1 );
// returns 6.25
```

</section>

<!-- /.usage -->

<section class="notes">

## Notes

- If `N <= 0`, both functions return `NaN`.
- If `N - c` is less than or equal to `0` (where `c` corresponds to the provided degrees of freedom adjustment), both functions return `NaN`.
- Some caution should be exercised when using the one-pass textbook algorithm. The literature overwhelmingly discourages its use for two reasons: 1) the lack of safeguards against underflow and overflow and 2) the risk of catastrophic cancellation when subtracting the two sums if the sums are large and the variance small. These concerns have merit; however, the one-pass textbook algorithm should not be dismissed outright. For data distributions with a moderately large ratio of standard deviation to mean (i.e., **coefficient of variation**), the one-pass textbook algorithm may be acceptable, especially when performance is paramount and some precision loss is tolerable (including the risk of returning a negative variance due to floating-point rounding errors!). In short, no single "best" algorithm for computing the variance exists. The "best" algorithm depends on the underlying data distribution, your performance requirements, and your minimum precision requirements. When evaluating which algorithm to use, consider the relative pros and cons, and choose the algorithm which best serves your needs. The sketch following this list illustrates the cancellation risk.
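To illustrate the cancellation risk, consider single-precision data whose mean is large relative to its spread. The following is a rough sketch (the input values are hypothetical, and the exact outputs depend on floating-point rounding) which compares the one-pass single-precision result against a two-pass computation carried out in double precision.

```javascript
var Float32Array = require( '@stdlib/array/float32' );
var svariancetk = require( '@stdlib/stats/base/svariancetk' );

// Data with a large mean and a tiny spread (hypothetical values):
var x = new Float32Array( [ 1.0e4, 1.0e4+0.1, 1.0e4+0.2, 1.0e4+0.3 ] );

// One-pass textbook algorithm in single precision:
var v1 = svariancetk( x.length, 1, x, 1 );

// Two-pass reference computation in double precision:
var mu = 0.0;
var i;
for ( i = 0; i < x.length; i++ ) {
    mu += x[ i ];
}
mu /= x.length;

var v2 = 0.0;
for ( i = 0; i < x.length; i++ ) {
    v2 += ( x[ i ]-mu ) * ( x[ i ]-mu );
}
v2 /= x.length - 1;

// `v1` may differ noticeably from `v2` (and can even be negative),
// as the two accumulated sums are large and nearly cancel:
console.log( v1, v2 );
```

For data of this kind, an algorithm which first centers the values (or accumulates in higher precision) is typically the safer choice.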
</section>

<!-- /.notes -->

<section class="examples">

## Examples

<!-- eslint no-undef: "error" -->

```javascript
var randu = require( '@stdlib/random/base/randu' );
var round = require( '@stdlib/math/base/special/round' );
var Float32Array = require( '@stdlib/array/float32' );
var svariancetk = require( '@stdlib/stats/base/svariancetk' );

var x;
var i;

x = new Float32Array( 10 );
for ( i = 0; i < x.length; i++ ) {
    x[ i ] = round( (randu()*100.0) - 50.0 );
}
console.log( x );

var v = svariancetk( x.length, 1, x, 1 );
console.log( v );
```

</section>

<!-- /.examples -->

* * *

<section class="references">

## References

- Ling, Robert F. 1974. "Comparison of Several Algorithms for Computing Sample Means and Variances." _Journal of the American Statistical Association_ 69 (348). American Statistical Association, Taylor & Francis, Ltd.: 859–66. doi:[10.2307/2286154][@ling:1974a].

</section>

<!-- /.references -->

<section class="links">

[variance]: https://en.wikipedia.org/wiki/Variance

[@stdlib/array/float32]: https://www.npmjs.com/package/@stdlib/array-float32

[mdn-typed-array]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray

[@ling:1974a]: https://doi.org/10.2307/2286154

</section>

<!-- /.links -->