README.md - time-to-botec - Benchmark sampling in different programming languages

<!--

@license Apache-2.0

Copyright (c) 2020 The Stdlib Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

-->

# variancepn

> Calculate the [variance][variance] of a strided array using a two-pass algorithm.

<section class="intro">

The population [variance][variance] of a finite population of size `N` is given by

<!-- <equation class="equation" label="eq:population_variance" align="center" raw="\sigma^2 = \frac{1}{N} \sum_{i=0}^{N-1} (x_i - \mu)^2" alt="Equation for the population variance."> -->

<div class="equation" align="center" data-raw-text="\sigma^2 = \frac{1}{N} \sum_{i=0}^{N-1} (x_i - \mu)^2" data-equation="eq:population_variance">
    <img src="https://cdn.jsdelivr.net/gh/stdlib-js/stdlib@b7aa38ad56dc6dc7e5327fce8074b1d9d61ebe11/lib/node_modules/@stdlib/stats/base/variancepn/docs/img/equation_population_variance.svg" alt="Equation for the population variance.">
    <br>
</div>

<!-- </equation> -->

where the population mean is given by

<!-- <equation class="equation" label="eq:population_mean" align="center" raw="\mu = \frac{1}{N} \sum_{i=0}^{N-1} x_i" alt="Equation for the population mean."> -->

<div class="equation" align="center" data-raw-text="\mu = \frac{1}{N} \sum_{i=0}^{N-1} x_i" data-equation="eq:population_mean">
    <img src="https://cdn.jsdelivr.net/gh/stdlib-js/stdlib@b7aa38ad56dc6dc7e5327fce8074b1d9d61ebe11/lib/node_modules/@stdlib/stats/base/variancepn/docs/img/equation_population_mean.svg" alt="Equation for the population mean.">
    <br>
</div>

<!-- </equation> -->

Often in the analysis of data, the true population [variance][variance] is not known _a priori_ and must be estimated from a sample drawn from the population distribution. If one applies the formula for the population [variance][variance] to a sample, the result is a **biased sample variance**. To compute an **unbiased sample variance** for a sample of size `n`,

<!-- <equation class="equation" label="eq:unbiased_sample_variance" align="center" raw="s^2 = \frac{1}{n-1} \sum_{i=0}^{n-1} (x_i - \bar{x})^2" alt="Equation for computing an unbiased sample variance."> -->

<div class="equation" align="center" data-raw-text="s^2 = \frac{1}{n-1} \sum_{i=0}^{n-1} (x_i - \bar{x})^2" data-equation="eq:unbiased_sample_variance">
    <img src="https://cdn.jsdelivr.net/gh/stdlib-js/stdlib@b7aa38ad56dc6dc7e5327fce8074b1d9d61ebe11/lib/node_modules/@stdlib/stats/base/variancepn/docs/img/equation_unbiased_sample_variance.svg" alt="Equation for computing an unbiased sample variance.">
    <br>
</div>

<!-- </equation> -->

where the sample mean is given by

<!-- <equation class="equation" label="eq:sample_mean" align="center" raw="\bar{x} = \frac{1}{n} \sum_{i=0}^{n-1} x_i" alt="Equation for the sample mean."> -->

<div class="equation" align="center" data-raw-text="\bar{x} = \frac{1}{n} \sum_{i=0}^{n-1} x_i" data-equation="eq:sample_mean">
    <img src="https://cdn.jsdelivr.net/gh/stdlib-js/stdlib@b7aa38ad56dc6dc7e5327fce8074b1d9d61ebe11/lib/node_modules/@stdlib/stats/base/variancepn/docs/img/equation_sample_mean.svg" alt="Equation for the sample mean.">
    <br>
</div>

<!-- </equation> -->

The use of the term `n-1` is commonly referred to as Bessel's correction. Note, however, that applying Bessel's correction can increase the mean squared error between the sample variance and population variance. Depending on the characteristics of the population distribution, other correction factors (e.g., `n-1.5`, `n+1`, etc.) can yield better estimators.
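To make the formulas above concrete, the following is a minimal sketch of a textbook two-pass computation in plain JavaScript. The helper name `twoPassVariance` and its parameter `c` are hypothetical and used only for illustration; this is not the package's implementation, which may differ (e.g., in how the mean is accumulated or in additional error compensation).

```javascript
// Hypothetical helper (not part of the package): textbook two-pass variance.
function twoPassVariance( x, c ) {
    var mean;
    var sum;
    var d;
    var i;

    // First pass: compute the mean.
    sum = 0.0;
    for ( i = 0; i < x.length; i++ ) {
        sum += x[ i ];
    }
    mean = sum / x.length;

    // Second pass: accumulate squared deviations from the mean.
    sum = 0.0;
    for ( i = 0; i < x.length; i++ ) {
        d = x[ i ] - mean;
        sum += d * d;
    }
    // Divide by `N-c`: `c = 0` for a population, `c = 1` for Bessel's correction.
    return sum / ( x.length - c );
}

var data = [ 1.0, -2.0, 2.0 ];

var populationVariance = twoPassVariance( data, 0 );
// returns ~2.8889

var sampleVariance = twoPassVariance( data, 1 );
// returns ~4.3333
```

For the same data, the divisor is the only difference between the two estimates: the sum of squared deviations (`26/3`) is divided by `3` in the first case and by `2` in the second.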

</section>

<!-- /.intro -->

<section class="usage">

## Usage

```javascript
var variancepn = require( '@stdlib/stats/base/variancepn' );
```

#### variancepn( N, correction, x, stride )

Computes the [variance][variance] of a strided array `x` using a two-pass algorithm.

```javascript
var x = [ 1.0, -2.0, 2.0 ];
var N = x.length;

var v = variancepn( N, 1, x, 1 );
// returns ~4.3333
```

The function has the following parameters:

-   **N**: number of indexed elements.
-   **correction**: degrees of freedom adjustment. Setting this parameter to a value other than `0` has the effect of adjusting the divisor during the calculation of the [variance][variance] according to `N-c` where `c` corresponds to the provided degrees of freedom adjustment. When computing the [variance][variance] of a population, setting this parameter to `0` is the standard choice (i.e., the provided array contains data constituting an entire population). When computing the unbiased sample [variance][variance], setting this parameter to `1` is the standard choice (i.e., the provided array contains data sampled from a larger population; this is commonly referred to as Bessel's correction).
-   **x**: input [`Array`][mdn-array] or [`typed array`][mdn-typed-array].
-   **stride**: index increment for `x`.

The `N` and `stride` parameters determine which elements in `x` are accessed at runtime. For example, to compute the [variance][variance] of every other element in `x`,

```javascript
var floor = require( '@stdlib/math/base/special/floor' );

var x = [ 1.0, 2.0, 2.0, -7.0, -2.0, 3.0, 4.0, 2.0 ];
var N = floor( x.length / 2 );

var v = variancepn( N, 1, x, 2 );
// returns 6.25
```

Note that indexing is relative to the first index. To introduce an offset, use [`typed array`][mdn-typed-array] views.

<!-- eslint-disable stdlib/capitalized-comments -->

```javascript
var Float64Array = require( '@stdlib/array/float64' );
var floor = require( '@stdlib/math/base/special/floor' );

var x0 = new Float64Array( [ 2.0, 1.0, 2.0, -2.0, -2.0, 2.0, 3.0, 4.0 ] );
var x1 = new Float64Array( x0.buffer, x0.BYTES_PER_ELEMENT*1 ); // start at 2nd element

var N = floor( x0.length / 2 );

var v = variancepn( N, 1, x1, 2 );
// returns 6.25
```
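The view in the example above is created with a byte offset rather than an element index. The sketch below illustrates this using the built-in global `Float64Array` (an assumption made to keep the snippet dependency-free; the example above uses `@stdlib/array/float64` instead) and shows `subarray` as an element-indexed way to obtain the same kind of view.

```javascript
var x0 = new Float64Array( [ 2.0, 1.0, 2.0, -2.0, -2.0, 2.0, 3.0, 4.0 ] );

// The second constructor argument is a byte offset, hence the multiplication by `BYTES_PER_ELEMENT` (8 for a `Float64Array`):
var x1 = new Float64Array( x0.buffer, x0.BYTES_PER_ELEMENT*1 );

// `subarray` takes element indices and also returns a view on the same buffer:
var x2 = x0.subarray( 1 );

console.log( x1.length );
// => 7

console.log( x1[ 0 ] );
// => 1

console.log( x2[ 0 ] );
// => 1

// Views share memory with `x0`; no data is copied:
x1[ 0 ] = 5.0;
console.log( x0[ 1 ] );
// => 5
```

Because a view shares memory with its parent array, writes through the view are visible in the parent, and vice versa.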

#### variancepn.ndarray( N, correction, x, stride, offset )

Computes the [variance][variance] of a strided array using a two-pass algorithm and alternative indexing semantics.

```javascript
var x = [ 1.0, -2.0, 2.0 ];
var N = x.length;

var v = variancepn.ndarray( N, 1, x, 1, 0 );
// returns ~4.3333
```

The function has the following additional parameters:

-   **offset**: starting index for `x`.

While [`typed array`][mdn-typed-array] views mandate a view offset based on the underlying `buffer`, the `offset` parameter supports indexing semantics based on a starting index. For example, to calculate the [variance][variance] for every other value in `x` starting from the second value,

```javascript
var floor = require( '@stdlib/math/base/special/floor' );

var x = [ 2.0, 1.0, 2.0, -2.0, -2.0, 2.0, 3.0, 4.0 ];
var N = floor( x.length / 2 );

var v = variancepn.ndarray( N, 1, x, 2, 1 );
// returns 6.25
```
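To see which elements the `N`, `stride`, and `offset` arguments select in the example above, the following sketch enumerates the visited indices. The helper `stridedIndices` is hypothetical and exists only for illustration.

```javascript
// Hypothetical helper: list the indices visited for a given `N`, `stride`, and `offset`.
function stridedIndices( N, stride, offset ) {
    var out = [];
    var ix = offset;
    var i;
    for ( i = 0; i < N; i++ ) {
        out.push( ix );
        ix += stride;
    }
    return out;
}

var x = [ 2.0, 1.0, 2.0, -2.0, -2.0, 2.0, 3.0, 4.0 ];

var idx = stridedIndices( 4, 2, 1 );
// returns [ 1, 3, 5, 7 ]

var values = idx.map( function pick( i ) {
    return x[ i ];
});
// returns [ 1.0, -2.0, 2.0, 4.0 ]
```

The selected values have a mean of `1.25` and a sum of squared deviations of `18.75`, which, divided by `N-1 = 3`, gives the `6.25` shown above.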

</section>

<!-- /.usage -->

<section class="notes">

## Notes

-   If `N <= 0`, both functions return `NaN`.
-   If `N - c` is less than or equal to `0` (where `c` corresponds to the provided degrees of freedom adjustment), both functions return `NaN`.
-   Depending on the environment, the typed versions ([`dvariancepn`][@stdlib/stats/base/dvariancepn], [`svariancepn`][@stdlib/stats/base/svariancepn], etc.) are likely to be significantly more performant.
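The first two notes can be illustrated with a small sketch that places the corresponding guards in front of a plain two-pass computation. The function `guardedVariance` is hypothetical, assumes a non-negative `stride`, and is not the package source.

```javascript
// Hypothetical sketch (not the package source): two-pass variance with the guards described in the notes.
function guardedVariance( N, c, x, stride ) {
    var mean;
    var sum;
    var n;
    var d;
    var i;

    // The divisor `N-c` must be positive, and there must be at least one element:
    n = N - c;
    if ( N <= 0 || n <= 0 ) {
        return NaN;
    }
    // First pass: mean of the strided elements.
    sum = 0.0;
    for ( i = 0; i < N; i++ ) {
        sum += x[ i*stride ];
    }
    mean = sum / N;

    // Second pass: sum of squared deviations.
    sum = 0.0;
    for ( i = 0; i < N; i++ ) {
        d = x[ i*stride ] - mean;
        sum += d * d;
    }
    return sum / n;
}

var v0 = guardedVariance( 0, 1, [ 1.0, -2.0, 2.0 ], 1 );
// returns NaN

var v1 = guardedVariance( 1, 1, [ 3.0 ], 1 );
// returns NaN

var v2 = guardedVariance( 1, 0, [ 3.0 ], 1 );
// returns 0
```

In this sketch, a single observation has a well-defined population variance of `0`, while the sample variance with a correction of `1` is undefined because the divisor `N-c` is `0`.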

</section>

<!-- /.notes -->

<section class="examples">

## Examples

<!-- eslint no-undef: "error" -->

```javascript
var randu = require( '@stdlib/random/base/randu' );
var round = require( '@stdlib/math/base/special/round' );
var Float64Array = require( '@stdlib/array/float64' );
var variancepn = require( '@stdlib/stats/base/variancepn' );

var x;
var i;

x = new Float64Array( 10 );
for ( i = 0; i < x.length; i++ ) {
    x[ i ] = round( (randu()*100.0) - 50.0 );
}
console.log( x );

var v = variancepn( x.length, 1, x, 1 );
console.log( v );
```

</section>

<!-- /.examples -->

* * *

<section class="references">

## References

-   Neely, Peter M. 1966. "Comparison of Several Algorithms for Computation of Means, Standard Deviations and Correlation Coefficients." _Communications of the ACM_ 9 (7). Association for Computing Machinery: 496–99. doi:[10.1145/365719.365958][@neely:1966a].
-   Schubert, Erich, and Michael Gertz. 2018. "Numerically Stable Parallel Computation of (Co-)Variance." In _Proceedings of the 30th International Conference on Scientific and Statistical Database Management_. New York, NY, USA: Association for Computing Machinery. doi:[10.1145/3221269.3223036][@schubert:2018a].

</section>

<!-- /.references -->

<section class="links">

[variance]: https://en.wikipedia.org/wiki/Variance

[mdn-array]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array

[mdn-typed-array]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray

[@stdlib/stats/base/dvariancepn]: https://www.npmjs.com/package/@stdlib/stats/tree/main/base/dvariancepn

[@stdlib/stats/base/svariancepn]: https://www.npmjs.com/package/@stdlib/stats/tree/main/base/svariancepn

[@neely:1966a]: https://doi.org/10.1145/365719.365958

[@schubert:2018a]: https://doi.org/10.1145/3221269.3223036

</section>

<!-- /.links -->