index.html (40899B)
1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> 2 <html> 3 <head> 4 <link href="https://fonts.googleapis.com/css?family=Open+Sans:400,400i,700,700i" rel="stylesheet"> 5 <meta http-equiv="Content-Type" content="text/html;charset=utf-8"> 6 <meta name="keywords" content="rng, prng, xoshiro, xoroshiro, xorshift, pseudorandom number generator, random number generator"> 7 <style type="text/css"> 8 @import "css/content.php"; 9 @import "css/layout.php"; 10 @import "css/tablesorter.css"; 11 12 </style> 13 <title>xoshiro/xoroshiro generators and the PRNG shootout</title> 14 <script type="text/javascript" src="js/jquery.js"></script> 15 <script type="text/javascript" src="js/tablesorter.js"></script> 16 <script type="text/javascript" src="js/metadata.js"></script> 17 <script type="text/javascript"> 18 $.tablesorter.defaults.widgets = ['zebra']; 19 $(document).ready( function() { 20 $("#prng").tablesorter({ 21 headers: { 22 3: { 23 sorter: false 24 }, 25 4: { 26 sorter: false 27 } 28 } 29 }); 30 $("#vect").tablesorter({ 31 }); 32 $("#prngf").tablesorter({ 33 headers: { 34 3: { 35 sorter: false 36 }, 37 4: { 38 sorter: false 39 } 40 } 41 }); 42 } ); 43 </script> 44 45 <script type="text/javascript" src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script> 46 <script type="text/javascript" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script> 47 48 </head> 49 <body> 50 <div id=header><code>xoshiro</code> / <code>xoroshiro</code> generators and the PRNG shootout</div> 51 52 <div id=left> 53 <ul id="left-nav"> 54 <li><a href="http://vigna.di.unimi.it/">Home</a></li> 55 <li><a href="http://vigna.di.unimi.it/papers.php">Papers<span class=arrow>➔</span></a></li> 56 <li><a href="http://vigna.di.unimi.it/software.php">Software<span class=arrow>➔</span></a> 57 <li><strong><a href="http://prng.di.unimi.it/">PRNG shootout</a></strong> 58 59 <ul> 60 <li><a href="#intro">Introduction</A></li> 61 <li><a href="#shootout">A PRNG Shootout</A></li> 62 <li><a href="#speed">Speed</A></li> 63 <li><a href="#quality">Quality</A></li> 64 <li><a href="#remarks">Remarks</A></li> 65 </ul> 66 67 68 <li><a href="http://pcg.di.unimi.it/pcg.php">The wrap-up on PCG generators<span class=arrow>➔</span></a> 69 <li><a href="http://fastutil.di.unimi.it/"><code>fastutil</code><span class=arrow>➔</span></a> 70 <li><a href="http://dsiutils.di.unimi.it/">DSI utilities<span class=arrow>➔</span></a> 71 <li><a href="http://webgraph.di.unimi.it/">WebGraph<span class=arrow>➔</span></a> 72 <li><a href="http://sux.di.unimi.it/">Sux<span class=arrow>➔</span></a> 73 <li><a href="http://vigna.di.unimi.it/music.php">Music<span class=arrow>➔</span></a> 74 <li><a href="https://shrinkai.di.unimi.it/">Shrink AI<span class=arrow>➔</span></a> 75 </ul> 76 </div> 77 78 <div id="main"> 79 80 81 <div id="content"> 82 83 <h1 class=first>Introduction</h1> 84 85 <p>This page describes some new pseudorandom number generators (PRNGs) we (David Blackman and I) have been working on recently, and 86 a shootout comparing them with other generators. Details about the generators can 87 be found in our <a 88 href="http://vigna.di.unimi.it/papers.php#BlVSLPNG">paper</a>. Information about my previous <code>xorshift</code>-based 89 generators can be found <a href="xorshift.php">here</a>, but they have been entirely superseded by the new ones, which 90 are faster <em>and</em> better. As part of our study, we developed a very strong <a href="hwd.php">test for Hamming-weight dependencies</a> 91 that gave a number of surprising results. 92 93 <h1>64-bit Generators</h1> 94 95 <P><a href="xoshiro256plusplus.c"><code>xoshiro256++</code></a>/<a href="xoshiro256starstar.c"><code>xoshiro256**</code></a> 96 (XOR/shift/rotate) are our <strong>all-purpose</strong> 97 generators (not <em>cryptographically secure</em> generators, though, 98 like all PRNGs in these pages). They have excellent (sub-ns) speed, a state 99 space (256 bits) that is large enough for any parallel application, and 100 they pass all tests we are aware of. See the <a href="http://vigna.di.unimi.it/papers.php#BlVSLPNG">paper</a> 101 for a discussion of their differences. 102 103 <p>If, however, one has to generate only 64-bit <strong>floating-point</strong> numbers 104 (by extracting the upper 53 bits) <a 105 href="xoshiro256plus.c"><code>xoshiro256+</code></a> is a slightly (≈15%) 106 faster generator with analogous statistical properties. For general 107 usage, one has to consider that its lowest bits have low linear 108 complexity and will <a href="lowcomp.php">fail linearity tests</a>; however, low linear 109 complexity of the lowest bits can have hardly any impact in practice, and certainly has no 110 impact at all if you generate floating-point numbers using the upper bits (we computed a <a 111 href="http://vigna.di.unimi.it/papers.php#BlVSLPNG">precise 112 estimate</a> of the linear complexity of the lowest bits). 113 114 <p>If you are <strong>tight on space</strong>, <a 115 href="xoroshiro128plusplus.c"><code>xoroshiro128++</code></a>/<a 116 href="xoroshiro128starstar.c"><code>xoroshiro128**</code></a> 117 (XOR/rotate/shift/rotate) and <a 118 href="xoroshiro128plus.c"><code>xoroshiro128+</code></a> have the same 119 speed and use half of the space; the same comments apply. They are suitable only for 120 low-scale parallel applications; moreover, <code>xoroshiro128+</code> 121 exhibits a mild dependency in Hamming weights that generates a failure 122 after 5 TB of output in <a href="hwd.php">our test</a>. We believe 123 this slight bias cannot affect any application. 124 125 <p>Finally, if for any reason (which reason?) you need <strong>more 126 state</strong>, we provide in the same 127 vein <a href="xoshiro512plusplus.c"><code>xoshiro512++</code></a> / <a href="xoshiro512starstar.c"><code>xoshiro512**</code></a> / <a href="xoshiro512plus.c"><code>xoshiro512+</code></a> and 128 <a href="xoroshiro1024plusplus.c"><code>xoroshiro1024++</code></a> / <a href="xoroshiro1024starstar.c"><code>xoroshiro1024**</code></a> / <a href="xoroshiro1024star.c"><code>xoroshiro1024*</code></a> (see the <a 129 href="http://vigna.di.unimi.it/papers.php#BlVSLPNG">paper</a>). 130 131 <p>All generators, being based on linear recurrences, provide <em>jump 132 functions</em> that make it possible to simulate any number of calls to 133 the next-state function in constant time, once a suitable <em>jump 134 polynomial</em> has been computed. We provide ready-made jump functions for 135 a number of calls equal to the square root of the period, to make it easy 136 generating non-overlapping sequences for parallel computations, and equal 137 to the cube of the fourth root of the period, to make it possible to 138 generate independent sequences on different parallel processors. 139 140 <p>We suggest to use <a href="splitmix64.c"><span 141 style="font-variant: small-caps">SplitMix64</span></a> to initialize 142 the state of our generators starting from a 64-bit seed, as <a href="https://dl.acm.org/citation.cfm?doid=1276927.1276928">research 143 has shown</a> that initialization must be performed with a generator 144 radically different in nature from the one initialized to avoid 145 correlation on similar seeds. 146 147 148 <h1>32-bit Generators</h1> 149 150 <P><a href="xoshiro128plusplus.c"><code>xoshiro128++</code></a>/<a href="xoshiro128starstar.c"><code>xoshiro128**</code></a> are our 151 <strong>32-bit</strong> all-purpose generators, whereas <a 152 href="xoshiro128plus.c"><code>xoshiro128+</code></a> is 153 for floating-point generation. They are the 32-bit counterpart of 154 <code>xoshiro256++</code>, <code>xoshiro256**</code> and <code>xoshiro256+</code>, so similar comments apply. 155 Their state is too small for 156 large-scale parallelism: their intended usage is inside embedded 157 hardware or GPUs. For an even smaller scale, you can use <a 158 href="xoroshiro64starstar.c"><code>xoroshiro64**</code></a> and <a 159 href="xoroshiro64star.c"><code>xoroshiro64*</code></a>. We not believe 160 at this point in time 32-bit generator with a larger state can be of 161 any use (but there are 32-bit <code>xoroshiro</code> generators of much larger size). 162 163 <p>All 32-bit generators pass all tests we are aware of, with the 164 exception of linearity tests (binary rank and linear complexity) for 165 <code>xoshiro128+</code> and <code>xoroshiro64*</code>: in this case, 166 due to the smaller number of output bits the low linear complexity of the 167 lowest bits is sufficient to trigger BigCrush tests when the output is bit-reversed. Analogously to 168 the 64-bit case, generating 32-bit floating-point number using the 169 upper bits will not use any of the bits with <a href="lowcomp.php">low linear complexity</a>. 170 171 <h1>16-bit Generators</h1> 172 173 <p>We do not suggest any particular 16-bit generator, but it is possible 174 to design relatively good ones using our techniques. For example, 175 Parallax has embedded in their <a href="https://www.parallax.com/propeller-2/">Propeller 2 microcontroller</a> multiple 16-bit 176 <code>xoroshiro32++</code> generators. 177 178 <h1>Congruential Generators</h1> 179 180 <p>In case you are interested in 64-bit PRNGs based on congruential arithmetic, I provide 181 three instances of a 182 <a href="https://groups.google.com/forum/#!searchin/sci.stat.math/Yet$20another$20rng%7Csort:date/sci.stat.math/p7aLW3TsJys/QGb1kti6kN0J">Marsaglia's Multiply-With-Carry generators</a>, 183 <a href="MWC128.c"><code>MWC128</code></a>, <a href="MWC192.c"><code>MWC192</code></a>, and <a href="MWC256.c"><code>MWC256</code></a>, for which I computed good constants. They are some 184 of the fastest generator available, but they need 128-bit operations. 185 186 <p>Stronger theoretical guarantees are provided by the 187 <a href="https://www.math.ias.edu/~goresky/pdf/p1-goresky.pdf">generalized multiply-with-carry generators defined by Goresky and Klapper</a>: 188 also in this case I provide two instances, <a href="GMWC128.c"><code>GMWC128</code></a> and <a href="GMWC256.c"><code>GMWC256</code></a>, for which I computed good constants. 189 This generators, however, are about twice slower than MWC generators. 190 191 <h1>JavaScript</h1> 192 193 <p><code>xorshift128+</code> is presently used in the JavaScript engines of 194 <a href="http://v8project.blogspot.com/2015/12/theres-mathrandom-and-then-theres.html">Chrome</a>, 195 <a href="https://nodejs.org/">Node.js</a>, 196 <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=322529#c99">Firefox</a>, 197 <a href="https://bugs.webkit.org/show_bug.cgi?id=151641">Safari</a> and 198 <a href="https://github.com/Microsoft/ChakraCore/commit/dbda0182dc0a983dfb37d90c05000e79b6fc75b0">Microsoft Edge</a>. 199 200 <h1>Rust</h1> 201 <p>The <a href="https://docs.rs/rand/latest/rand/rngs/struct.SmallRng.html">SmallRng</a> from the <a href="https://docs.rs/rand/latest/rand/">rand</a> 202 crate is <a HREF="xoshiro256plusplus.c"><code>xoshiro256++</code></a> or <a HREF="xoshiro128plusplus.c"><code>xoshiro128++</code></a>, depending 203 on the platform. 204 205 <h1><code>java.util.random</code></h1> 206 207 <p>I worked with Guy Steele at the <a 208 href="https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/random/package-summary.html">new 209 family of PRNGs available in Java 17</a>. The family, called <a 210 href="http://vigna.di.unimi.it/papers.php#StVLXM">LXM</a>, uses <a 211 href="http://vigna.di.unimi.it/papers.php#StVCESGMCPNG">new, better 212 tables of multipliers for LCGs with power-of-two moduli</a>. Moreover, 213 <code>java.util.random</code> contains ready-to-use implementations of 214 <a HREF="xoroshiro128plusplus.c"><code>xoroshiro128++</code></a> and <a 215 HREF="xoshiro256plusplus.c"><code>xoshiro256++</code></a>. 216 217 <h1>.NET</h1> 218 219 <p>In version 6, Microsoft's .NET framework <a href="https://devblogs.microsoft.com/dotnet/performance-improvements-in-net-6/">has adopted</a> 220 <a HREF="xoshiro256starstar.c"><code>xoshiro256**</code></a> and <a 221 HREF="xoshiro128starstar.c"><code>xoshiro128**</code></a> as default PRNGs. 222 223 <h1>Erlang</h1> 224 225 <p>The parallel functional language <a href="https://www.erlang.org/">Erlang</a> implements <a href="https://www.erlang.org/doc/man/rand.html">several 226 variants of <code>xorshift</code>/<code>xoroshiro</code>-based generators</a> adapted in collaboration with Raimo Niskanen for Erlang's 227 58/59-bit arithmetic. 228 229 <h1>GNU FORTRAN</h1> 230 <p>GNU's <a href="https://gcc.gnu.org/fortran/">implementation of the FORTRAN language</a> <a href="https://gcc.gnu.org/onlinedocs/gfortran/RANDOM_005fNUMBER.html">uses</a> 231 <a HREF="xoshiro256starstar.c"><code>xoshiro256**</code></a> as default PRNG. 232 233 <h1>Julia</h1> 234 <p>The <a href="https://julialang.org/">Julia programming language</a> <a href="https://docs.julialang.org/en/v1/stdlib/Random/">uses</a> 235 <a HREF="xoshiro256plusplus.c"><code>xoshiro256++</code></a> as default PRNG. 236 237 <h1>Lua</h1> 238 <p>The scripting language <a href="http://www.lua.org/">Lua</a> <a href="https://www.lua.org/manual/5.4/manual.html#pdf-math.random">uses</a> <a HREF="xoshiro256starstar.c"><code>xoshiro256**</code></a> as default PRNG. 239 240 <h1>IoT</h1> 241 242 <p>The IoT operating systems <a href="https://os.mbed.com/">Mbed</a> and <a href="https://www.zephyrproject.org/">Zephyr</a> use 243 <a HREF="xoroshiro128plus.c"><code>xoroshiro128+</code></a> as default PRNG. 244 245 <h1><a name=shootout></a>A PRNG Shootout</h1> 246 247 <p>I provide here a shootout of a few recent 64-bit PRNGs that are quite widely used. 248 The purpose is that of providing a consistent, reproducible assessment of two properties of the generators: speed and quality. 249 The code used to perform the tests and all the output from statistical test suites is available for download. 250 251 <h2><a name=speed></a>Speed</h2> 252 253 <p>The speed reported in this page is the time required to emit 64 254 random bits, and the number of clock cycles required to generate a byte (thanks to the <a href="http://icl.utk.edu/papi/">PAPI</a> library). If a generator is 32-bit in nature, I glue two 255 consecutive outputs. Note that 256 I do not report results using GPUs or SSE instructions, with an exception for the very common SFMT: for that to be 257 meaningful, I should have implementations for all generators. 258 Otherwise, with suitable hardware support I could just use AES in 259 counter mode and get 64 secure bits in 0.56 ns (or just use <a href="https://github.com/google/randen">Randen</a>). The tests were performed on a 260 12th Gen Intel® Core™ i7-12700KF @3.60GHz using <code>gcc</code> 12.2.1. 261 262 <p>A few <i>caveats</i>: 263 <ul> 264 <li>There is some looping overhead, but subtracting it from the timings is not going to 265 be particularly meaningful due to instruction rescheduling, etc. 266 <li>Relative speed might be different on different CPUs and on different scenarios. 267 <li>I do not use <code>-march=native</code>, which can improve the timing of some generators 268 by vectorization or special instructions, because those improvements might not be possible 269 when the generator is embedded in user code. 270 <li>Code has been compiled using <code>gcc</code>'s <code>-fno-unroll-loops</code> 271 option. This options is essential to get a sensible result: without it, the compiler 272 may perform different loop unrolling depending on the generator. Previosuly I was using also 273 <code>-fno-move-loop-invariants</code>, which was essential in not giving generators using several 274 large constants an advantage by preventing the compiler from loading them into registers. However, 275 as of <code>gcc</code> 12.2.1 the compiler loads the constants into registers anyway, so the 276 option is no longer used. Timings 277 with <a href="http://clang.llvm.org/"><code>clang</code></a> at the time of this writing 278 are very close to those obtained with <code>gcc</code>. 279 If you find timings that are significantly better than those shown here on 280 comparable hardware, they are likely to be due to compiler artifacts (e.g., vectorization). 281 <li>Timings are taken running a generator for billions of times in a loop; but this is not the way you use generators. Register 282 allocation might be very different when the generator is embedded in an application, leading to constants being reloaded 283 or part of the state space being written to main memory at each iteration. These costs do not appear in the benchmarks below. 284 </ul> 285 286 <p>To ease replicability, I distribute a <a href="harness.c"><em>harness</em></a> performing the measurement. You just 287 have to define a <a href="xoroshiro128plus-speed.c"><code>next()</code></a> function and include the harness. But the only realistic 288 suggestion is to try different generators in your application and see what happens. 289 290 <h2><a name=quality></a>Quality</h2> 291 292 <p>This is probably the more <a 293 href="http://dilbert.com/strips/comic/2001-10-25/">elusive</a> property 294 of a PRNG. Here quality is measured using the powerful 295 BigCrush suite of tests. BigCrush is part of <a 296 href="http://simul.iro.umontreal.ca/testu01/tu01.html">TestU01</a>, 297 a monumental framework for testing PRNGs developed by Pierre L'Ecuyer 298 and Richard Simard (“TestU01: A C library for empirical testing 299 of random number generators”, <i>ACM Trans. Math. Softw.</i> 300 33(4), Article 22, 2007). 301 302 <p>I run BigCrush starting from 100 equispaced points of the state space 303 of the generator and collect <em>failures</em>—tests in which the 304 <i>p</i>-value statistics is outside the interval [0.001..0.999]. A failure 305 is <em>systematic</em> if it happens at all points. 306 307 <p>Note that TestU01 is a 32-bit test suite. Thus, two 32-bit integer values 308 are passed to the test suite for each generated 64-bit value. Floating point numbers 309 are generated instead by dividing the unsigned output of the generator by 2<sup>64</sup>. 310 Since this implies a bias towards the high bits (which is anyway a known characteristic 311 of TestU01), I run the test suite also on the <em>reverse</em> 312 generator. More detail about the whole process can be found in this <a 313 href="http://vigna.di.unimi.it/papers.php#VigEEMXGS">paper</a>. 314 315 <p>Beside BigCrush, I analyzed generators using a test for <a href="hwd.php">Hamming-weight dependencies</a> 316 described in our <a href="http://vigna.di.unimi.it/papers.php#BlVNTHWD">paper</a>. As I already remarked, our only 317 generator failing the test (but only after 5 TB of output) is <code>xoroshiro128+</code>. 318 319 <p>I report the period of each generator and its footprint in bits: a generator gives “bang-for-the-buck” 320 if the base-2 logarithm of the period is close to the footprint. Note 321 that the footprint has been always padded to a multiple of 64, and it can 322 be significantly larger than expected because of padding and 323 cyclic access indices. 324 325 <div style="align: center"><table id='prng' style='margin: 2em 0' class='tablesorter'> 326 <thead><tr> 327 <th>PRNG 328 <th>Footprint (bits) 329 <th class="{ sorter: 'metadata' }">Period 330 <th><a href="http://simul.iro.umontreal.ca/testu01/tu01.html">BigCrush</a> Systematic Failures 331 <th><a href="http://prng.di.unimi.it/hwd.php">HWD failure</a> 332 <th>ns/64 bits 333 <th>cycles/B 334 <tbody> 335 <tr><td><a href="xoroshiro128plus.c"><code>xoroshiro128+</code></a><td align=right>128 <td align=right class='{sortValue: 128}'>2<sup>128</sup> − 1<td align=right>—<td align=right>5 TB<td align=right>0.80<td align=right>0.36 336 <tr><td><a href="xoroshiro128plusplus.c"><code>xoroshiro128++</code></a><td align=right>128 <td align=right class='{sortValue: 128}'>2<sup>128</sup> − 1<td align=right>—<td align=right>—<td align=right>0.90<td align=right>0.40 337 <tr><td><a href="xoroshiro128starstar.c"><code>xoroshiro128**</code></a><td align=right>128 <td align=right class='{sortValue: 128}'>2<sup>128</sup> − 1<td align=right>—<td align=right>—<td align=right>0.78<td align=right>0.36 338 <tr><td><a href="xoshiro256plus.c"><code>xoshiro256+</code></a><td align=right>256 <td align=right class='{sortValue: 256}'>2<sup>256</sup> − 1<td align=right>—<td align=right>—<td align=right>0.61<td align=right>0.27 339 <tr><td><a href="xoshiro256plusplus.c"><code>xoshiro256++</code></a><td align=right>256 <td align=right class='{sortValue: 256}'>2<sup>256</sup> − 1<td align=right>—<td align=right>—<td align=right>0.75<td align=right>0.34 340 <tr><td><a href="xoshiro256starstar.c"><code>xoshiro256**</code></a><td align=right>256 <td align=right class='{sortValue: 256}'>2<sup>256</sup> − 1<td align=right>—<td align=right>—<td align=right>0.75<td align=right>0.34 341 <tr><td><a href="xoshiro512plus.c"><code>xoshiro512+</code></a><td align=right>512<td align=right class='{sortValue: 512}'>2<sup>512</sup> − 1<td align=right>—<td align=right>—<td align=right>0.68<td align=right>0.30 342 <tr><td><a href="xoshiro512plusplus.c"><code>xoshiro512++</code></a><td align=right>512<td align=right class='{sortValue: 512}'>2<sup>512</sup> − 1<td align=right>—<td align=right>—<td align=right>0.79<td align=right>0.36 343 <tr><td><a href="xoshiro512starstar.c"><code>xoshiro512**</code></a><td align=right>512<td align=right class='{sortValue: 512}'>2<sup>512</sup> − 1<td align=right>—<td align=right>—<td align=right>0.81<td align=right>0.37 344 <tr><td><a href="xoroshiro1024star.c"><code>xoroshiro1024*</code></a><td align=right>1068<td align=right class='{sortValue: 1024}'>2<sup>1024</sup> − 1<td align=right>—<td align=right>—<td align=right>0.82<td align=right>0.37 345 <tr><td><a href="xoroshiro1024plusplus.c"><code>xoroshiro1024++</code></a><td align=right>1068<td align=right class='{sortValue: 1024}'>2<sup>1024</sup> − 1<td align=right>—<td align=right>—<td align=right>1.01<td align=right>0.46 346 <tr><td><a href="xoroshiro1024starstar.c"><code>xoroshiro1024**</code></a><td align=right>1068<td align=right class='{sortValue: 1024}'>2<sup>1024</sup> − 1<td align=right>—<td align=right>—<td align=right>0.98<td align=right>0.44 347 <tr><td><a href="MWC128.c"><span style="font-variant: small-caps">MWC128</span></a><td align=right>128 <td align=right class='{sortValue: 127}'>≈2<sup>127</sup><td align=right>—<td align=right>—<td align=right>0.83<td align=right>0.37 348 <tr><td><a href="MWC192.c"><span style="font-variant: small-caps">MWC192</span></a><td align=right>192 <td align=right class='{sortValue: 127}'>≈2<sup>191</sup><td align=right>—<td align=right>—<td align=right>1.42<td align=right>0.19 349 <tr><td><a href="MWC256.c"><span style="font-variant: small-caps">MWC256</span></a><td align=right>256 <td align=right class='{sortValue: 255}'>≈2<sup>255</sup><td align=right>—<td align=right>—<td align=right>0.45<td align=right>0.20 350 <tr><td><a href="GMWC128.c"><span style="font-variant: small-caps">GMWC128</span></a><td align=right>128 <td align=right class='{sortValue: 127}'>≈2<sup>127</sup><td align=right>—<td align=right>—<td align=right>1.84<td align=right>0.83 351 <tr><td><a href="GMWC256.c"><span style="font-variant: small-caps">GMWC256</span></a><td align=right>256 <td align=right class='{sortValue: 255}'>≈2<sup>255</sup><td align=right>—<td align=right>—<td align=right>1.85<td align=right>0.83 352 <tr><td><a href="http://pracrand.sourceforge.net/"><span style="font-variant: small-caps">SFC64</span></a><td align=right>256 <td align=right class='{sortValue: 64}'>≥2<sup>64</sup><td align=right>—<td align=right>—<td align=right>0.66<td align=right>0.30 353 <tr><td><a href="splitmix64.c"><span style="font-variant: small-caps">SplitMix64</span></a><td align=right>64 <td align=right class='{sortValue: 64}'>2<sup>64</sup><td align=right>—<td align=right>—<td align=right>0.63<td align=right>0.29 354 <tr><td><a href="http://pcg-random.org/">PCG 128 XSH RS 64 (LCG)</a> <td align=right>128 <td align=right class='{sortValue: 128}'>2<sup>128</sup><td align=right>—<td align=right>—<td align=right>1.70<td align=right>0.77 355 <tr><td><a href="https://github.com/numpy/numpy">PCG64-DXSM (NumPy)</a> <td align=right>128 <td align=right class='{sortValue: 128}'>2<sup>128</sup><td align=right>—<td align=right>—<td align=right>1.11<td align=right>0.50 356 <tr><td><a href="http://numerical.recipes/"><code>Ran</code></a><td align=right>192 <td align=right class='{sortValue: 191}'>≈2<sup>191</sup><td align=right>—<td align=right>—<td align=right>1.37<td align=right>0.62 357 <tr><td><a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html"><code>MT19937-64</code> (Mersenne Twister)</a><td align=right>20032 <td align=right class='{sortValue: 19937}'>2<sup>19937</sup> − 1<td align=right>LinearComp<td align=right>—<td align=right>1.36<td align=right>0.62 358 <tr><td><a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/"><code>SFMT19937 (uses SSE2 instructions)</code></a><td align=right>20032 <td align=right class='{sortValue: 19937}'>2<sup>19937</sup> − 1<td align=right>LinearComp<td align=right>—<td align=right>0.93<td align=right>0.42 359 <tr><td><a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/"><code>SFMT607 (uses SSE2 instructions)</code></a><td align=right>672 <td align=right class='{sortValue: 607}'>2<sup>607</sup> − 1<td align=right>MatrixRank, LinearComp<td align=right>400 MB<td align=right>0.78<td align=right>0.34 360 <tr><td><a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/TINYMT/index.html">Tiny Mersenne Twister</a> (64 bits)<td align=right>256<td align=right class='{sortValue: 127}'>2<sup>127</sup> − 1<td align=right>—<td align=right>90 TB→<td align=right>2.76<td align=right>1.25 361 <tr><td><a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/TINYMT/index.html">Tiny Mersenne Twister</a> (32 bits)<td align=right>224<td align=right class='{sortValue: 127}'>2<sup>127</sup> − 1<td align=right>CollisionOver, Run, SimPoker, AppearanceSpacings, MatrixRank, LinearComp, LongestHeadRun, Run of Bits (reversed)<td align=right>40 TB→<td align=right>4.27<td align=right>1.92 362 <tr><td><a href="http://www.iro.umontreal.ca/~panneton/WELLRNG.html"><code>WELL512a</code></a><td align=right>544 <td align=right class='{sortValue: 512}'>2<sup>512</sup> − 1 <td align=right>MatrixRank, LinearComp<td align=right>3.5 PB<td align=right>5.42<td align=right>2.44 363 <tr><td><a href="http://www.iro.umontreal.ca/~panneton/WELLRNG.html"><code>WELL1024a</code></a><td align=right>1056 <td align=right class='{sortValue: 1024}'>2<sup>1024</sup> − 1 <td align=right>MatrixRank, LinearComp<td align=right>—<td align=right>5.30<td align=right>2.38 364 </table></div> 365 366 <p>The following table compares instead two ways of generating floating-point numbers, namely the 521-bit <a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/">dSFMT</a>, which 367 generates directly floating-point numbers with 52 significant bits, and 368 <a href="xoshiro256plus.c"><code>xoshiro256+</code></a> followed by a standard conversion of its upper bits to a floating-point number with 53 significant bits (see below). 369 370 <div style="align: center"><table id='prngf' style='margin: 2em 0' class='tablesorter'> 371 <thead><tr> 372 <th>PRNG 373 <th>Footprint (bits) 374 <th class="{ sorter: 'metadata' }">Period 375 <th> <a href="http://simul.iro.umontreal.ca/testu01/tu01.html">BigCrush</a> Systematic Failures 376 <th><a href="http://prng.di.unimi.it/hwd.php">HWD failure</a> 377 <th>ns/double 378 <th>cycles/B 379 <tbody> 380 <tr><td><a href="xoshiro256plus.c"><code>xoshiro256+</code></a> (returns 53 significant bits) <td align=right>256<td align=right class='{sortValue: 256}'>2<sup>256</sup> − 1<td align=right>—<td align=right>—<td align=right>0.92<td align=right>3.40 381 <tr><td><a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/"><code>dSFMT</code></a> (uses SSE2 instructions, returns only 52 significant bits)<td align=right>704<td align=right class='{sortValue: 521}'>2<sup>521</sup> − 1<td align=right>MatrixRank, LinearComp<td align=right>6 TB<td align=right>0.85<td align=right>3.07 382 </table></div> 383 384 <p><code>xoshiro256+</code> is ≈8% slower than the dSFMT, but it has a doubled range of output values, does not need any extra SSE instruction (can be programmed in Java, etc.), 385 has a much smaller footprint, and its upper bits do not fail any test. 386 387 <h1><a name=remarks></a>Remarks</h1> 388 389 <h2>Vectorization</h2> 390 391 <p>Some of the generators can be very easily vectorized, so that multiple instances can be run in parallel to provide 392 fast bulk generation. Thanks to an interesting <a href="https://github.com/JuliaLang/julia/issues/27614">discussion with the Julia developers</a>, 393 I've become aware that AVX2 vectorizations of multiple instances of generators using the <code>+</code>/<code>++</code> scrambler are impressively fast (links 394 below point at a speed test to be used with the <a href="harness.c">harness</a>, and the result will be multiplied by 1000): 395 396 <div style="align: center"><table id='vec' style='margin: 2em 0' class='tablesorter'> 397 <thead><tr> 398 <th>PRNG 399 <th>ns/64 bits 400 <th>cycles/B 401 <tbody> 402 <tr><td><a href="xoroshiro128+-vect-speed.c"><code>xoroshiro128+</code></a> (4 parallel instances)<td align=right>0.36<td align=right>0.14 403 <tr><td><a href="xoroshiro128++-vect-speed.c"><code>xoroshiro128++</code></a> (4 parallel instances)<td align=right>0.45<td align=right>0.18 404 <tr><td><a href="xoshiro256+-vect-speed.c"><code>xoshiro256+</code></a> (8 parallel instances)<td align=right>0.19<td align=right>0.08 405 <tr><td><a href="xoshiro256++-vect-speed.c"><code>xoshiro256++</code></a> (8 parallel instances)<td align=right>0.26<td align=right>0.09 406 </table></div> 407 408 <p>Note that sometimes convincing the compiler to vectorize is a 409 slightly quirky process: for example, on <code>gcc</code> 12.2.1 I have to use <code>-O3 -fdisable-tree-cunrolli -march=native</code> 410 to vectorize <code>xoshiro256</code>-based generators 411 (<code>-O3</code> alone will not vectorize; thanks to to Chris Elrod for pointing me at <code>-fdisable-tree-cunrolli</code>). 412 413 <h2>A long period does not imply high quality</h2> 414 415 <p>This is a common misconception. The generator <code>x++</code> has 416 period \(2^k\), for any \(k\geq0\), provided that <code>x</code> is 417 represented using \(k\) bits: nonetheless, it is a horrible generator. 418 The generator returning \(k-1\) zeroes followed by a one has period 419 \(k\). 420 421 <p>It is however important that the period is long enough. A first heuristic rule of thumb 422 is that if you need to use \(t\) values, you need a generator with period at least \(t^2\). 423 Moreover, if you run \(n\) independent computations starting at random seeds, 424 the sequences used by each computation should not overlap. 425 426 <p>Now, given a generator with period \(P\), the probability that \(n\) subsequences of length \(L\) starting at random points in the state space 427 overlap <a href="http://vigna.di.unimi.it/papers.php#VigPORSPNG">is bounded by \(n^2L/P\)</a>. If your generator has period \(2^{256}\) and you run 428 on \(2^{64}\) cores (you will never have them) a computation using \(2^{64}\) pseudorandom numbers (you will never have the time) 429 the probability of overlap would be less than \(2^{-64}\). 430 431 <p>In other words: any generator with a period beyond 432 \(2^{256}\) has a period that is 433 sufficient for every imaginable application. Unless there are other motivations (e.g., provably 434 increased quality), a generator with a larger period is only a waste of 435 memory (as it needs a larger state), of cache lines, and of 436 precious high-entropy random bits for seeding (unless you're using 437 small seeds, but then it's not clear why you would want a very long 438 period in the first place—the computation above is valid only if you seed all bits of the state 439 with independent, uniformly distributed random bits). 440 441 <p>In case the generator provides a <em>jump function</em> that lets you skip through chunks of the output in constant 442 time, even a period of \(2^{128}\) can be sufficient, as it provides \(2^{64}\) non-overlapping sequences of length \(2^{64}\). 443 444 <h2>Equidistribution</h2> 445 446 <p>Every 64-bit generator of ours with <var>n</var> bits of state scrambled 447 with <code>*</code> or <code>**</code> is <var>n</var>/64-dimensionally 448 equidistributed: every <var>n</var>/64-tuple of consecutive 64-bit 449 values appears exactly once in the output, except for the zero tuple 450 (and this is the largest possible dimension). Generators based on the 451 <code>+</code> or <code>++</code> scramblers are however only (<var>n</var>/64 − 452 1)-dimensionally equidistributed: every (<var>n</var>/64 − 453 1)-tuple of consecutive 64-bit values appears exactly 2<sup>64</sup> 454 times in the output, except for a missing zero tuple. The same considerations 455 apply to 32-bit generators. 456 457 <h2>Generating uniform doubles in the unit interval</h2> 458 459 <p>A standard double (64-bit) floating-point number in 460 <a href="https://en.wikipedia.org/wiki/IEEE_floating_point">IEEE floating point format</a> has 52 bits of 461 significand, plus an implicit bit at the left of the significand. Thus, 462 the representation can actually store numbers with <em>53</em> significant binary digits. 463 464 <p>Because of this fact, in C99 a 64-bit unsigned integer <code>x</code> should be converted to a 64-bit double 465 using the expression 466 <pre> 467 #include <stdint.h> 468 469 (x >> 11) * 0x1.0p-53 470 </pre> 471 <p>In Java you can use almost the same expression for a (signed) 64-bit integer: 472 <pre> 473 (x >>> 11) * 0x1.0p-53 474 </pre> 475 476 477 <p>This conversion guarantees that all dyadic rationals of the form <var>k</var> / 2<sup>−53</sup> 478 will be equally likely. Note that this conversion prefers the high bits of <code>x</code> (usually, a good idea), but you can alternatively 479 use the lowest bits. 480 481 <p>An alternative, multiplication-free conversion is 482 <pre> 483 #include <stdint.h> 484 485 static inline double to_double(uint64_t x) { 486 const union { uint64_t i; double d; } u = { .i = UINT64_C(0x3FF) << 52 | x >> 12 }; 487 return u.d - 1.0; 488 } 489 </pre> 490 <p>The code above cooks up by bit manipulation 491 a real number in the interval [1..2), and then subtracts 492 one to obtain a real number in the interval [0..1). If <code>x</code> is chosen uniformly among 64-bit integers, 493 <code>d</code> is chosen uniformly among dyadic rationals of the form <var>k</var> / 2<sup>−52</sup>. This 494 is the same technique used by generators providing directly doubles, such as the 495 <a href="http://dx.doi.org/10.1007/978-3-540-85912-3_26">dSFMT</a>. 496 497 <p>This technique is supposed to be fast, but on recent hardare it does not seem to give a significant advantage. 498 More importantly, <em>you will be generating half the values you could actually generate</em>. 499 The same problem plagues the dSFMT. All doubles generated will have the lowest significand bit set to zero (I must 500 thank Raimo Niskanen from the Erlang team for making me notice this—a previous version of this site 501 did not mention this issue). 502 503 <p>In Java you can obtain an analogous result using suitable static methods: 504 <pre> 505 Double.longBitsToDouble(0x3FFL << 52 | x >>> 12) - 1.0 506 </pre> 507 508 <p>To adhere to the principle of least surprise, my implementations now use the multiplicative version, everywhere. 509 510 <p>Interestingly, these are not the only notions of “uniformity” you can come up with. Another possibility 511 is that of generating 1074-bit integers, normalize and return the nearest value representable as a 512 64-bit double (this is the theory—in practice, you will almost never 513 use more than two integers per double as the remaining bits would not be representable). This approach guarantees that all 514 representable doubles could be in principle generated, albeit not every 515 returned double will appear with the same probability. A reference 516 implementation can be found <a href="random_real.c">here</a>. Note that unless your generator has 517 at least 1074 bits of state and suitable equidistribution properties, the code above will not do what you expect 518 (e.g., it might <em>never</em> return zero). 519 520 521 </div> 522 523 524 </div> 525 526 <div id="right"> 527 528 <!-- <h1>Download</h1> 529 <p><ul> 530 <li><a HREF="prng-1.2.tgz">source tarball</A> 531 <li><a HREF="prng-data.tar.bz2">data tarball (large!)</a> 532 </ul> 533 --> 534 <h1>C code (64 bits)</h1> 535 <p><ul> 536 <li><a HREF="xoshiro256plusplus.c"><code>xoshiro256++</code></a> 537 <li><a HREF="xoshiro256starstar.c"><code>xoshiro256**</code></a> 538 <li><a HREF="xoshiro256plus.c"><code>xoshiro256+</code></a> 539 <li><a HREF="xoroshiro128plusplus.c"><code>xoroshiro128++</code></a> 540 <li><a HREF="xoroshiro128starstar.c"><code>xoroshiro128**</code></a> 541 <li><a HREF="xoroshiro128plus.c"><code>xoroshiro128+</code></a> 542 <li><a HREF="https://github.com/vigna/MRG32k3a">Testless <code>MRG32k3a</code></a> 543 <li><a HREF="MWC128.c"><code>MWC128</code></a> + <a HREF="mp.c"><code>mp.c</code></a> 544 <li><a HREF="MWC192.c"><code>MWC192</code></a> + <a HREF="mp.c"><code>mp.c</code></a> 545 <li><a HREF="MWC256.c"><code>MWC256</code></a> + <a HREF="mp.c"><code>mp.c</code></a> 546 <li><a HREF="GMWC128.c"><code>GMWC128</code></a> + <a HREF="mp.c"><code>mp.c</code></a> 547 <li><a HREF="GMWC256.c"><code>GMWC256</code></a> + <a HREF="mp.c"><code>mp.c</code></a> 548 <!-- <li><a HREF="xoshiro512starstar.c"><code>xoshiro512**</code></a> 549 <li><a HREF="xoshiro512plus.c"><code>xoshiro512+</code></a> 550 <li><a HREF="xoroshiro1024starstar.c"><code>xoroshiro1024**</code></a> 551 <li><a HREF="xoroshiro1024plus.c"><code>xoroshiro1024*</code></a>--> 552 </ul> 553 554 <h1>C code (32 bits)</h1> 555 <p><ul> 556 <li><a HREF="xoshiro128plusplus.c"><code>xoshiro128++</code></a> 557 <li><a HREF="xoshiro128starstar.c"><code>xoshiro128**</code></a> 558 <li><a HREF="xoshiro128plus.c"><code>xoshiro128+</code></a> 559 <li><a HREF="xoroshiro64starstar.c"><code>xoroshiro64**</code></a> 560 <li><a HREF="xoroshiro64star.c"><code>xoroshiro64*</code></a> 561 </ul> 562 563 <h1>Java code (<a HREF="https://github.com/openjdk/jdk17/tree/master/src/jdk.random/share/classes/jdk/random"><code>java.util.random</code></a>)</h1> 564 565 <h1>Java code (<a href="http://dsiutils.di.unimi.it">DSI utilities</a>)</h1> 566 <p><ul> 567 <li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/package-summary.html">Overview</a> 568 <li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/XoShiRo256PlusPlusRandom.html"><code>xoshiro256++</code></a> 569 <li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/XoShiRo256StarStarRandom.html"><code>xoshiro256**</code></a> 570 <li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/XoShiRo256PlusRandom.html"><code>xoshiro256+</code></a> 571 <li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/XoRoShiRo128PlusPlusRandom.html"><code>xoroshiro128++</code></a> 572 <li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/XoRoShiRo128StarStarRandom.html"><code>xoroshiro128**</code></a> 573 <li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/XoRoShiRo128PlusRandom.html"><code>xoroshiro128+</code></a> 574 <li><a HREF="https://github.com/vigna/MRG32k3a">Testless <code>MRG32k3a</code></a> 575 </ul> 576 577 <h1>Java code (<a HREF="https://gitbox.apache.org/repos/asf?p=commons-rng.git">Apache Commons RNG implementations</a>)</h1> 578 579 <h1>Documentation</h1> 580 <p><ul> 581 <li>The <a href="http://vigna.di.unimi.it/papers.php#BlVSLPNG">paper</a> introducing <code>xoshiro</code>/<code>xoroshiro</code>. 582 <li>The <a href="http://vigna.di.unimi.it/papers.php#BlVNTHWD">paper</a> describing our <a href="hwd.php">test for Hamming-weight dependencies</a>. 583 <li>A <a href="http://vigna.di.unimi.it/papers.php#VigHTLGMT">paper</a> discussing the defects of the Mersenne Twister family of PRNGs. 584 <li>A <a href="http://vigna.di.unimi.it/papers.php#VigPORSPNG">paper</a> discussing the probability of overlap of random subsequences. 585 <li>A <a href="http://vigna.di.unimi.it/papers.php#StVCESGMCPNG">paper</a> with new tables of multipliers for LCGs with power-of-two moduli. 586 <li>A <a href="http://vigna.di.unimi.it/papers.php#StVLXM">paper</a> presenting the family LXM of PRNGs. 587 </ul> 588 589 <h1>Discussion</h1> 590 591 <p>There is a <a href="http://groups.google.com/group/prng">discussion group</a> 592 about this page. You can join or <a href="mailto:prng@googlegroups.com">send a message</a>. 593 <h1><a href="https://validator.w3.org/check/referer">This is valid HTML 4.01</a></h1> 594 595 </div> 596 </body> 597 </html>