squiggle.c

Self-contained Monte Carlo estimation in C99
Log | Files | Refs | README

index.html (40899B)


      1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
      2 <html>
      3   <head> 
      4     <link href="https://fonts.googleapis.com/css?family=Open+Sans:400,400i,700,700i" rel="stylesheet">
      5     <meta http-equiv="Content-Type" content="text/html;charset=utf-8"> 
      6     <meta name="keywords" content="rng, prng, xoshiro, xoroshiro, xorshift, pseudorandom number generator, random number generator">
      7     <style type="text/css">
      8       @import "css/content.php";
      9       @import "css/layout.php";
     10 		@import "css/tablesorter.css";
     11 
     12     </style>
     13     <title>xoshiro/xoroshiro generators and the PRNG shootout</title> 
     14 		<script type="text/javascript" src="js/jquery.js"></script> 
     15 		<script type="text/javascript" src="js/tablesorter.js"></script>
     16 		<script type="text/javascript" src="js/metadata.js"></script> 
     17 		<script type="text/javascript">
     18 		$.tablesorter.defaults.widgets = ['zebra']; 	
     19 		$(document).ready( function() { 
     20 			$("#prng").tablesorter({
     21 				headers: { 
     22 	            3: { 
     23       	          sorter: false 
     24          	   },
     25 	            4: { 
     26       	          sorter: false 
     27          	   }
     28 				}
     29 			});
     30 			$("#vect").tablesorter({
     31 			});
     32 			$("#prngf").tablesorter({
     33 				headers: { 
     34 	            3: { 
     35       	          sorter: false 
     36          	   },
     37 	            4: { 
     38       	          sorter: false 
     39          	   }
     40 				}
     41 			});
     42 		} );
     43 		</script>
     44 
     45 		<script type="text/javascript" src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
     46 		<script type="text/javascript" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
     47 
     48   </head> 
     49   <body>
     50     <div id=header><code>xoshiro</code> / <code>xoroshiro</code> generators and the PRNG shootout</div>
     51 
     52     <div id=left>
     53       <ul id="left-nav">
     54 	<li><a href="http://vigna.di.unimi.it/">Home</a></li>
     55 	<li><a href="http://vigna.di.unimi.it/papers.php">Papers<span class=arrow>&#10132;</span></a></li>
     56 	<li><a href="http://vigna.di.unimi.it/software.php">Software<span class=arrow>&#10132;</span></a>
     57 	<li><strong><a href="http://prng.di.unimi.it/">PRNG shootout</a></strong>
     58 
     59       <ul>
     60 	<li><a href="#intro">Introduction</A></li>
     61 	<li><a href="#shootout">A PRNG Shootout</A></li>
     62 	<li><a href="#speed">Speed</A></li>
     63 	<li><a href="#quality">Quality</A></li>
     64 	<li><a href="#remarks">Remarks</A></li>
     65       </ul>
     66 
     67 	
     68    <li><a href="http://pcg.di.unimi.it/pcg.php">The wrap-up on PCG generators<span class=arrow>&#10132;</span></a>
     69    <li><a href="http://fastutil.di.unimi.it/"><code>fastutil</code><span class=arrow>&#10132;</span></a>
     70 	<li><a href="http://dsiutils.di.unimi.it/">DSI utilities<span class=arrow>&#10132;</span></a>
     71 	<li><a href="http://webgraph.di.unimi.it/">WebGraph<span class=arrow>&#10132;</span></a>
     72 	<li><a href="http://sux.di.unimi.it/">Sux<span class=arrow>&#10132;</span></a>
     73 	<li><a href="http://vigna.di.unimi.it/music.php">Music<span class=arrow>&#10132;</span></a>
     74 	<li><a href="https://shrinkai.di.unimi.it/">Shrink AI<span class=arrow>&#10132;</span></a>
     75     </ul>
     76     </div>
     77 
     78     <div id="main">
     79       
     80       
     81       <div id="content">
     82 
     83 	<h1 class=first>Introduction</h1>
     84 
     85 	<p>This page describes some new pseudorandom number generators (PRNGs) we (David Blackman and I) have been working on recently, and
     86    a shootout comparing them with other generators. Details about the generators can
     87    be found in our <a
     88    href="http://vigna.di.unimi.it/papers.php#BlVSLPNG">paper</a>. Information about my previous <code>xorshift</code>-based
     89    generators can be found <a href="xorshift.php">here</a>, but they have been entirely superseded by the new ones, which
     90    are faster <em>and</em> better. As part of our study, we developed a very strong <a href="hwd.php">test for Hamming-weight dependencies</a> 
     91    that gave a number of surprising results.
     92 
     93 	<h1>64-bit Generators</h1>
     94 
     95    <P><a href="xoshiro256plusplus.c"><code>xoshiro256++</code></a>/<a href="xoshiro256starstar.c"><code>xoshiro256**</code></a>
     96    (XOR/shift/rotate) are our <strong>all-purpose</strong>
     97 	generators (not <em>cryptographically secure</em> generators, though,
     98 	like all PRNGs in these pages). They have excellent (sub-ns) speed, a state
     99 	space (256 bits) that is large enough for any parallel application, and
    100 	they pass all tests we are aware of. See the <a href="http://vigna.di.unimi.it/papers.php#BlVSLPNG">paper</a>
    101    for a discussion of their differences.
    102 
    103 	<p>If, however, one has to generate only 64-bit <strong>floating-point</strong> numbers
    104    (by extracting the upper 53 bits) <a
    105    href="xoshiro256plus.c"><code>xoshiro256+</code></a> is a slightly (&asymp;15%)
    106    faster generator with analogous statistical properties. For general
    107    usage, one has to consider that its lowest bits have low linear
    108    complexity and will <a href="lowcomp.php">fail linearity tests</a>; however, low linear
    109    complexity of the lowest bits can have hardly any impact in practice, and certainly has no
    110    impact at all if you generate floating-point numbers using the upper bits (we computed a <a
    111    href="http://vigna.di.unimi.it/papers.php#BlVSLPNG">precise
    112    estimate</a> of the linear complexity of the lowest bits).
    113 
    114 	<p>If you are <strong>tight on space</strong>, <a
    115    href="xoroshiro128plusplus.c"><code>xoroshiro128++</code></a>/<a
    116    href="xoroshiro128starstar.c"><code>xoroshiro128**</code></a>
    117    (XOR/rotate/shift/rotate) and  <a
    118    href="xoroshiro128plus.c"><code>xoroshiro128+</code></a> have the same
    119    speed and use half of the space; the same comments apply. They are suitable only for
    120    low-scale parallel applications; moreover, <code>xoroshiro128+</code>
    121    exhibits a mild dependency in Hamming weights that generates a failure
    122    after 5&thinsp;TB of output in <a href="hwd.php">our test</a>. We believe
    123    this slight bias cannot affect any application.
    124 
    125 	<p>Finally, if for any reason (which reason?) you need <strong>more
    126 	state</strong>, we provide in the same
    127 	vein <a href="xoshiro512plusplus.c"><code>xoshiro512++</code></a> / <a href="xoshiro512starstar.c"><code>xoshiro512**</code></a> / <a href="xoshiro512plus.c"><code>xoshiro512+</code></a> and
    128 	<a href="xoroshiro1024plusplus.c"><code>xoroshiro1024++</code></a> / <a href="xoroshiro1024starstar.c"><code>xoroshiro1024**</code></a> / <a href="xoroshiro1024star.c"><code>xoroshiro1024*</code></a> (see the <a
    129    href="http://vigna.di.unimi.it/papers.php#BlVSLPNG">paper</a>).
    130 
    131 	<p>All generators, being based on linear recurrences, provide <em>jump
    132    functions</em> that make it possible to simulate any number of calls to
    133    the next-state function in constant time, once a suitable <em>jump
    134    polynomial</em> has been computed. We provide ready-made jump functions for
    135    a number of calls equal to the square root of the period, to make it easy
    136    generating non-overlapping sequences for parallel computations, and equal
    137 	to the cube of the fourth root of the period, to make it possible to
    138 	generate independent sequences on different parallel processors.
    139 
    140 	<p>We suggest to use <a href="splitmix64.c"><span
    141 	style="font-variant: small-caps">SplitMix64</span></a> to initialize
    142 	the state of our generators starting from a 64-bit seed, as <a href="https://dl.acm.org/citation.cfm?doid=1276927.1276928">research
    143 	has shown</a> that initialization must be performed with a generator
    144 	radically different in nature from the one initialized to avoid
    145 	correlation on similar seeds.
    146 
    147 
    148 	<h1>32-bit Generators</h1>
    149 
    150    <P><a href="xoshiro128plusplus.c"><code>xoshiro128++</code></a>/<a href="xoshiro128starstar.c"><code>xoshiro128**</code></a> are our
    151 	<strong>32-bit</strong> all-purpose generators, whereas <a
    152 	href="xoshiro128plus.c"><code>xoshiro128+</code></a> is 
    153 	for floating-point generation. They are the 32-bit counterpart of
    154 	<code>xoshiro256++</code>, <code>xoshiro256**</code> and <code>xoshiro256+</code>, so similar comments apply.
    155 	Their state is too small for
    156 	large-scale parallelism: their intended usage is inside embedded
    157 	hardware or GPUs. For an even smaller scale, you can use <a
    158 	href="xoroshiro64starstar.c"><code>xoroshiro64**</code></a> and <a
    159 	href="xoroshiro64star.c"><code>xoroshiro64*</code></a>. We not believe
    160 	at this point in time 32-bit generator with a larger state can be of
    161 	any use (but there are 32-bit <code>xoroshiro</code> generators of much larger size).
    162 
    163 	<p>All 32-bit generators pass all tests we are aware of, with the
    164    exception of linearity tests (binary rank and linear complexity) for
    165    <code>xoshiro128+</code> and <code>xoroshiro64*</code>: in this case,
    166    due to the smaller number of output bits the low linear complexity of the
    167    lowest bits is sufficient to trigger BigCrush tests when the output is bit-reversed. Analogously to
    168    the 64-bit case, generating 32-bit floating-point number using the
    169    upper bits will not use any of the bits with <a href="lowcomp.php">low linear complexity</a>.
    170 
    171 	<h1>16-bit Generators</h1>
    172 
    173 	<p>We do not suggest any particular 16-bit generator, but it is possible
    174 	to design relatively good ones using our techniques. For example,
    175 	Parallax has embedded in their <a href="https://www.parallax.com/propeller-2/">Propeller 2 microcontroller</a> multiple 16-bit
    176 	<code>xoroshiro32++</code> generators.
    177 
    178 	<h1>Congruential Generators</h1>
    179 
    180 	<p>In case you are interested in 64-bit PRNGs based on congruential arithmetic, I provide
    181 	three instances of a
    182 	<a href="https://groups.google.com/forum/#!searchin/sci.stat.math/Yet$20another$20rng%7Csort:date/sci.stat.math/p7aLW3TsJys/QGb1kti6kN0J">Marsaglia's Multiply-With-Carry generators</a>,
    183 	<a href="MWC128.c"><code>MWC128</code></a>, <a href="MWC192.c"><code>MWC192</code></a>, and <a href="MWC256.c"><code>MWC256</code></a>, for which I computed good constants. They are some
    184 	of the fastest generator available, but they need 128-bit operations.
    185 
    186 	<p>Stronger theoretical guarantees are provided by the
    187 	<a href="https://www.math.ias.edu/~goresky/pdf/p1-goresky.pdf">generalized multiply-with-carry generators defined by Goresky and Klapper</a>:
    188 	also in this case I provide two instances, <a href="GMWC128.c"><code>GMWC128</code></a> and <a href="GMWC256.c"><code>GMWC256</code></a>, for which I computed good constants.
    189 	This generators, however, are about twice slower than MWC generators.
    190 
    191 	<h1>JavaScript</h1>
    192 
    193 	<p><code>xorshift128+</code> is presently used in the JavaScript engines of
    194 	<a href="http://v8project.blogspot.com/2015/12/theres-mathrandom-and-then-theres.html">Chrome</a>,
    195 	<a href="https://nodejs.org/">Node.js</a>,
    196 	<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=322529#c99">Firefox</a>,
    197 	<a href="https://bugs.webkit.org/show_bug.cgi?id=151641">Safari</a> and
    198 	<a href="https://github.com/Microsoft/ChakraCore/commit/dbda0182dc0a983dfb37d90c05000e79b6fc75b0">Microsoft Edge</a>.
    199 
    200 	<h1>Rust</h1>
    201 	<p>The <a href="https://docs.rs/rand/latest/rand/rngs/struct.SmallRng.html">SmallRng</a> from the <a href="https://docs.rs/rand/latest/rand/">rand</a>
    202 	crate is <a HREF="xoshiro256plusplus.c"><code>xoshiro256++</code></a> or <a HREF="xoshiro128plusplus.c"><code>xoshiro128++</code></a>, depending
    203 	on the platform.
    204 
    205 	<h1><code>java.util.random</code></h1>
    206 
    207 	<p>I worked with Guy Steele at the <a
    208 	href="https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/random/package-summary.html">new
    209 	family of PRNGs available in Java 17</a>. The family, called <a
    210 	href="http://vigna.di.unimi.it/papers.php#StVLXM">LXM</a>, uses <a
    211 	href="http://vigna.di.unimi.it/papers.php#StVCESGMCPNG">new, better
    212 	tables of multipliers for LCGs with power-of-two moduli</a>. Moreover,
    213 	<code>java.util.random</code> contains ready-to-use implementations of
    214 	<a HREF="xoroshiro128plusplus.c"><code>xoroshiro128++</code></a> and <a
    215 	HREF="xoshiro256plusplus.c"><code>xoshiro256++</code></a>.
    216 
    217 	<h1>.NET</h1>
    218 
    219 	<p>In version 6, Microsoft's .NET framework <a href="https://devblogs.microsoft.com/dotnet/performance-improvements-in-net-6/">has adopted</a>
    220 	<a HREF="xoshiro256starstar.c"><code>xoshiro256**</code></a> and <a
    221    HREF="xoshiro128starstar.c"><code>xoshiro128**</code></a> as default PRNGs.
    222 
    223 	<h1>Erlang</h1>
    224 
    225 	<p>The parallel functional language <a href="https://www.erlang.org/">Erlang</a> implements <a href="https://www.erlang.org/doc/man/rand.html">several
    226 	variants of <code>xorshift</code>/<code>xoroshiro</code>-based generators</a> adapted in collaboration with Raimo Niskanen for Erlang's
    227 	58/59-bit arithmetic.
    228 	
    229 	<h1>GNU FORTRAN</h1>
    230 	<p>GNU's <a href="https://gcc.gnu.org/fortran/">implementation of the FORTRAN language</a> <a href="https://gcc.gnu.org/onlinedocs/gfortran/RANDOM_005fNUMBER.html">uses</a>
    231 	<a HREF="xoshiro256starstar.c"><code>xoshiro256**</code></a> as default PRNG. 
    232 
    233 	<h1>Julia</h1>
    234 	<p>The <a href="https://julialang.org/">Julia programming language</a> <a href="https://docs.julialang.org/en/v1/stdlib/Random/">uses</a>
    235 	<a HREF="xoshiro256plusplus.c"><code>xoshiro256++</code></a> as default PRNG. 
    236 
    237 	<h1>Lua</h1>
    238 	<p>The scripting language <a href="http://www.lua.org/">Lua</a> <a href="https://www.lua.org/manual/5.4/manual.html#pdf-math.random">uses</a> <a HREF="xoshiro256starstar.c"><code>xoshiro256**</code></a> as default PRNG.
    239 
    240 	<h1>IoT</h1>
    241 
    242 	<p>The IoT operating systems <a href="https://os.mbed.com/">Mbed</a> and <a href="https://www.zephyrproject.org/">Zephyr</a> use
    243 	<a HREF="xoroshiro128plus.c"><code>xoroshiro128+</code></a> as default PRNG.
    244 
    245 	<h1><a name=shootout>&#xfeff;</a>A PRNG Shootout</h1>
    246 	
    247 	<p>I provide here a shootout of a few recent 64-bit PRNGs that are quite widely used.
    248 	The purpose is that of providing a consistent, reproducible assessment of two properties of the generators: speed and quality.
    249 	The code used to perform the tests and all the output from statistical test suites is available for download.
    250 
    251 	<h2><a name=speed>&#xfeff;</a>Speed</h2>
    252 
    253 	<p>The speed reported in this page is the time required to emit 64
    254 	random bits, and the number of clock cycles required to generate a byte (thanks to the <a href="http://icl.utk.edu/papi/">PAPI</a> library). If a generator is 32-bit in nature, I glue two
    255 	consecutive outputs. Note that
    256 	I do not report results using GPUs or SSE instructions, with an exception for the very common SFMT: for that to be
    257 	meaningful, I should have implementations for all generators.
    258 	Otherwise, with suitable hardware support I could just use AES in
    259 	counter mode and get 64 secure bits in 0.56&thinsp;ns (or just use <a href="https://github.com/google/randen">Randen</a>). The tests were performed on a
    260 	12th Gen Intel&reg; Core&trade; i7-12700KF @3.60GHz using <code>gcc</code> 12.2.1.
    261 
    262 	<p>A few <i>caveats</i>:
    263 	<ul>
    264 	<li>There is some looping overhead, but subtracting it from the timings is not going to
    265 	be particularly meaningful due to instruction rescheduling, etc.
    266 	<li>Relative speed might be different on different CPUs and on different scenarios.
    267 	<li>I do not use <code>-march=native</code>, which can improve the timing of some generators
    268    by vectorization or special instructions, because those improvements might not be possible
    269    when the generator is embedded in user code.
    270 	<li>Code has been compiled using <code>gcc</code>'s <code>-fno-unroll-loops</code>
    271 	option. This options is essential to get a sensible result: without it, the compiler
    272 	may perform different loop unrolling depending on the generator. Previosuly I was using also 
    273 	<code>-fno-move-loop-invariants</code>, which was essential in not giving generators using several
    274    large constants an advantage by preventing the compiler from loading them into registers. However,
    275    as of <code>gcc</code> 12.2.1 the compiler loads the constants into registers anyway, so the 
    276    option is no longer used. Timings
    277 	with <a href="http://clang.llvm.org/"><code>clang</code></a> at the time of this writing
    278 	are very close to those obtained with <code>gcc</code>.
    279 	If you find timings that are significantly better than those shown here on 
    280 	comparable hardware, they are likely to be due to compiler artifacts (e.g., vectorization).
    281 	<li>Timings are taken running a generator for billions of times in a loop; but this is not the way you use generators. Register
    282    allocation might be very different when the generator is embedded in an application, leading to constants being reloaded
    283    or part of the state space being written to main memory at each iteration. These costs do not appear in the benchmarks below.
    284 	</ul>
    285 
    286 	<p>To ease replicability, I distribute a <a href="harness.c"><em>harness</em></a> performing the measurement. You just
    287 	have to define a <a href="xoroshiro128plus-speed.c"><code>next()</code></a> function and include the harness. But the only realistic
    288 	suggestion is to try different generators in your application and see what happens.
    289 
    290 	<h2><a name=quality>&#xfeff;</a>Quality</h2>
    291 
    292 	<p>This is probably the more <a
    293 	href="http://dilbert.com/strips/comic/2001-10-25/">elusive</a> property
    294 	of a PRNG. Here quality is measured using the powerful
    295 	BigCrush suite of tests. BigCrush is part of <a
    296 	href="http://simul.iro.umontreal.ca/testu01/tu01.html">TestU01</a>,
    297 	a monumental framework for testing PRNGs developed by Pierre L'Ecuyer
    298 	and Richard Simard (&ldquo;TestU01: A C library for empirical testing
    299 	of random number generators&rdquo;, <i>ACM Trans. Math. Softw.</i>
    300 	33(4), Article 22, 2007).
    301 
    302 	<p>I run BigCrush starting from 100 equispaced points of the state space
    303 	of the generator and collect <em>failures</em>&mdash;tests in which the
    304 	<i>p</i>-value statistics is outside the interval [0.001..0.999]. A failure
    305 	is <em>systematic</em> if it happens at all points.
    306 
    307 	<p>Note that TestU01 is a 32-bit test suite. Thus, two 32-bit integer values 
    308 	are passed to the test suite for each generated 64-bit value. Floating point numbers
    309 	are generated instead by dividing the unsigned output of the generator by 2<sup>64</sup>.
    310 	Since this implies a bias towards the high bits (which is anyway a known characteristic
    311 	of TestU01), I run the test suite also on the <em>reverse</em>
    312 	generator. More detail about the whole process can be found in this <a
    313 	href="http://vigna.di.unimi.it/papers.php#VigEEMXGS">paper</a>.
    314 
    315 	<p>Beside BigCrush, I analyzed generators using a test for <a href="hwd.php">Hamming-weight dependencies</a>
    316 	described in our <a href="http://vigna.di.unimi.it/papers.php#BlVNTHWD">paper</a>. As I already remarked, our only
    317 	generator failing the test (but only after 5&thinsp;TB of output) is <code>xoroshiro128+</code>.
    318 
    319 	<p>I report the period of each generator and its footprint in bits: a generator gives &ldquo;bang-for-the-buck&rdquo;
    320 	if the base-2 logarithm of the period is close to the footprint. Note
    321    that the footprint has been always padded to a multiple of 64, and it can
    322    be significantly larger than expected because of padding and
    323    cyclic access indices.
    324 
    325 <div style="align: center"><table id='prng' style='margin: 2em 0' class='tablesorter'>
    326 <thead><tr>
    327 <th>PRNG
    328 <th>Footprint (bits)
    329 <th class="{ sorter: 'metadata' }">Period
    330 <th><a href="http://simul.iro.umontreal.ca/testu01/tu01.html">BigCrush</a> Systematic Failures
    331 <th><a href="http://prng.di.unimi.it/hwd.php">HWD failure</a>
    332 <th>ns/64 bits
    333 <th>cycles/B
    334 <tbody>
    335 <tr><td><a href="xoroshiro128plus.c"><code>xoroshiro128+</code></a><td align=right>128  <td align=right class='{sortValue: 128}'>2<sup>128</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>5&thinsp;TB<td align=right>0.80<td align=right>0.36
    336 <tr><td><a href="xoroshiro128plusplus.c"><code>xoroshiro128++</code></a><td align=right>128  <td align=right class='{sortValue: 128}'>2<sup>128</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>&mdash;<td align=right>0.90<td align=right>0.40
    337 <tr><td><a href="xoroshiro128starstar.c"><code>xoroshiro128**</code></a><td align=right>128  <td align=right class='{sortValue: 128}'>2<sup>128</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>&mdash;<td align=right>0.78<td align=right>0.36
    338 <tr><td><a href="xoshiro256plus.c"><code>xoshiro256+</code></a><td align=right>256  <td align=right class='{sortValue: 256}'>2<sup>256</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>&mdash;<td align=right>0.61<td align=right>0.27
    339 <tr><td><a href="xoshiro256plusplus.c"><code>xoshiro256++</code></a><td align=right>256  <td align=right class='{sortValue: 256}'>2<sup>256</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>&mdash;<td align=right>0.75<td align=right>0.34
    340 <tr><td><a href="xoshiro256starstar.c"><code>xoshiro256**</code></a><td align=right>256  <td align=right class='{sortValue: 256}'>2<sup>256</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>&mdash;<td align=right>0.75<td align=right>0.34
    341 <tr><td><a href="xoshiro512plus.c"><code>xoshiro512+</code></a><td align=right>512<td align=right class='{sortValue: 512}'>2<sup>512</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>&mdash;<td align=right>0.68<td align=right>0.30
    342 <tr><td><a href="xoshiro512plusplus.c"><code>xoshiro512++</code></a><td align=right>512<td align=right class='{sortValue: 512}'>2<sup>512</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>&mdash;<td align=right>0.79<td align=right>0.36
    343 <tr><td><a href="xoshiro512starstar.c"><code>xoshiro512**</code></a><td align=right>512<td align=right class='{sortValue: 512}'>2<sup>512</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>&mdash;<td align=right>0.81<td align=right>0.37
    344 <tr><td><a href="xoroshiro1024star.c"><code>xoroshiro1024*</code></a><td align=right>1068<td align=right class='{sortValue: 1024}'>2<sup>1024</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>&mdash;<td align=right>0.82<td align=right>0.37
    345 <tr><td><a href="xoroshiro1024plusplus.c"><code>xoroshiro1024++</code></a><td align=right>1068<td align=right class='{sortValue: 1024}'>2<sup>1024</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>&mdash;<td align=right>1.01<td align=right>0.46
    346 <tr><td><a href="xoroshiro1024starstar.c"><code>xoroshiro1024**</code></a><td align=right>1068<td align=right class='{sortValue: 1024}'>2<sup>1024</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>&mdash;<td align=right>0.98<td align=right>0.44
    347 <tr><td><a href="MWC128.c"><span style="font-variant: small-caps">MWC128</span></a><td align=right>128  			<td align=right class='{sortValue: 127}'>&asymp;2<sup>127</sup><td align=right>&mdash;<td align=right>&mdash;<td align=right>0.83<td align=right>0.37
    348 <tr><td><a href="MWC192.c"><span style="font-variant: small-caps">MWC192</span></a><td align=right>192  			<td align=right class='{sortValue: 127}'>&asymp;2<sup>191</sup><td align=right>&mdash;<td align=right>&mdash;<td align=right>1.42<td align=right>0.19
    349 <tr><td><a href="MWC256.c"><span style="font-variant: small-caps">MWC256</span></a><td align=right>256 			<td align=right class='{sortValue: 255}'>&asymp;2<sup>255</sup><td align=right>&mdash;<td align=right>&mdash;<td align=right>0.45<td align=right>0.20
    350 <tr><td><a href="GMWC128.c"><span style="font-variant: small-caps">GMWC128</span></a><td align=right>128  			<td align=right class='{sortValue: 127}'>&asymp;2<sup>127</sup><td align=right>&mdash;<td align=right>&mdash;<td align=right>1.84<td align=right>0.83
    351 <tr><td><a href="GMWC256.c"><span style="font-variant: small-caps">GMWC256</span></a><td align=right>256 			<td align=right class='{sortValue: 255}'>&asymp;2<sup>255</sup><td align=right>&mdash;<td align=right>&mdash;<td align=right>1.85<td align=right>0.83
    352 <tr><td><a href="http://pracrand.sourceforge.net/"><span style="font-variant: small-caps">SFC64</span></a><td align=right>256 			<td align=right class='{sortValue: 64}'>&ge;2<sup>64</sup><td align=right>&mdash;<td align=right>&mdash;<td align=right>0.66<td align=right>0.30
    353 <tr><td><a href="splitmix64.c"><span style="font-variant: small-caps">SplitMix64</span></a><td align=right>64  			<td align=right class='{sortValue: 64}'>2<sup>64</sup><td align=right>&mdash;<td align=right>&mdash;<td align=right>0.63<td align=right>0.29
    354 <tr><td><a href="http://pcg-random.org/">PCG 128 XSH RS 64 (LCG)</a> <td align=right>128 <td align=right class='{sortValue: 128}'>2<sup>128</sup><td align=right>&mdash;<td align=right>&mdash;<td align=right>1.70<td align=right>0.77
    355 <tr><td><a href="https://github.com/numpy/numpy">PCG64-DXSM (NumPy)</a> <td align=right>128 <td align=right class='{sortValue: 128}'>2<sup>128</sup><td align=right>&mdash;<td align=right>&mdash;<td align=right>1.11<td align=right>0.50
    356 <tr><td><a href="http://numerical.recipes/"><code>Ran</code></a><td align=right>192  <td align=right class='{sortValue: 191}'>&#8776;2<sup>191</sup><td align=right>&mdash;<td align=right>&mdash;<td align=right>1.37<td align=right>0.62
    357 <tr><td><a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html"><code>MT19937-64</code> (Mersenne Twister)</a><td align=right>20032 <td align=right class='{sortValue: 19937}'>2<sup>19937</sup>&nbsp;&minus;&nbsp;1<td align=right>LinearComp<td align=right>&mdash;<td align=right>1.36<td align=right>0.62
    358 <tr><td><a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/"><code>SFMT19937 (uses SSE2 instructions)</code></a><td align=right>20032 <td align=right class='{sortValue: 19937}'>2<sup>19937</sup>&nbsp;&minus;&nbsp;1<td align=right>LinearComp<td align=right>&mdash;<td align=right>0.93<td align=right>0.42
    359 <tr><td><a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/"><code>SFMT607 (uses SSE2 instructions)</code></a><td align=right>672 <td align=right class='{sortValue: 607}'>2<sup>607</sup>&nbsp;&minus;&nbsp;1<td align=right>MatrixRank, LinearComp<td align=right>400 MB<td align=right>0.78<td align=right>0.34
    360 <tr><td><a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/TINYMT/index.html">Tiny Mersenne Twister</a> (64 bits)<td align=right>256<td align=right class='{sortValue: 127}'>2<sup>127</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>90&thinsp;TB→<td align=right>2.76<td align=right>1.25
    361 <tr><td><a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/TINYMT/index.html">Tiny Mersenne Twister</a> (32 bits)<td align=right>224<td align=right class='{sortValue: 127}'>2<sup>127</sup>&nbsp;&minus;&nbsp;1<td align=right>CollisionOver, Run, SimPoker, AppearanceSpacings, MatrixRank, LinearComp, LongestHeadRun, Run of Bits (reversed)<td align=right>40&thinsp;TB→<td align=right>4.27<td align=right>1.92
    362 <tr><td><a href="http://www.iro.umontreal.ca/~panneton/WELLRNG.html"><code>WELL512a</code></a><td align=right>544  <td align=right class='{sortValue: 512}'>2<sup>512</sup>&nbsp;&minus;&nbsp;1  <td align=right>MatrixRank, LinearComp<td align=right>3.5 PB<td align=right>5.42<td align=right>2.44
    363 <tr><td><a href="http://www.iro.umontreal.ca/~panneton/WELLRNG.html"><code>WELL1024a</code></a><td align=right>1056  <td align=right class='{sortValue: 1024}'>2<sup>1024</sup>&nbsp;&minus;&nbsp;1  <td align=right>MatrixRank, LinearComp<td align=right>&mdash;<td align=right>5.30<td align=right>2.38
    364 </table></div>
    365 
    366 	<p>The following table compares instead two ways of generating floating-point numbers, namely the 521-bit <a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/">dSFMT</a>, which
    367    generates directly  floating-point numbers with 52 significant bits, and
    368    <a href="xoshiro256plus.c"><code>xoshiro256+</code></a> followed by a standard conversion of its upper bits to a floating-point number with 53 significant bits (see below).
    369 
    370 <div style="align: center"><table id='prngf' style='margin: 2em 0' class='tablesorter'>
    371 <thead><tr>
    372 <th>PRNG
    373 <th>Footprint (bits)
    374 <th class="{ sorter: 'metadata' }">Period
    375 <th> <a href="http://simul.iro.umontreal.ca/testu01/tu01.html">BigCrush</a> Systematic Failures
    376 <th><a href="http://prng.di.unimi.it/hwd.php">HWD failure</a>
    377 <th>ns/double
    378 <th>cycles/B
    379 <tbody>
    380 <tr><td><a href="xoshiro256plus.c"><code>xoshiro256+</code></a> (returns 53 significant bits) <td align=right>256<td align=right class='{sortValue: 256}'>2<sup>256</sup>&nbsp;&minus;&nbsp;1<td align=right>&mdash;<td align=right>&mdash;<td align=right>0.92<td align=right>3.40
    381 <tr><td><a href="http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/"><code>dSFMT</code></a>  (uses SSE2 instructions, returns only 52 significant bits)<td align=right>704<td align=right class='{sortValue: 521}'>2<sup>521</sup>&nbsp;&minus;&nbsp;1<td align=right>MatrixRank, LinearComp<td align=right>6&thinsp;TB<td align=right>0.85<td align=right>3.07
    382 </table></div>
    383 
    384 	<p><code>xoshiro256+</code> is &asymp;8% slower than the dSFMT, but it has a doubled range of output values, does not need any extra SSE instruction (can be programmed in Java, etc.),
    385    has a much smaller footprint, and its upper bits do not fail any test.
    386 
    387 	<h1><a name=remarks>&#xfeff;</a>Remarks</h1>
    388 
    389 	<h2>Vectorization</h2>
    390 
    391 	<p>Some of the generators can be very easily vectorized, so that multiple instances can be run in parallel to provide
    392 	fast bulk generation. Thanks to an interesting <a href="https://github.com/JuliaLang/julia/issues/27614">discussion with the Julia developers</a>,
    393 	I've become aware that AVX2 vectorizations of multiple instances of generators using the <code>+</code>/<code>++</code> scrambler are impressively fast (links
    394 	below point at a speed test to be used with the <a href="harness.c">harness</a>, and the result will be multiplied by 1000):
    395 
    396 <div style="align: center"><table id='vec' style='margin: 2em 0' class='tablesorter'>
    397 <thead><tr>
    398 <th>PRNG
    399 <th>ns/64 bits
    400 <th>cycles/B
    401 <tbody>
    402 <tr><td><a href="xoroshiro128+-vect-speed.c"><code>xoroshiro128+</code></a> (4 parallel instances)<td align=right>0.36<td align=right>0.14
    403 <tr><td><a href="xoroshiro128++-vect-speed.c"><code>xoroshiro128++</code></a> (4 parallel instances)<td align=right>0.45<td align=right>0.18
    404 <tr><td><a href="xoshiro256+-vect-speed.c"><code>xoshiro256+</code></a> (8 parallel instances)<td align=right>0.19<td align=right>0.08
    405 <tr><td><a href="xoshiro256++-vect-speed.c"><code>xoshiro256++</code></a> (8 parallel instances)<td align=right>0.26<td align=right>0.09
    406 </table></div>
    407 
    408 	<p>Note that sometimes convincing the compiler to vectorize is a
    409 	slightly quirky process: for example, on <code>gcc</code> 12.2.1 I have to use <code>-O3 -fdisable-tree-cunrolli -march=native</code>
    410 	to vectorize <code>xoshiro256</code>-based generators
    411 	(<code>-O3</code> alone will not vectorize; thanks to to Chris Elrod for pointing me at <code>-fdisable-tree-cunrolli</code>).
    412 
    413 	<h2>A long period does not imply high quality</h2>
    414 
    415 	<p>This is a common misconception. The generator <code>x++</code> has
    416 	period \(2^k\), for any \(k\geq0\), provided that <code>x</code> is
    417 	represented using \(k\) bits: nonetheless, it is a horrible generator.
    418 	The generator returning \(k-1\) zeroes followed by a one has period
    419    \(k\).
    420 
    421 	<p>It is however important that the period is long enough. A first heuristic rule of thumb
    422 	is that if you need to use \(t\) values, you need a generator with period at least \(t^2\).
    423 	Moreover, if you run \(n\) independent computations starting at random seeds,
    424 	the sequences used by each computation should not overlap.
    425 
    426 	<p>Now, given a generator with period \(P\), the probability that \(n\) subsequences of length \(L\) starting at random points in the state space
    427 	overlap <a href="http://vigna.di.unimi.it/papers.php#VigPORSPNG">is bounded by \(n^2L/P\)</a>. If your generator has period \(2^{256}\) and you run
    428 	on \(2^{64}\) cores (you will never have them) a computation using \(2^{64}\) pseudorandom numbers (you will never have the time)
    429 	the probability of overlap would be less than \(2^{-64}\).
    430 
    431 	<p>In other words: any generator with a period beyond
    432 	\(2^{256}\) has a period that is
    433 	sufficient for every imaginable application. Unless there are other motivations (e.g., provably
    434 	increased quality), a generator with a larger period is only a waste of
    435 	memory (as it needs a larger state), of cache lines, and of
    436 	precious high-entropy random bits for seeding (unless you're using
    437 	small seeds, but then it's not clear why you would want a very long
    438 	period in the first place&mdash;the computation above is valid only if you seed all bits of the state
    439 	with independent, uniformly distributed random bits).
    440 
    441 	<p>In case the generator provides a <em>jump function</em> that lets you skip through chunks of the output in constant
    442 	time, even a period of \(2^{128}\) can be sufficient, as it provides \(2^{64}\) non-overlapping sequences of length \(2^{64}\).
    443 
    444 	<h2>Equidistribution</h2>
    445 
    446 	<p>Every 64-bit generator of ours with <var>n</var> bits of state scrambled
    447 	with <code>*</code> or <code>**</code> is <var>n</var>/64-dimensionally
    448 	equidistributed: every <var>n</var>/64-tuple of consecutive 64-bit
    449 	values appears exactly once in the output, except for the zero tuple
    450 	(and this is the largest possible dimension). Generators based on the
    451 	<code>+</code> or <code>++</code> scramblers are however only (<var>n</var>/64 &minus;
    452 	1)-dimensionally equidistributed: every (<var>n</var>/64 &minus;
    453 	1)-tuple of consecutive 64-bit values appears exactly 2<sup>64</sup>
    454 	times in the output, except for a missing zero tuple. The same considerations
    455    apply to 32-bit generators.
    456 
    457 	<h2>Generating uniform doubles in the unit interval</h2>
    458 
    459 	<p>A standard double (64-bit) floating-point number in
    460 	<a href="https://en.wikipedia.org/wiki/IEEE_floating_point">IEEE floating point format</a> has 52 bits of
    461 	significand, plus an implicit bit at the left of the significand. Thus,
    462 	the representation can actually store numbers with <em>53</em> significant binary digits.
    463 
    464 	<p>Because of this fact, in C99 a 64-bit unsigned integer <code>x</code> should be converted to a 64-bit double 
    465 	using the expression
    466 <pre>
    467     #include &lt;stdint.h>
    468 
    469     (x >> 11) * 0x1.0p-53
    470 </pre>
    471 	<p>In Java you can use almost the same expression for a (signed) 64-bit integer:
    472 <pre>
    473     (x >>> 11) * 0x1.0p-53
    474 </pre>
    475 
    476 
    477 	<p>This conversion guarantees that all dyadic rationals of the form <var>k</var> / 2<sup>&minus;53</sup> 
    478 	will be equally likely. Note that this conversion prefers the high bits of <code>x</code> (usually, a good idea), but you can alternatively
    479 	use the lowest bits.
    480 
    481 	<p>An alternative, multiplication-free conversion is
    482 <pre>
    483     #include &lt;stdint.h>
    484 
    485     static inline double to_double(uint64_t x) {
    486        const union { uint64_t i; double d; } u = { .i = UINT64_C(0x3FF) &lt;&lt; 52 | x >> 12 };
    487        return u.d - 1.0;
    488     }
    489 </pre>
    490 	<p>The code above cooks up by bit manipulation
    491 	a real number in the interval [1..2), and then subtracts
    492 	one to obtain a real number in the interval [0..1). If <code>x</code> is chosen uniformly among 64-bit integers, 
    493 	<code>d</code> is chosen uniformly among dyadic rationals of the form <var>k</var> / 2<sup>&minus;52</sup>. This
    494 	is the same technique used by generators providing directly doubles, such as the 
    495 	<a href="http://dx.doi.org/10.1007/978-3-540-85912-3_26">dSFMT</a>.
    496 
    497 	<p>This technique is supposed to be fast, but on recent hardare it does not seem to give a significant advantage.
    498    More importantly, <em>you will be generating half the values you could actually generate</em>.
    499 	The same problem plagues the dSFMT. All doubles generated will have the lowest significand bit set to zero (I must
    500 	thank Raimo Niskanen from the Erlang team for making me notice this&mdash;a previous version of this site
    501 	did not mention this issue).
    502 
    503 	<p>In Java you can obtain an analogous result using suitable static methods:
    504 <pre>
    505     Double.longBitsToDouble(0x3FFL &lt;&lt; 52 | x >>> 12) - 1.0
    506 </pre>
    507 
    508 	<p>To adhere to the principle of least surprise, my implementations now use the multiplicative version, everywhere.
    509 
    510 	<p>Interestingly, these are not the only notions of &ldquo;uniformity&rdquo; you can come up with. Another possibility
    511 	is that of generating 1074-bit integers, normalize and return the nearest value representable as a
    512 	64-bit double (this is the theory&mdash;in practice, you will almost never
    513 	use more than two integers per double as the remaining bits would not be representable). This approach guarantees that all
    514 	representable doubles could be in principle generated, albeit not every
    515 	returned double will appear with the same probability. A reference
    516 	implementation can be found <a href="random_real.c">here</a>. Note that unless your generator has
    517 	at least 1074 bits of state and suitable equidistribution properties, the code above will not do what you expect
    518 	(e.g., it might <em>never</em> return zero).
    519 
    520 
    521 	  </div>
    522       
    523       
    524     </div>
    525     
    526     <div id="right">
    527 
    528 <!--      <h1>Download</h1>
    529       <p><ul>
    530 		<li><a HREF="prng-1.2.tgz">source tarball</A>
    531 		<li><a HREF="prng-data.tar.bz2">data tarball (large!)</a>
    532       </ul>
    533 -->
    534       <h1>C code (64 bits)</h1>
    535       <p><ul>
    536 		<li><a HREF="xoshiro256plusplus.c"><code>xoshiro256++</code></a>
    537 		<li><a HREF="xoshiro256starstar.c"><code>xoshiro256**</code></a>
    538 		<li><a HREF="xoshiro256plus.c"><code>xoshiro256+</code></a>
    539 		<li><a HREF="xoroshiro128plusplus.c"><code>xoroshiro128++</code></a>
    540 		<li><a HREF="xoroshiro128starstar.c"><code>xoroshiro128**</code></a>
    541 		<li><a HREF="xoroshiro128plus.c"><code>xoroshiro128+</code></a>
    542 		<li><a HREF="https://github.com/vigna/MRG32k3a">Testless <code>MRG32k3a</code></a>
    543 		<li><a HREF="MWC128.c"><code>MWC128</code></a> + <a HREF="mp.c"><code>mp.c</code></a>
    544 		<li><a HREF="MWC192.c"><code>MWC192</code></a> + <a HREF="mp.c"><code>mp.c</code></a>
    545 		<li><a HREF="MWC256.c"><code>MWC256</code></a> + <a HREF="mp.c"><code>mp.c</code></a>
    546 		<li><a HREF="GMWC128.c"><code>GMWC128</code></a> + <a HREF="mp.c"><code>mp.c</code></a>
    547 		<li><a HREF="GMWC256.c"><code>GMWC256</code></a> + <a HREF="mp.c"><code>mp.c</code></a>
    548 <!--		<li><a HREF="xoshiro512starstar.c"><code>xoshiro512**</code></a>
    549 		<li><a HREF="xoshiro512plus.c"><code>xoshiro512+</code></a>
    550 		<li><a HREF="xoroshiro1024starstar.c"><code>xoroshiro1024**</code></a>
    551 		<li><a HREF="xoroshiro1024plus.c"><code>xoroshiro1024*</code></a>-->
    552       </ul>
    553 
    554       <h1>C code (32 bits)</h1>
    555       <p><ul>
    556 		<li><a HREF="xoshiro128plusplus.c"><code>xoshiro128++</code></a>
    557 		<li><a HREF="xoshiro128starstar.c"><code>xoshiro128**</code></a>
    558 		<li><a HREF="xoshiro128plus.c"><code>xoshiro128+</code></a>
    559 		<li><a HREF="xoroshiro64starstar.c"><code>xoroshiro64**</code></a>
    560 		<li><a HREF="xoroshiro64star.c"><code>xoroshiro64*</code></a>
    561       </ul>
    562 
    563    	<h1>Java code (<a HREF="https://github.com/openjdk/jdk17/tree/master/src/jdk.random/share/classes/jdk/random"><code>java.util.random</code></a>)</h1>
    564 
    565       <h1>Java code (<a href="http://dsiutils.di.unimi.it">DSI utilities</a>)</h1>
    566       <p><ul>
    567 		<li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/package-summary.html">Overview</a>
    568 		<li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/XoShiRo256PlusPlusRandom.html"><code>xoshiro256++</code></a>
    569 		<li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/XoShiRo256StarStarRandom.html"><code>xoshiro256**</code></a>
    570 		<li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/XoShiRo256PlusRandom.html"><code>xoshiro256+</code></a>
    571 		<li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/XoRoShiRo128PlusPlusRandom.html"><code>xoroshiro128++</code></a>
    572 		<li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/XoRoShiRo128StarStarRandom.html"><code>xoroshiro128**</code></a>
    573 		<li><a HREF="http://dsiutils.di.unimi.it/docs/it/unimi/dsi/util/XoRoShiRo128PlusRandom.html"><code>xoroshiro128+</code></a>
    574 		<li><a HREF="https://github.com/vigna/MRG32k3a">Testless <code>MRG32k3a</code></a>
    575       </ul>
    576 
    577    	<h1>Java code (<a HREF="https://gitbox.apache.org/repos/asf?p=commons-rng.git">Apache Commons RNG implementations</a>)</h1>
    578 
    579       <h1>Documentation</h1>
    580       <p><ul>
    581 		<li>The <a href="http://vigna.di.unimi.it/papers.php#BlVSLPNG">paper</a> introducing <code>xoshiro</code>/<code>xoroshiro</code>.
    582 		<li>The <a href="http://vigna.di.unimi.it/papers.php#BlVNTHWD">paper</a> describing our <a href="hwd.php">test for Hamming-weight dependencies</a>.
    583 		<li>A <a href="http://vigna.di.unimi.it/papers.php#VigHTLGMT">paper</a> discussing the defects of the Mersenne Twister family of PRNGs.
    584 		<li>A <a href="http://vigna.di.unimi.it/papers.php#VigPORSPNG">paper</a> discussing the probability of overlap of random subsequences.
    585 		<li>A <a	href="http://vigna.di.unimi.it/papers.php#StVCESGMCPNG">paper</a> with new tables of multipliers for LCGs with power-of-two moduli.
    586 		<li>A <a href="http://vigna.di.unimi.it/papers.php#StVLXM">paper</a> presenting the family LXM of PRNGs.
    587       </ul>
    588 
    589 		<h1>Discussion</h1>
    590 
    591 		<p>There is a <a href="http://groups.google.com/group/prng">discussion group</a>
    592 		about this page. You can join or <a href="mailto:prng@googlegroups.com">send a message</a>.
    593 		<h1><a href="https://validator.w3.org/check/referer">This is valid HTML 4.01</a></h1>
    594 
    595     </div>
    596   </body>
    597 </html>