<!--

@license Apache-2.0

Copyright (c) 2018 The Stdlib Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

-->

# UTF-16 to UTF-8

> Convert a [UTF-16][utf-16] encoded string to an array of integers using [UTF-8][utf-8] encoding.

<!-- Section to include introductory text. Make sure to keep an empty line after the intro `section` element and another before the `/section` close. -->

<section class="intro">

</section>

<!-- /.intro -->

<!-- Package usage documentation. -->

<section class="usage">

## Usage

```javascript
var utf16ToUTF8Array = require( '@stdlib/string/utf16-to-utf8-array' );
```

#### utf16ToUTF8Array( str )

Converts a [UTF-16][utf-16] encoded string to an `array` of integers using [UTF-8][utf-8] encoding.

```javascript
var out = utf16ToUTF8Array( '☃' );
// returns [ 226, 152, 131 ]
```
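
As a quick sanity check on the expected output, the same byte values can be reproduced without this package by using the built-in `TextEncoder`, which always encodes to UTF-8. The following is a minimal sketch, assuming a runtime that exposes `TextEncoder` as a global (modern browsers and recent Node.js versions); the helper name `utf8Bytes` is hypothetical and not part of this package.

```javascript
// Hypothetical helper (not part of @stdlib): encode a string to UTF-8 byte
// values using the built-in TextEncoder, assumed to be available as a global.
function utf8Bytes( str ) {
    var bytes = new TextEncoder().encode( str ); // Uint8Array
    return Array.from( bytes );
}

var out = utf8Bytes( '☃' );
// returns [ 226, 152, 131 ]
```

Results may differ from the package for malformed input: `TextEncoder` replaces unpaired surrogates with `U+FFFD` before encoding.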

</section>

<!-- /.usage -->

<!-- Package usage notes. Make sure to keep an empty line after the `section` element and another before the `/section` close. -->

<section class="notes">

## Notes

-   [UTF-16][utf-16] encoding uses one 16-bit unit for non-surrogate code points (`U+0000` to `U+D7FF` and `U+E000` to `U+FFFF`).

-   [UTF-16][utf-16] encoding uses two 16-bit units (a surrogate pair) for code points from `U+10000` to `U+10FFFF`. To encode such a code point, subtract `0x10000` to obtain a 20-bit value in the range `0x0-0xFFFFF`, and then split that value into its upper and lower 10 bits. The upper 10 bits are added to `0xD800` to form the **high** surrogate, and the lower 10 bits are added to `0xDC00` to form the **low** surrogate.

-   [UTF-8][utf-8] is defined to encode code points in one to four bytes, depending on the number of significant bits in the numerical value of the code point. Encoding uses the following byte sequences:

    ```text
    0x00000000 - 0x0000007F:
        0xxxxxxx

    0x00000080 - 0x000007FF:
        110xxxxx 10xxxxxx

    0x00000800 - 0x0000FFFF:
        1110xxxx 10xxxxxx 10xxxxxx

    0x00010000 - 0x001FFFFF:
        11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    ```

    where an `x` represents a code point bit. Only the shortest possible multi-byte sequence which can represent a code point is used. Although the four-byte pattern has room for values up to `0x1FFFFF`, valid Unicode code points end at `U+10FFFF`.
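
The surrogate-pair arithmetic and the byte patterns described in these notes can be sketched in a few lines. This is an illustrative sketch only, not the package's implementation; the function names `toSurrogatePair`, `fromSurrogatePair`, and `codePointToUTF8` are hypothetical, and the code assumes valid input (no range checks and no handling of unpaired surrogates).

```javascript
// Illustrative helpers (hypothetical names, not part of @stdlib).

// Split a supplementary code point (U+10000 to U+10FFFF) into a surrogate pair.
function toSurrogatePair( cp ) {
    var v = cp - 0x10000;             // 20-bit value in 0x0-0xFFFFF
    var high = 0xD800 + ( v >> 10 );  // upper 10 bits
    var low = 0xDC00 + ( v & 0x3FF ); // lower 10 bits
    return [ high, low ];
}

// Recombine a high and a low surrogate into the original code point.
function fromSurrogatePair( high, low ) {
    return ( ( high - 0xD800 ) << 10 ) + ( low - 0xDC00 ) + 0x10000;
}

// Encode a single code point using the shortest UTF-8 byte sequence.
function codePointToUTF8( cp ) {
    if ( cp < 0x80 ) {
        return [ cp ];
    }
    if ( cp < 0x800 ) {
        return [ 0xC0 | ( cp >> 6 ), 0x80 | ( cp & 0x3F ) ];
    }
    if ( cp < 0x10000 ) {
        return [
            0xE0 | ( cp >> 12 ),
            0x80 | ( ( cp >> 6 ) & 0x3F ),
            0x80 | ( cp & 0x3F )
        ];
    }
    return [
        0xF0 | ( cp >> 18 ),
        0x80 | ( ( cp >> 12 ) & 0x3F ),
        0x80 | ( ( cp >> 6 ) & 0x3F ),
        0x80 | ( cp & 0x3F )
    ];
}

var pair = toSurrogatePair( 0x10437 );
// returns [ 55297, 56375 ] (0xD801, 0xDC37)

var cp = fromSurrogatePair( pair[ 0 ], pair[ 1 ] );
// returns 66615 (0x10437)

var bytes = codePointToUTF8( cp );
// returns [ 240, 144, 144, 183 ]
```

Here `U+10437` is the code point of `𐐷`, so the two 16-bit units of that string map to a single four-byte UTF-8 sequence.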

</section>

<!-- /.notes -->

<!-- Package usage examples. -->

<section class="examples">

## Examples

<!-- eslint no-undef: "error" -->

```javascript
var utf16ToUTF8Array = require( '@stdlib/string/utf16-to-utf8-array' );

var values;
var out;
var i;

// Sample strings: ASCII text plus characters needing three, two, and four UTF-8 bytes, respectively.
values = [
    'Ladies + Gentlemen',
    'An encoded string!',
    'Dogs, Cats & Mice',
    '☃',
    'æ',
    '𐐷'
];
// Print each string followed by its comma-separated UTF-8 byte values:
for ( i = 0; i < values.length; i++ ) {
    out = utf16ToUTF8Array( values[ i ] );
    console.log( '%s: %s', values[ i ], out.join( ',' ) );
}
```

</section>

<!-- /.examples -->

<!-- Section to include cited references. If references are included, add a horizontal rule *before* the section. Make sure to keep an empty line after the `section` element and another before the `/section` close. -->

<section class="references">

</section>

<!-- /.references -->

<!-- Section for related `stdlib` packages. Do not manually edit this section, as it is automatically populated. -->

<section class="related">

</section>

<!-- /.related -->

<!-- Section for all links. Make sure to keep an empty line after the `section` element and another before the `/section` close. -->

<section class="links">

[utf-8]: https://en.wikipedia.org/wiki/UTF-8

[utf-16]: https://en.wikipedia.org/wiki/UTF-16

</section>

<!-- /.links -->