<!--

@license Apache-2.0

Copyright (c) 2018 The Stdlib Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

-->

# UTF-16 to UTF-8

> Convert a [UTF-16][utf-16] encoded string to an array of integers using [UTF-8][utf-8] encoding.

<!-- Section to include introductory text. Make sure to keep an empty line after the intro `section` element and another before the `/section` close. -->

<section class="intro">

</section>

<!-- /.intro -->

<!-- Package usage documentation. -->

<section class="usage">

## Usage

```javascript
var utf16ToUTF8Array = require( '@stdlib/string/utf16-to-utf8-array' );
```

#### utf16ToUTF8Array( str )

Converts a [UTF-16][utf-16] encoded string to an `array` of integers using [UTF-8][utf-8] encoding.

```javascript
var out = utf16ToUTF8Array( '☃' );
// returns [ 226, 152, 131 ]
```

</section>

<!-- /.usage -->

<!-- Package usage notes. Make sure to keep an empty line after the `section` element and another before the `/section` close. -->

<section class="notes">

## Notes

-   [UTF-16][utf-16] encoding uses one 16-bit unit for non-surrogates (`U+0000` to `U+D7FF` and `U+E000` to `U+FFFF`).

-   [UTF-16][utf-16] encoding uses two 16-bit units (surrogate pairs) for `U+10000` to `U+10FFFF` and encodes `U+10000-U+10FFFF` by subtracting `0x10000` from the code point, expressing the result as a 20-bit binary, and splitting the 20 bits of `0x0-0xFFFFF` as upper and lower 10-bits. The respective 10-bits are stored in two 16-bit words: a **high** and a **low** surrogate.

-   [UTF-8][utf-8] is defined to encode code points in one to four bytes, depending on the number of significant bits in the numerical value of the code point. Encoding uses the following byte sequences:

    ```text
    0x00000000 - 0x0000007F:
        0xxxxxxx

    0x00000080 - 0x000007FF:
        110xxxxx 10xxxxxx

    0x00000800 - 0x0000FFFF:
        1110xxxx 10xxxxxx 10xxxxxx

    0x00010000 - 0x001FFFFF:
        11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    ```

    where an `x` represents a code point bit. Only the shortest possible multi-byte sequence which can represent a code point is used.

</section>

<!-- /.notes -->

<!-- Package usage examples. -->

<section class="examples">

## Examples

<!-- eslint no-undef: "error" -->

```javascript
var utf16ToUTF8Array = require( '@stdlib/string/utf16-to-utf8-array' );

var values;
var out;
var i;

values = [
    'Ladies + Gentlemen',
    'An encoded string!',
    'Dogs, Cats & Mice',
    '☃',
    'æ',
    '𐐷'
];
for ( i = 0; i < values.length; i++ ) {
    out = utf16ToUTF8Array( values[ i ] );
    console.log( '%s: %s', values[ i ], out.join( ',' ) );
}
```

</section>

<!-- /.examples -->

<!-- Section to include cited references. If references are included, add a horizontal rule *before* the section. Make sure to keep an empty line after the `section` element and another before the `/section` close. -->

<section class="references">

</section>

<!-- /.references -->

<!-- Section for related `stdlib` packages. Do not manually edit this section, as it is automatically populated. -->

<section class="related">

</section>

<!-- /.related -->

<!-- Section for all links. Make sure to keep an empty line after the `section` element and another before the `/section` close. -->

<section class="links">

[utf-8]: https://en.wikipedia.org/wiki/UTF-8

[utf-16]: https://en.wikipedia.org/wiki/UTF-16

</section>

<!-- /.links -->
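
<!-- Illustrative sketch of the encoding steps described in the notes. -->

The surrogate-pair decoding and byte layouts described in the notes can be sketched in plain JavaScript as follows. This is a minimal illustration, not the package's actual implementation: the function name `toUTF8Bytes` is hypothetical, and the handling of unpaired surrogates (encoding them as three-byte sequences) is an assumption of the sketch rather than documented package behavior.

```javascript
// Hypothetical helper for illustration only; not the package's source.
function toUTF8Bytes( str ) {
    var code;
    var out;
    var low;
    var i;

    out = [];
    for ( i = 0; i < str.length; i++ ) {
        code = str.charCodeAt( i );

        // A high surrogate followed by a low surrogate encodes one code point
        // in the range U+10000 to U+10FFFF; undo the 10-bit/10-bit split:
        if ( code >= 0xD800 && code <= 0xDBFF && i + 1 < str.length ) {
            low = str.charCodeAt( i + 1 );
            if ( low >= 0xDC00 && low <= 0xDFFF ) {
                code = 0x10000 + ( ( code - 0xD800 ) << 10 ) + ( low - 0xDC00 );
                i += 1; // skip the low surrogate
            }
        }
        // Assumption: an unpaired surrogate falls through and is encoded as-is.
        if ( code < 0x80 ) {
            // 0xxxxxxx
            out.push( code );
        } else if ( code < 0x800 ) {
            // 110xxxxx 10xxxxxx
            out.push( 0xC0 | ( code >> 6 ) );
            out.push( 0x80 | ( code & 0x3F ) );
        } else if ( code < 0x10000 ) {
            // 1110xxxx 10xxxxxx 10xxxxxx
            out.push( 0xE0 | ( code >> 12 ) );
            out.push( 0x80 | ( ( code >> 6 ) & 0x3F ) );
            out.push( 0x80 | ( code & 0x3F ) );
        } else {
            // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out.push( 0xF0 | ( code >> 18 ) );
            out.push( 0x80 | ( ( code >> 12 ) & 0x3F ) );
            out.push( 0x80 | ( ( code >> 6 ) & 0x3F ) );
            out.push( 0x80 | ( code & 0x3F ) );
        }
    }
    return out;
}

console.log( toUTF8Bytes( '☃' ) );
// => [ 226, 152, 131 ]
```

Each branch always selects the shortest byte sequence that can hold the code point, which matches the "shortest possible multi-byte sequence" rule stated in the notes.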