Unicode maps each character to a corresponding code point: a numeric value that identifies that character. A character encoding scheme then dictates how each code point is represented as a sequence of bits so that it can be stored in memory or on disk. UTF-16 and UTF-8 are the most commonly used encoding schemes for Unicode character data.
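Python exposes both halves of this pipeline directly: `ord()` returns a character's code point, and `str.encode()` applies an encoding scheme. A minimal sketch (the hex formatting is just for display):

```python
ch = "é"                              # one Unicode character

code_point = ord(ch)                  # the abstract numeric value: 0xE9
utf16 = ch.encode("utf-16-be")        # big-endian UTF-16, without a byte-order mark
utf8 = ch.encode("utf-8")

print(f"U+{code_point:04X}")          # U+00E9
print(utf16.hex(" ").upper())         # 00 E9
print(utf8.hex(" ").upper())          # C3 A9
```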
Below are some examples of how various characters are encoded in UTF-16 and UTF-8 (UTF-16 byte sequences are shown in big-endian order).
- Latin capital ‘A’, code point U+0041
  - UTF-16: 2 bytes, 00 41 (hex)
  - UTF-8: 1 byte, 41 (hex)
- Latin small letter ‘é’ (e with acute accent), code point U+00E9
  - UTF-16: 2 bytes, 00 E9 (hex)
  - UTF-8: 2 bytes, C3 A9 (hex) [110x xxxx 10xx xxxx]
- Mongolian letter A, code point U+1820
  - UTF-16: 2 bytes, 18 20 (hex)
  - UTF-8: 3 bytes, E1 A0 A0 (hex) [1110 xxxx 10xx xxxx 10xx xxxx]
- Ace of Spades playing card character, code point U+1F0A1
  - UTF-16: 4 bytes, D8 3C DC A1 (hex), encoded as a surrogate pair because the code point lies above U+FFFF (see the sketch after this list)
  - UTF-8: 4 bytes, F0 9F 82 A1 (hex) [1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx]
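To make the bracketed bit patterns and the UTF-16 surrogate-pair arithmetic concrete, here is a hand-rolled encoder sketch in Python. It is for illustration only: real code should use the built-in codecs, and this version skips validation (e.g. it does not reject lone surrogates or out-of-range values):

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a single code point using the UTF-8 bit patterns shown above."""
    if cp < 0x80:                         # 0xxx xxxx
        return bytes([cp])
    if cp < 0x800:                        # 110x xxxx 10xx xxxx
        return bytes([0xC0 | cp >> 6,
                      0x80 | cp & 0x3F])
    if cp < 0x10000:                      # 1110 xxxx 10xx xxxx 10xx xxxx
        return bytes([0xE0 | cp >> 12,
                      0x80 | cp >> 6 & 0x3F,
                      0x80 | cp & 0x3F])
    return bytes([0xF0 | cp >> 18,        # 1111 0xxx, then three continuation bytes
                  0x80 | cp >> 12 & 0x3F,
                  0x80 | cp >> 6 & 0x3F,
                  0x80 | cp & 0x3F])

def utf16_encode(cp: int) -> bytes:
    """Encode a single code point as big-endian UTF-16."""
    if cp < 0x10000:                      # fits in a single 16-bit code unit
        return cp.to_bytes(2, "big")
    cp -= 0x10000                         # remaining 20 bits split across a surrogate pair
    high = 0xD800 | cp >> 10              # high surrogate carries the top 10 bits
    low = 0xDC00 | cp & 0x3FF             # low surrogate carries the bottom 10 bits
    return high.to_bytes(2, "big") + low.to_bytes(2, "big")

for cp in (0x0041, 0x00E9, 0x1820, 0x1F0A1):
    print(f"U+{cp:04X}  UTF-16: {utf16_encode(cp).hex(' ').upper()}"
          f"  UTF-8: {utf8_encode(cp).hex(' ').upper()}")
```

Running this reproduces the byte sequences listed above; for U+1F0A1, the surrogate-pair arithmetic yields 0xD83C and 0xDCA1, i.e. the bytes D8 3C DC A1.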