In .NET, string data is stored in memory as Unicode text encoded as UTF-16: each char is a 2-byte code unit, and a character outside the Basic Multilingual Plane takes two code units (a surrogate pair, 4 bytes).
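As a minimal sketch of what this means in practice (the class and variable names here are illustrative only), `string.Length` counts UTF-16 code units rather than characters, so a string holding a single character stored as a surrogate pair reports a length of 2:

```csharp
using System;

class Utf16LengthDemo
{
    static void Main()
    {
        string bmp = "\u00e9";         // U+00E9, fits in one UTF-16 code unit
        string astral = "\U00020213";  // U+20213, stored as a surrogate pair

        Console.WriteLine(bmp.Length);                       // 1
        Console.WriteLine(astral.Length);                    // 2
        Console.WriteLine(char.IsHighSurrogate(astral[0]));  // True
        Console.WriteLine(char.IsLowSurrogate(astral[1]));   // True

        // Recombine the surrogate pair into the original code point.
        Console.WriteLine(char.ConvertToUtf32(astral, 0).ToString("X")); // 20213
    }
}
```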
When you write string data out to a file, however, you need to know which encoding is being used. In the example below, we use a StreamWriter to write string data to a file. By default, StreamWriter uses UTF-8 as the encoding.
    string s1 = "A";           // U+0041
    string s2 = "\u00e9";      // U+00E9 accented e
    string s3 = "\u0100";      // Capital A with bar
    string s4 = "\U00020213";  // CJK ideograph (d840, de13 surrogate)

    using (StreamWriter sw = new StreamWriter(@"C:\Users\Gaurav\Documents\sometext.txt"))
    {
        sw.WriteLine(s1);
        sw.WriteLine(s2);
        sw.WriteLine(s3);
        sw.WriteLine(s4);
    }
We could also explicitly specify a UTF-16 encoding (Encoding.Unicode) when creating the StreamWriter object.
    // The second argument (false) overwrites the file instead of appending to it.
    using (StreamWriter sw = new StreamWriter(@"C:\Users\Gaurav\Documents\sometext.txt", false, Encoding.Unicode))
    {
        // Same WriteLine calls as in the previous example.
    }
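To see how the choice of encoding changes what ends up on disk, here is a rough sketch that compares the encoded size of the same four strings under UTF-8 and UTF-16. It uses `Encoding.GetByteCount` instead of writing a file, and the class name is made up for illustration:

```csharp
using System;
using System.Text;

class EncodingSizeDemo
{
    static void Main()
    {
        // The same four sample strings used in the StreamWriter example.
        string[] samples = { "A", "\u00e9", "\u0100", "\U00020213" };

        foreach (string s in samples)
        {
            int utf8Bytes = Encoding.UTF8.GetByteCount(s);      // variable width: 1 to 4 bytes
            int utf16Bytes = Encoding.Unicode.GetByteCount(s);  // 2 bytes, or 4 for a surrogate pair
            int codePoint = char.ConvertToUtf32(s, 0);

            Console.WriteLine($"U+{codePoint:X4}: UTF-8 = {utf8Bytes}, UTF-16 = {utf16Bytes}");
        }
        // U+0041: UTF-8 = 1, UTF-16 = 2
        // U+00E9: UTF-8 = 2, UTF-16 = 2
        // U+0100: UTF-8 = 2, UTF-16 = 2
        // U+20213: UTF-8 = 4, UTF-16 = 4
    }
}
```

These counts cover only the characters themselves. The actual file will be larger, since WriteLine appends a line terminator after each string, and a StreamWriter created with Encoding.Unicode should also write a 2-byte byte order mark at the start of a new file.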