Specifying Character Encoding when Writing to a File

In .NET, string data is stored in memory as Unicode data encoded as UTF-16 (2 bytes per character, or 4 bytes for surrogate pairs).
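To make the in-memory representation concrete, here is a minimal sketch that counts UTF-16 code units. The class name `Utf16Demo` and the helper `CodeUnits` are made up for this illustration; `string.Length` and `char.IsHighSurrogate` are standard .NET members.

```csharp
using System;

public static class Utf16Demo
{
    // string.Length counts UTF-16 code units (chars), not user-visible characters.
    public static int CodeUnits(string s) => s.Length;

    public static void Main()
    {
        string bmp = "\u00e9";         // inside the Basic Multilingual Plane
        string astral = "\U00020213";  // outside the BMP, stored as a surrogate pair

        Console.WriteLine(CodeUnits(bmp));                       // 1
        Console.WriteLine(CodeUnits(astral));                    // 2
        Console.WriteLine(char.IsHighSurrogate(astral[0]));      // True
        Console.WriteLine(((int)astral[0]).ToString("x4"));      // d840
        Console.WriteLine(((int)astral[1]).ToString("x4"));      // de13
    }
}
```

Each code unit is 2 bytes, so the second string occupies 4 bytes of character data in memory.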
When you persist string data to a file, however, you need to be aware of which encoding is used. In the example below, we use a StreamWriter to write string data to a file. When constructed from just a path, StreamWriter uses UTF-8 by default and does not write a byte order mark.
using System.IO;             // StreamWriter

string s1 = "A";             // U+0041
string s2 = "\u00e9";        // U+00E9 e with acute accent
string s3 = "\u0100";        // U+0100 capital A with macron
string s4 = "\U00020213";    // U+20213 CJK ideograph (surrogate pair D840 DE13)
 
using (StreamWriter sw = new StreamWriter(@"C:\Users\Gaurav\Documents\sometext.txt"))
{
    sw.WriteLine(s1);
    sw.WriteLine(s2);
    sw.WriteLine(s3);
    sw.WriteLine(s4);
}
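To see what actually lands on disk, the sketch below writes the same four strings with the default StreamWriter encoding and prints the file's bytes in hex. The class name `Utf8Dump`, the helper `WriteAndDump`, and the use of a temporary file instead of the path above are assumptions made for the illustration; the newline is pinned to "\n" so the output does not vary by platform.

```csharp
using System;
using System.IO;

public static class Utf8Dump
{
    // Writes each string on its own line using StreamWriter's default
    // encoding, then returns the raw file bytes as hyphen-separated hex.
    public static string WriteAndDump(params string[] lines)
    {
        string path = Path.GetTempFileName(); // scratch file, deleted below
        try
        {
            using (StreamWriter sw = new StreamWriter(path))
            {
                sw.NewLine = "\n"; // fixed newline so the dump is the same on every OS
                foreach (string line in lines)
                    sw.WriteLine(line);
            }
            return BitConverter.ToString(File.ReadAllBytes(path));
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        // Prints 41-0A-C3-A9-0A-C4-80-0A-F0-A0-88-93-0A
        Console.WriteLine(WriteAndDump("A", "\u00e9", "\u0100", "\U00020213"));
    }
}
```

The four strings take 1, 2, 2 and 4 bytes respectively in UTF-8, each followed by the one-byte newline (0A), and there is no byte order mark at the start of the file.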
We could also explicitly specify a UTF-16 encoding (Encoding.Unicode) when creating the StreamWriter object.
// Encoding is in System.Text; the block body is the same as in the first example.
using (StreamWriter sw = new StreamWriter(@"C:\Users\Gaurav\Documents\sometext.txt", false, Encoding.Unicode))
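For comparison, here is a hedged sketch of what the file contains when Encoding.Unicode is passed explicitly. As before, the class name `Utf16FileDump`, the helper `WriteAndDump`, and the temporary file are assumptions for the illustration, and the newline is pinned to "\n".

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf16FileDump
{
    // Writes each string on its own line with the given encoding and
    // returns the raw file bytes as hyphen-separated hex.
    public static string WriteAndDump(Encoding encoding, params string[] lines)
    {
        string path = Path.GetTempFileName(); // scratch file, deleted below
        try
        {
            using (StreamWriter sw = new StreamWriter(path, false, encoding))
            {
                sw.NewLine = "\n"; // fixed newline so the dump is the same on every OS
                foreach (string line in lines)
                    sw.WriteLine(line);
            }
            return BitConverter.ToString(File.ReadAllBytes(path));
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        // Prints FF-FE-41-00-0A-00-40-D8-13-DE-0A-00
        Console.WriteLine(WriteAndDump(Encoding.Unicode, "A", "\U00020213"));
    }
}
```

The file now starts with the little-endian byte order mark (FF FE), "A" takes two bytes (41 00), and the CJK ideograph is written as its surrogate pair, low byte first (40 D8 13 DE).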