You can use the string.Length property to get the length of a string. Strictly speaking, Length returns the number of UTF-16 code units (char values) in the string, not the number of characters. The two counts agree only when every character is a Unicode code point no larger than U+FFFF. This set of code points is known as the Basic Multilingual Plane (BMP).
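Unicode code points outside of the BMP are represented in UTF-16 as surrogate pairs: two 16-bit char values (4 bytes in total) rather than a single char (2 bytes). Each of those two char values is included in string.Length.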
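To see the surrogate pair directly, the minimal sketch below (assuming C# 9 or later for top-level statements) builds a string from the single code point U+20213, the same CJK character used in the example further down, and inspects its two char values with char.IsSurrogatePair and char.ConvertToUtf32.

```csharp
using System;

// U+20213 lies outside the BMP, so UTF-16 stores it as two
// 16-bit code units (a high surrogate followed by a low surrogate).
string s = "\U00020213";

Console.WriteLine(s.Length);                                        // 2
Console.WriteLine(((int)s[0]).ToString("X4"));                      // D840 (high surrogate)
Console.WriteLine(((int)s[1]).ToString("X4"));                      // DE13 (low surrogate)
Console.WriteLine(char.IsSurrogatePair(s[0], s[1]));                // True
Console.WriteLine(char.ConvertToUtf32(s[0], s[1]).ToString("X"));   // 20213
```

One character, but string.Length reports 2.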
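To correctly count the characters in a string that may contain code points higher than U+FFFF, you can use the StringInfo class (in the System.Globalization namespace). Its LengthInTextElements property counts text elements rather than char values, so a surrogate pair is counted as a single element.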
```csharp
using System;
using System.Globalization;

// 3 Latin (ASCII) characters
string simple = "abc";

// 3-character string where one character
// is a surrogate pair
string containsSurrogatePair = "A𠈓C";

// Length=3 (correct)
Console.WriteLine("Length 1 = {0}", simple.Length);

// Length=4 (counts UTF-16 code units, not characters)
Console.WriteLine("Length 2 = {0}", containsSurrogatePair.Length);

// Better, reports Length=3
StringInfo si = new StringInfo(containsSurrogatePair);
Console.WriteLine("Length 3 = {0}", si.LengthInTextElements);
```
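Note that a text element is not always the same thing as a code point: StringInfo also groups a base character with any combining marks that follow it. The sketch below contrasts three counts for two strings. CountCodePoints is a hypothetical helper written for illustration that treats each valid surrogate pair as one code point; on .NET Core 3.0 and later, counting the results of string.EnumerateRunes() is an alternative.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: counts Unicode code points by treating each
// well-formed surrogate pair as a single code point.
static int CountCodePoints(string s)
{
    int count = 0;
    for (int i = 0; i < s.Length; i++)
    {
        if (char.IsHighSurrogate(s[i]) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
        {
            i++; // skip the low surrogate that completes the pair
        }
        count++;
    }
    return count;
}

string surrogate = "A\U00020213C"; // "A𠈓C"
string combining = "e\u0301";      // 'e' + COMBINING ACUTE ACCENT, displayed as "é"

Console.WriteLine(surrogate.Length);                               // 4
Console.WriteLine(CountCodePoints(surrogate));                     // 3
Console.WriteLine(new StringInfo(surrogate).LengthInTextElements); // 3

Console.WriteLine(combining.Length);                               // 2
Console.WriteLine(CountCodePoints(combining));                     // 2
Console.WriteLine(new StringInfo(combining).LengthInTextElements); // 1
```

For "A𠈓C" the code-point count and the text-element count agree, but for a base letter plus a combining accent only LengthInTextElements matches what a reader sees as one character.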