UnicodeString

UnicodeString addresses issues that arise when handling strings with different encodings.

  • Website
  • Github
  • Nuget

Construct

UnicodeString can be constructed from

  • IEnumerable<byte>, byte[]
  • IEnumerable<char>, char[], String
  • IEnumerable<int>, int[]

UnicodeString is a wrapper that keeps a reference to the source object it was constructed from.

// Construct from String, IEnumerable<byte>, IEnumerable<char>, IEnumerable<int>
UnicodeString str = new UnicodeString("European castle \uD83C\uDFF0");
UnicodeString str8 = new UnicodeString(new byte[] { 69, 117, 114, 111, 112, 101, 97, 110, 32, 99, 97, 115, 116, 108, 101, 32, 240, 159, 143, 176 });
UnicodeString str16 = new UnicodeString(new char[] { 'E', 'u', 'r', 'o', 'p', 'e', 'a', 'n', ' ', 'c', 'a', 's', 't', 'l', 'e', ' ', '\uD83C', '\uDFF0' });
UnicodeString str32 = new UnicodeString(new int[] { 69, 117, 114, 111, 112, 101, 97, 110, 32, 99, 97, 115, 116, 108, 101, 32, 127984 });
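
The three arrays above are different encodings of the same text. As a sanity check, the following sketch decodes them with the .NET base class library only (it does not use UnicodeString) and confirms that they yield the same string. The variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text;

// The same source arrays as in the construction example above.
byte[] utf8 = { 69, 117, 114, 111, 112, 101, 97, 110, 32, 99, 97, 115, 116, 108, 101, 32, 240, 159, 143, 176 };
char[] utf16 = { 'E', 'u', 'r', 'o', 'p', 'e', 'a', 'n', ' ', 'c', 'a', 's', 't', 'l', 'e', ' ', '\uD83C', '\uDFF0' };
int[] utf32 = { 69, 117, 114, 111, 112, 101, 97, 110, 32, 99, 97, 115, 116, 108, 101, 32, 127984 };

// Decode each array into a .NET string.
string fromUtf8 = Encoding.UTF8.GetString(utf8);
string fromUtf16 = new string(utf16);
string fromUtf32 = string.Concat(utf32.Select(cp => char.ConvertFromUtf32(cp)));

// All three decode to "European castle" followed by U+1F3F0.
bool sameText = fromUtf8 == fromUtf16 && fromUtf16 == fromUtf32;
Console.WriteLine(sameText); // True
```

The last code point, 127984 (U+1F3F0), takes four bytes in UTF-8, a surrogate pair in UTF-16, and a single int in UTF-32.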

Uniform use

Regardless of the type of the source object, as long as its encoding describes the same Unicode code points, UnicodeString exposes the same uniform content.

Console.WriteLine($"Src={str8.Source.GetType().Name},  Str=\"{str8}\", Length={str8.Length}, Hashcode={str8.GetHashCode()}");
Console.WriteLine($"Src={str16.Source.GetType().Name},  Str=\"{str16}\", Length={str16.Length}, Hashcode={str16.GetHashCode()}");
Console.WriteLine($"Src={str32.Source.GetType().Name}, Str=\"{str32}\", Length={str32.Length}, Hashcode={str32.GetHashCode()}");

Encoding

UnicodeString can be enumerated as UTF-8, UTF-16, or UTF-32 using stack-allocated enumerables and enumerators.

// Enumerate as UTF-8 bytes, UTF-16 chars, or UTF-32 code points
foreach (byte codepoint in str) { }
foreach (char codepoint in str) { }
foreach (int codepoint in str) { }

// Get stack-allocated enumerators for each encoding
for (UTF8Enumerator enumr = str8.GetEnumeratorUTF8(); enumr.MoveNext();) { }
for (UTF16Enumerator enumr = str8.GetEnumeratorUTF16(); enumr.MoveNext();) { }
for (UTF32Enumerator enumr = str8.GetEnumerator(); enumr.MoveNext();) { }

// The length in each encoding can be calculated
int utf8length = str8.UTF8Length;
int utf16length = str8.UTF16Length;
int utf32length = str8.Length;
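
For comparison, the same three lengths can be computed for the example text with the .NET base class library alone. This is a sketch that does not use UnicodeString. That the properties above return the same numbers is an assumption based on their names.

```csharp
using System;
using System.Text;

string text = "European castle \uD83C\uDFF0";

// Number of UTF-8 code units (bytes).
int utf8Count = Encoding.UTF8.GetByteCount(text);

// Number of UTF-16 code units (chars); this is what String.Length reports.
int utf16Count = text.Length;

// Number of code points: UTF-32 always uses four bytes per code point.
int utf32Count = Encoding.UTF32.GetByteCount(text) / 4;

Console.WriteLine($"UTF-8={utf8Count}, UTF-16={utf16Count}, UTF-32={utf32Count}");
// UTF-8=20, UTF-16=18, UTF-32=17
```

The counts differ only because of the final castle character, which needs four bytes, two chars, or one int depending on the encoding.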

A UnicodeString can also be converted to an array in each encoding.

// Convert to arrays of UTF-8 bytes, UTF-16 chars, and UTF-32 code points
Console.WriteLine($"Utf8 [ {String.Join(", ", str.ToUtf8Array())} ]");
Console.WriteLine($"Utf16 [ {String.Join(", ", str.ToUtf16Array())} ]");
Console.WriteLine($"Utf32 [ {String.Join(", ", str.ToArray())} ]");
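
To show what such arrays contain, the sketch below builds them for the example text with the .NET base class library only. It is not the UnicodeString implementation, and it assumes well-formed input (no unpaired surrogates).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

string text = "European castle \uD83C\uDFF0";

// UTF-8 bytes and UTF-16 chars come directly from the framework.
byte[] utf8 = Encoding.UTF8.GetBytes(text);
char[] utf16 = text.ToCharArray();

// UTF-32 code points: combine each surrogate pair into one int.
var codePoints = new List<int>();
for (int i = 0; i < text.Length; i++)
{
    int cp = char.ConvertToUtf32(text, i);
    if (cp > 0xFFFF) i++; // the code point used two chars, skip the low surrogate
    codePoints.Add(cp);
}
int[] utf32 = codePoints.ToArray();

Console.WriteLine($"Utf8  [ {string.Join(", ", utf8)} ]");
Console.WriteLine($"Utf32 [ {string.Join(", ", utf32)} ]");
```

The resulting byte and int sequences match the arrays used to construct str8 and str32 earlier on this page.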

Hash-Equals

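UnicodeStrings can be compared for Unicode equality: two instances are equal when they describe the same code points, regardless of the source encoding. Their hash codes can be compared in the same way.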

// Compare for unicode equality and hashcode
var equals1 = str8.Equals(str32);
var equals2 = str16.Equals(str8);

var hashcode_equal1 = str8.GetHashCode() == str32.GetHashCode();
var hashcode_equal2 = str16.GetHashCode() == str8.GetHashCode();
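
One way to obtain encoding-independent equality and hashing is to reduce every source to its code points first. The sketch below illustrates that idea. The helpers CodePoints and HashCodePoints are hypothetical, the FNV-1a-style hash is a choice made for this example, and none of it is taken from the UnicodeString implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper: yields the code points of well-formed UTF-16 text.
static IEnumerable<int> CodePoints(string s)
{
    for (int i = 0; i < s.Length; i++)
    {
        int cp = char.ConvertToUtf32(s, i);
        if (cp > 0xFFFF) i++; // skip the low surrogate of a pair
        yield return cp;
    }
}

// Hypothetical helper: FNV-1a-style hash over code points instead of bytes.
static int HashCodePoints(IEnumerable<int> codePoints)
{
    unchecked
    {
        uint hash = 2166136261;
        foreach (int cp in codePoints)
        {
            hash ^= (uint)cp;
            hash *= 16777619;
        }
        return (int)hash;
    }
}

byte[] utf8 = { 99, 97, 115, 116, 108, 101, 32, 240, 159, 143, 176 }; // "castle " + U+1F3F0
int[] utf32 = { 99, 97, 115, 116, 108, 101, 32, 127984 };

// Decode the UTF-8 source to code points; the UTF-32 source already is one.
int[] fromUtf8 = CodePoints(Encoding.UTF8.GetString(utf8)).ToArray();

bool contentEqual = fromUtf8.SequenceEqual(utf32);
bool hashEqual = HashCodePoints(fromUtf8) == HashCodePoints(utf32);
Console.WriteLine($"{contentEqual}, {hashEqual}"); // True, True
```

Because both the comparison and the hash see only code points, the original encoding of each source no longer affects the result.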

Assign to IList

UnicodeString can be assigned to the following interfaces, whatever the encoding of its source:

  • IEnumerable<byte>, IList<byte>
  • IEnumerable<char>, IList<char>
  • IEnumerable<int>, IList<int>

// Assign UTF8 to ILists of various encodings
IList<byte> utf8_to_utf8 = str8;
IList<char> utf8_to_utf16 = str8;
IList<int> utf8_to_utf32 = str8;

// Assign UTF16 to ILists of various encodings
IList<byte> utf16_to_utf8 = str16;
IList<char> utf16_to_utf16 = str16;
IList<int> utf16_to_utf32 = str16;

// Assign UTF32 to ILists of various encodings
IList<byte> utf32_to_utf8 = str32;
IList<char> utf32_to_utf16 = str32;
IList<int> utf32_to_utf32 = str32;
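
These assignments let code depend on a list interface only, without knowing how the text is stored. The sketch below shows such a consumer. DescribeCodePoints is a hypothetical helper, and it is demonstrated with a plain int[] so that it runs without the library. Passing a UnicodeString instead assumes that its IList<int> view supports Count and the read indexer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical consumer: formats any list of code points as U+XXXX labels.
static string DescribeCodePoints(IList<int> codePoints)
{
    var parts = new List<string>();
    for (int i = 0; i < codePoints.Count; i++)
        parts.Add($"U+{codePoints[i]:X4}");
    return string.Join(" ", parts);
}

// An int[] is used as a stand-in for any IList<int> source.
int[] castle = { 32, 127984 };
string described = DescribeCodePoints(castle);
Console.WriteLine(described); // U+0020 U+1F3F0
```
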
Copyright © 2015-2020 Toni Kalajainen