Today I saw a blog post on the Yuanzi homepage, "A brief chat about Unicode and UTF-8", from which I learned that UTF-8 is one implementation of Unicode:
Unicode only assigns a unified number (a code point) to every character in the world; it does not specify how a program should store or parse that number.
It can be said that UTF-8 is one of the implementation methods of Unicode...
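To make the distinction between a character's number and its stored bytes concrete, here is a minimal C# sketch. The class name CodePointDemo, the helper ToHex, and the sample character are my own choices for illustration, not part of the quoted post. It prints one code point and the different byte sequences that three encodings produce for it.

```csharp
using System;
using System.Text;

public static class CodePointDemo
{
    // Hypothetical helper: formats bytes as space-separated hex, e.g. "E4 B8 AD".
    public static string ToHex(byte[] bytes) =>
        BitConverter.ToString(bytes).Replace("-", " ");

    public static void Main()
    {
        // Sample character chosen for illustration: U+4E2D.
        string s = "\u4E2D";

        // The code point is the number Unicode assigns to the character.
        int codePoint = char.ConvertToUtf32(s, 0);
        Console.WriteLine($"Code point: U+{codePoint:X4}");          // U+4E2D

        // Each encoding stores that same number as a different byte sequence.
        Console.WriteLine($"UTF-8:     {ToHex(Encoding.UTF8.GetBytes(s))}");    // E4 B8 AD
        Console.WriteLine($"UTF-16 LE: {ToHex(Encoding.Unicode.GetBytes(s))}"); // 2D 4E
        Console.WriteLine($"UTF-32 LE: {ToHex(Encoding.UTF32.GetBytes(s))}");   // 2D 4E 00 00
    }
}
```

The code point stays the same in all three cases; only the byte layout changes, which is the sense in which UTF-8 is one of several implementations.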
When I recorded this takeaway in flash memory, @飞鸟_Asuka raised a good question in a reply: "So why are unicode and utf8 two separate options when choosing the encoding method?"
In C#, System.Text.Encoding.Unicode and System.Text.Encoding.UTF8 are two separate encodings. If UTF-8 is an implementation of Unicode, why does C# offer Encoding.Unicode as an encoding alongside Encoding.UTF8?
Later I found the answer on Stack Overflow:
Windows handles so-called "Unicode" strings as UTF-16 strings, while most UNIXes default to UTF-8 these days.
It turns out that the default Unicode implementation in Windows is UTF-16, so Encoding.Unicode in C# is UTF-16.
The documentation comment on System.Text.Encoding.Unicode confirms this:
//
// Summary:
//     Gets an encoding for the UTF-16 format using the little endian byte order.
//
// Returns:
//     An encoding for the UTF-16 format using the little endian byte order.
public static Encoding Unicode { get; }
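The same thing can be checked at runtime. The following is a small sketch rather than an authoritative test: the class name UnicodeEncodingCheck and the helper ToHex are made up for this example, while the properties and methods it calls are standard System.Text.Encoding members. It asks Encoding.Unicode to describe itself and compares its output for the letter "A" with the big-endian variant.

```csharp
using System;
using System.Text;

public static class UnicodeEncodingCheck
{
    // Hypothetical helper: formats bytes as space-separated hex, e.g. "41 00".
    public static string ToHex(byte[] bytes) =>
        BitConverter.ToString(bytes).Replace("-", " ");

    public static void Main()
    {
        // The name the encoding reports for itself.
        Console.WriteLine(Encoding.Unicode.WebName);                          // utf-16

        // The byte order mark it writes at the start of a stream.
        Console.WriteLine(ToHex(Encoding.Unicode.GetPreamble()));             // FF FE

        // "A" is U+0041. Little endian puts the low byte first.
        Console.WriteLine(ToHex(Encoding.Unicode.GetBytes("A")));             // 41 00

        // The big-endian UTF-16 variant swaps the two bytes.
        Console.WriteLine(ToHex(Encoding.BigEndianUnicode.GetBytes("A")));    // 00 41

        // UTF-8 needs a single byte for the same character.
        Console.WriteLine(ToHex(Encoding.UTF8.GetBytes("A")));                // 41
    }
}
```

So the name "Unicode" in Encoding.Unicode is a Windows-flavored label for UTF-16 little endian, not a separate encoding distinct from the UTF family.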