What is Java's Internal Representation for String: Modified UTF-8 or UTF-16?
Java uses UTF-16 for its internal text representation, as stated in the Oracle documentation. This applies to the classes that store character sequences on the Java platform, such as String and StringBuilder. A char in Java is a 16-bit unsigned integer that holds one UTF-16 code unit: either a complete code point from the Basic Multilingual Plane (U+0000 to U+FFFF) or one half of a surrogate pair.
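However, Java also employs a non-standard modification of UTF-8 for string serialization. Serialized strings, such as those written by DataOutputStream.writeUTF or by object serialization, are therefore stored in modified UTF-8 rather than standard UTF-8. The modified form differs in two ways: U+0000 is encoded as the two bytes 0xC0 0x80 instead of a single zero byte, and a supplementary character is encoded as two three-byte sequences (one per surrogate) instead of one four-byte sequence.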
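The difference between the two encodings can be observed by comparing byte counts. The following is a minimal sketch, not a reference implementation; the class name ModifiedUtf8Demo and its helper methods are hypothetical names chosen for illustration. It relies on DataOutputStream.writeUTF, which writes a 2-byte length prefix followed by the modified UTF-8 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares modified UTF-8 with standard UTF-8 byte counts.
public class ModifiedUtf8Demo {

    // Byte count of the modified UTF-8 payload produced by writeUTF,
    // excluding the 2-byte length prefix that writeUTF emits first.
    static int modifiedUtf8Length(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked for simplicity.
            throw new UncheckedIOException(e);
        }
        return buffer.size() - 2;
    }

    // Byte count of the same string in standard UTF-8.
    static int standardUtf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String nul = "\0";                 // the NUL character, U+0000
        String emoji = "\uD83D\uDE00";     // U+1F600, stored as a surrogate pair

        System.out.println(standardUtf8Length(nul) + " vs " + modifiedUtf8Length(nul));     // 1 vs 2
        System.out.println(standardUtf8Length(emoji) + " vs " + modifiedUtf8Length(emoji)); // 4 vs 6
    }
}
```

For plain ASCII text other than NUL, both encodings produce identical bytes, so the difference only shows up for U+0000 and for characters outside the Basic Multilingual Plane.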
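In memory, a char occupies 2 bytes. A code point may require one or two char values, that is, 2 or 4 bytes of storage: code points up to U+FFFF fit in a single char, while supplementary code points (U+10000 and above) need a surrogate pair. Note that since Java 9 the JVM may internally store strings containing only Latin-1 characters using one byte per character, but the String API still exposes them as UTF-16 code units.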
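The distinction between char values and code points is visible through the standard String and Character APIs. The sketch below is only an illustration; the class name CodeUnitDemo and its two helper methods are hypothetical, while length, codePointCount and Character.charCount are standard library methods.

```java
// Hypothetical demo class: counts UTF-16 code units versus Unicode code points.
public class CodeUnitDemo {

    // Number of UTF-16 code units (char values) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "A";                       // U+0041, fits in one char
        String supplementary = "\uD83D\uDE00";  // U+1F600, needs a surrogate pair

        System.out.println(codeUnits(bmp) + " " + codePoints(bmp));                     // 1 1
        System.out.println(codeUnits(supplementary) + " " + codePoints(supplementary)); // 2 1
        System.out.println(Character.charCount(0x1F600));                               // 2
    }
}
```

Because length() counts code units, code that must handle supplementary characters correctly should iterate by code point (for example with codePointAt or codePoints()) rather than by char index.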