"Special" (non-ASCII) Unicode characters often look garbled after being encoded as JSON:
echo json_encode(['foo' => '馬']); // Output: {"foo":"\u99ac"}
Understanding why this occurs is crucial.
JSON Encoding Standard
JSON's string syntax is derived from ECMAScript (the standard behind JavaScript) string literals (Section 7.8.4). It allows a character to be written as an escape sequence: the prefix "\u" followed by four hexadecimal digits. For characters in the Basic Multilingual Plane, those digits are the code point:
"\u99ac"
This escape sequence is equivalent to the string literal "馬": a compliant JSON parser decodes both to the same Unicode character.
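The equivalence can be checked directly with json_decode. The following is a minimal sketch, assuming PHP 5.4 or later and a source file saved as UTF-8; the variable names are illustrative only:

```php
<?php
// Single quotes keep the backslash intact, so the JSON parser
// receives the six characters \u99ac rather than PHP interpreting them.
$escaped = json_decode('{"foo":"\u99ac"}', true);

// The same document with the character written literally (UTF-8).
$literal = json_decode('{"foo":"馬"}', true);

// Both forms decode to the same PHP string.
var_dump($escaped['foo'] === $literal['foo']);
echo $escaped['foo'], PHP_EOL;
```

If the comparison holds, nothing was lost in the escaped form; it is only a different spelling of the same character.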
PHP's JSON Encoding Preference
By default, PHP's json_encode function encodes non-ASCII characters as "\u...." escape sequences. The JSON standard does not require this escaping, but the result is valid JSON.
Customizing Encoding
If desired, the JSON_UNESCAPED_UNICODE flag, introduced in PHP 5.4, allows for literal character encoding:
echo json_encode(['foo' => '馬'], JSON_UNESCAPED_UNICODE); // Output: {"foo":"馬"}
It's important to note that this customization is a preference rather than a necessity for transmitting Unicode characters in JSON.
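To see that the flag changes only the serialized text and not the data, a round trip through both forms can be compared. This is a sketch under the same assumptions as the examples above (PHP 5.4+, UTF-8 source file); $data, $escaped, and $unescaped are names chosen for illustration:

```php
<?php
$data = ['foo' => '馬'];

// Default behaviour: non-ASCII characters become \uXXXX escapes.
$escaped = json_encode($data);

// With the flag: the character is emitted literally as UTF-8.
$unescaped = json_encode($data, JSON_UNESCAPED_UNICODE);

echo $escaped, PHP_EOL;
echo $unescaped, PHP_EOL;

// Decoding either string yields the original array.
var_dump(json_decode($escaped, true) === $data);
var_dump(json_decode($unescaped, true) === $data);

// The unescaped form is usually shorter in bytes for CJK text.
var_dump(strlen($unescaped) < strlen($escaped));
```

The practical difference is therefore readability and payload size; the escaped form has the advantage of being pure ASCII, which can matter when a transport or log does not handle UTF-8 reliably.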