JSON Encoding and Unicode Character Display
When you encode strings containing non-ASCII Unicode characters with PHP's json_encode function, those characters appear in the output as hexadecimal escape sequences, which can look garbled at first glance. For instance:
json_encode(['foo' => '馬']); // {"foo":"\u99ac"}
Understanding JSON String Encoding
In JSON, any character in a string literal can be written as a Unicode escape: a backslash followed by "u" and four hexadecimal digits that give the character's code point ("\u99ac"). This makes it possible to write any character using only ASCII, regardless of how the original text was encoded. The JSON string above is therefore perfectly valid and represents the same character as its unescaped counterpart ("馬").
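To check this equivalence, the following minimal sketch decodes both forms and compares the results. It assumes the source file is saved as UTF-8; the variable names are illustrative only.

```php
<?php
// Two JSON documents: one uses the \u escape, the other the literal character.
// Single quotes matter here: PHP leaves \u99ac untouched inside them,
// so json_decode receives the six-character escape sequence as written.
$escaped   = '{"foo":"\u99ac"}';
$unescaped = '{"foo":"馬"}';

// Decode both into associative arrays.
$a = json_decode($escaped, true);
$b = json_decode($unescaped, true);

// Both decode to the same PHP string.
var_dump($a === $b);
var_dump($a['foo'] === '馬');
```

Both var_dump calls should report true, since a JSON parser resolves the escape to the same character before handing the string to the application.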
PHP's Default Encoding Preference
By default, PHP's json_encode function emits "\u" escape sequences for all non-ASCII characters. This complies with the JSON standard and produces pure ASCII output, which keeps the data portable across systems that may mishandle other encodings. To output the characters literally instead, pass the JSON_UNESCAPED_UNICODE flag:
json_encode(['foo' => '馬'], JSON_UNESCAPED_UNICODE); // {"foo":"馬"}
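As a rough illustration of what the flag changes, the sketch below compares the two outputs byte by byte. It assumes a UTF-8 source file; the variable names are made up for the example.

```php
<?php
$data = ['foo' => '馬'];

// Default behaviour: non-ASCII characters become \u escapes.
$default = json_encode($data);

// With the flag: the character is written as its raw UTF-8 bytes.
$literal = json_encode($data, JSON_UNESCAPED_UNICODE);

// The default output contains only ASCII bytes (0x00-0x7F).
var_dump(preg_match('/^[\x00-\x7F]*$/', $default) === 1);

// The flagged output does not, because the character is three UTF-8 bytes.
var_dump(preg_match('/^[\x00-\x7F]*$/', $literal) === 1);

// strlen counts bytes: the escape costs six bytes, the raw character three.
echo strlen($default), ' vs ', strlen($literal), "\n";

// Decoding either string yields the original array again.
var_dump(json_decode($default, true) === json_decode($literal, true));
```

The flag only affects how the text is written out; the decoded data is identical either way.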
Unicode Representation in JSON
Escaped and unescaped Unicode characters have the same meaning and value in JSON: a conforming parser decodes both to the same string. Which form to emit depends on where the data is going. The escaped form is generally preferred for interoperability, because the output is plain ASCII and does not rely on every platform and application handling the encoding correctly. The unescaped form is shorter and easier to read, but it requires everything that touches the data to treat it as UTF-8.
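One way to make this choice explicit in application code is a small wrapper. The sketch below is only an illustration: encode_json and its $asciiOnly parameter are hypothetical names, not part of PHP, and the use of JSON_THROW_ON_ERROR assumes PHP 7.3 or later.

```php
<?php
// Hypothetical helper (not a PHP built-in): chooses the output form per call.
function encode_json($data, bool $asciiOnly = true): string
{
    // JSON_THROW_ON_ERROR makes json_encode throw a JsonException on failure
    // instead of returning false, so the return type is always a string.
    $flags = JSON_THROW_ON_ERROR;

    if (!$asciiOnly) {
        // Emit non-ASCII characters literally rather than as \u escapes.
        $flags |= JSON_UNESCAPED_UNICODE;
    }

    return json_encode($data, $flags);
}

$payload = ['foo' => '馬'];

echo encode_json($payload), "\n";        // escaped, ASCII-only form
echo encode_json($payload, false), "\n"; // literal UTF-8 form
```

Defaulting to the escaped form mirrors json_encode's own default, so callers opt in to literal output only where they know the consumer handles UTF-8.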