Character Encoding in JSON: Understanding Unicode Representation
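Unicode characters can be represented in JSON in more than one way. One form, which PHP's json_encode function emits by default, is the \u escape sequence: a backslash, the letter u, and four hexadecimal digits giving the character's code point. Characters outside the Basic Multilingual Plane are written as two such escapes (a UTF-16 surrogate pair). For example: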
"foo": "\u99ac"
This escape sequence is valid JSON and will be interpreted correctly by compliant JSON parsers, resulting in the string "馬".
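As a minimal sketch of this decoding step, the snippet below feeds the escaped form to PHP's json_decode and checks the result. The variable names are illustrative only, and the source file is assumed to be saved as UTF-8 so that the literal "馬" compares correctly.

```php
<?php
// Single quotes keep the backslash literal, so the JSON text really contains \u99ac.
$json = '{"foo": "\u99ac"}';

// Decode into an associative array; json_decode returns strings as UTF-8.
$data = json_decode($json, true);

var_dump($data['foo'] === "馬");  // bool(true)
var_dump(bin2hex($data['foo']));  // string(6) "e9a6ac" (the UTF-8 bytes of U+99AC)
```

The hex dump shows that the six-character escape in the JSON text becomes a single three-byte UTF-8 character after decoding.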
Why Escape Sequences are Preferred
By default, PHP's json_encode escapes every non-ASCII character as a \u sequence, so its output contains only ASCII characters. The result may be harder to read, but it is perfectly valid JSON and does not affect data integrity: a decoder recovers exactly the same string.
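The default behaviour can be seen with a short sketch such as the following. The $payload and $escaped names are hypothetical, and the regular expression is just one simple way to confirm that the output is ASCII-only.

```php
<?php
// A hypothetical payload containing one non-ASCII character (file assumed UTF-8).
$payload = ['foo' => '馬'];

// No flags: non-ASCII characters are written as \u escapes.
$escaped = json_encode($payload);

echo $escaped, PHP_EOL;  // {"foo":"\u99ac"}

// Every byte of the output falls in the 7-bit ASCII range.
var_dump(preg_match('/^[\x00-\x7F]*$/', $escaped) === 1);  // bool(true)
```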
Enabling Literal Characters
If you prefer to emit Unicode characters without escape sequences, pass the JSON_UNESCAPED_UNICODE flag (available since PHP 5.4) as the second argument to json_encode. The characters are then output as literal UTF-8:
"foo": "馬"
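A brief sketch comparing the two outputs is shown here. The variable names are illustrative, and the source file is again assumed to be UTF-8.

```php
<?php
// The same hypothetical payload, encoded both ways.
$payload = ['foo' => '馬'];

$escaped = json_encode($payload);                          // default: \u escapes
$literal = json_encode($payload, JSON_UNESCAPED_UNICODE);  // literal UTF-8

echo $literal, PHP_EOL;  // {"foo":"馬"}

// Both documents decode to the same PHP value.
var_dump(json_decode($escaped, true) === json_decode($literal, true));  // bool(true)

// The literal form is shorter in bytes for this payload.
echo strlen($escaped), ' vs ', strlen($literal), PHP_EOL;  // 16 vs 13
```

The byte counts illustrate the trade-off: the escaped form spends six ASCII bytes on the character, while the literal form spends three UTF-8 bytes.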
Conclusion
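Both escape sequences and literal characters are valid ways to represent Unicode in JSON, and both decode to the same string. Escaped output is ASCII-only, which makes it robust when it passes through systems that may mishandle character encodings. Literal UTF-8 output is more compact and easier for people to read. Choose whichever better suits how the JSON will be transported and who will read it.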