Handling UTF-8 Strings in PHP: json_encode and JSON_UNESCAPED_UNICODE
In PHP scripts that handle multilingual content, the output of json_encode can be surprising: by default, every non-ASCII character in a string is written as a \uXXXX escape sequence rather than as the character itself.
Example:
Input: echo $text;
Output: База данни грешка.
Input: echo json_encode($text);
Output: "\u0411\u0430\u0437\u0430 \u0434\u0430\u043d\u043d\u0438 \u0433\u0440\u0435\u0448\u043a\u0430."
Understanding the Conversion
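By default, json_encode escapes every non-ASCII character as a \uXXXX sequence, where XXXX is the character's UTF-16 code unit in hexadecimal (Б, U+0411, becomes \u0411). JSON permits both escaped and literal characters, and a parser decodes them to the same string. Escaping keeps the encoded output pure ASCII, which is the safest choice when the JSON may pass through transports or consumers that do not handle multibyte UTF-8 correctly.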
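A minimal sketch of the default behavior, assuming the script file is saved as UTF-8 and the json extension is available (it is enabled by default). It encodes the sample string, checks that the escaped result contains only ASCII bytes, and confirms that both the escaped form and a JSON text containing the raw characters decode to the original string.

```php
<?php
// Sample string from the example (Bulgarian for "Database error.").
$text = 'База данни грешка.';

// Default flags: non-ASCII characters are written as \uXXXX escapes.
$escaped = json_encode($text);
echo $escaped, PHP_EOL;

// The escaped JSON contains only ASCII bytes (0x00-0x7F).
$isAscii = preg_match('/[^\x00-\x7F]/', $escaped) === 0;

// Decoding the escaped JSON restores the original UTF-8 string exactly.
$roundTrip = json_decode($escaped);

// A JSON text with the raw UTF-8 characters is equally valid input.
$fromRaw = json_decode('"База данни грешка."');

var_dump($isAscii, $roundTrip === $text, $fromRaw === $text);
```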
Solution: JSON_UNESCAPED_UNICODE
Introduced in PHP 5.4.0, the JSON_UNESCAPED_UNICODE flag allows you to bypass this conversion. When specified, it instructs json_encode to output UTF-8 characters directly.
Usage:
<code class="php">json_encode($text, JSON_UNESCAPED_UNICODE);</code>
With this flag, the Cyrillic characters appear in the output unchanged:
<code class="php">"База данни грешка."</code>
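A minimal sketch of the flag in use, assuming PHP 5.4 or later and a UTF-8 source file. Because json_encode flags are bitmask constants, JSON_UNESCAPED_UNICODE can be combined with other flags such as JSON_PRETTY_PRINT using the | operator. The $payload array and its keys are hypothetical, chosen only for illustration.

```php
<?php
$text = 'База данни грешка.';

// Write the Cyrillic characters literally instead of as \uXXXX escapes.
$literal = json_encode($text, JSON_UNESCAPED_UNICODE);
echo $literal, PHP_EOL; // prints "База данни грешка."

// Hypothetical payload: flags combine with |, here adding pretty-printing.
$payload = ['error' => $text, 'code' => 500];
$pretty = json_encode($payload, JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT);
echo $pretty, PHP_EOL;
```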
By using the JSON_UNESCAPED_UNICODE flag, you keep the original characters in your JSON output. This makes the JSON readable in logs, API responses, and files, and smaller for non-Latin text. It does not change what a JSON parser decodes, since both forms produce the same string; it only requires that whatever receives the output handles UTF-8 correctly.
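Unescaped output is also more compact for non-Latin text. The sketch below (same sample string, UTF-8 source file assumed) compares the two encodings with strlen(), which counts bytes rather than characters: each Cyrillic letter takes 2 bytes in UTF-8 but 6 bytes as a \uXXXX escape.

```php
<?php
$text = 'База данни грешка.';

$escaped = json_encode($text);                          // \uXXXX escapes
$literal = json_encode($text, JSON_UNESCAPED_UNICODE);  // raw UTF-8 bytes

// strlen() counts bytes, not characters.
printf("escaped: %d bytes, unescaped: %d bytes\n", strlen($escaped), strlen($literal));
```

For this 15-letter string the script reports 95 bytes for the escaped form and 35 bytes for the unescaped one.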