UTF-8 all the way through
P粉964682904
2023-08-27 16:00:18
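<p>
I'm setting up a new server and want to support UTF-8 fully in my web application. I have tried this in the past on existing servers and always seem to end up having to fall back to ISO-8859-1.</p>
<p>Where exactly do I need to set the encoding/charset? I know I need to configure Apache, MySQL, and PHP for this. Is there a standard checklist I can follow, or a way to troubleshoot where the mismatches occur?</p>
<p>This is for a new Linux server running MySQL 5, PHP 5, and Apache 2.</p>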
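I'd like to add one thing to chazomaticus' excellent answer:
Don't forget the META tag that declares the charset either (or its HTML4 or XHTML version).
That seems trivial, but IE7 has given me problems with it before.
I was doing everything right: the database, the database connection, and the Content-Type HTTP header were all set to UTF-8, and it worked fine in every other browser, but Internet Explorer still insisted on using the "Western European" encoding.
It turned out the page was missing the META tag. Adding it solved the problem.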
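Edit:
The W3C actually has a rather large section dedicated to I18N. They have a number of articles related to this issue, describing the HTTP, (X)HTML, and CSS side of things.
They recommend using both the HTTP header and the HTML meta tag (or the XML declaration in the case of XHTML served as XML).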
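
As a rough illustration of declaring the encoding in both places, here is a minimal PHP sketch. The function name renderPage and the sample text are made up for this example; the only parts taken from the answer are the Content-Type header and the META charset tag.

```php
<?php
// Hypothetical page script; renderPage() and the sample text are placeholders.

// 1. Declare the encoding in the HTTP response header (must run before any output).
header('Content-Type: text/html; charset=utf-8');

// 2. Repeat the declaration inside the document itself with a META tag.
function renderPage(string $title, string $body): string
{
    // Escape user-visible text, telling htmlspecialchars() the input is UTF-8.
    $title = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    $body  = htmlspecialchars($body, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
        . "<html>\n<head>\n"
        . "<meta charset=\"utf-8\">\n"
        . "<title>{$title}</title>\n"
        . "</head>\n<body>\n<p>{$body}</p>\n</body>\n</html>\n";
}

echo renderPage('Grüße', 'Žluťoučký kůň, 日本語');
```

The header covers clients that trust HTTP, and the META tag covers the case described above where a browser ignores or never sees the header (for example, a page saved to disk).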
Data storage:
Specify the utf8mb4 character set on all tables and text columns in your database. This makes MySQL physically store and retrieve values encoded natively in UTF-8. Note that MySQL will implicitly use utf8mb4 encoding if a utf8mb4_* collation is specified (without any explicit character set).
In older versions of MySQL (< 5.5.3), you'll unfortunately be forced to use simply utf8, which only supports a subset of Unicode characters (those that fit in at most three bytes). I wish I were kidding.
Data access:
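
(A sketch for the data storage advice above.) The snippet below shows what declaring utf8mb4 on a table might look like when the DDL is issued from PHP. The table name articles, its columns, the constant names, and the helper functions are all hypothetical; utf8mb4_unicode_ci is just one of the available utf8mb4_* collations.

```php
<?php
// Hypothetical schema, for illustration only.
const CREATE_ARTICLES_SQL = <<<'SQL'
CREATE TABLE IF NOT EXISTS articles (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    body  TEXT NOT NULL
) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
SQL;

// Converts an existing table (and its text columns) to utf8mb4.
const CONVERT_ARTICLES_SQL =
    'ALTER TABLE articles CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci';

// Create the table on a fresh database.
function createArticlesTable(PDO $pdo): void
{
    $pdo->exec(CREATE_ARTICLES_SQL);
}

// Migrate a table that was created with another character set.
function convertArticlesTable(PDO $pdo): void
{
    $pdo->exec(CONVERT_ARTICLES_SQL);
}
```

Both functions expect an already connected PDO handle; back up the data before converting a populated table.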
In your application code (e.g. PHP), in whatever DB access method you use, you'll need to set the connection charset to utf8mb4. This way, MySQL does no conversion from its native UTF-8 when it hands data off to your application and vice versa.
Some drivers provide their own mechanism for configuring the connection character set, which both updates their own internal state and informs MySQL of the encoding to be used on the connection. This is usually the preferred approach. In PHP:
If you're using the PDO abstraction layer with PHP ≥ 5.3.6, you can specify charset in the DSN.
If you're using mysqli, you can call set_charset().
If you're stuck with plain mysql but happen to be running PHP ≥ 5.2.3, you can call mysql_set_charset.
If the driver does not provide its own mechanism for setting the connection character set, you may have to issue a query to tell MySQL how your application expects data on the connection to be encoded: SET NAMES 'utf8mb4'.
The same consideration regarding utf8mb4/utf8 applies as above.
Output:
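
(A sketch for the data access advice above.) Assuming the PDO and mysqli extensions are available, the connection charset could be set roughly like this. The helper names buildDsn, connectPdo, and connectMysqli, and all connection details passed to them, are placeholders invented for this example.

```php
<?php
// Hypothetical helpers; host, database name, and credentials are supplied by the caller.

// Build a PDO DSN that asks for utf8mb4 on the connection.
// The charset key in the DSN is honoured from PHP 5.3.6 on.
function buildDsn(string $host, string $dbName): string
{
    return "mysql:host={$host};dbname={$dbName};charset=utf8mb4";
}

// PDO: the charset travels in the DSN.
function connectPdo(string $host, string $dbName, string $user, string $pass): PDO
{
    return new PDO(buildDsn($host, $dbName), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

// mysqli: call set_charset() right after connecting.
function connectMysqli(string $host, string $dbName, string $user, string $pass): mysqli
{
    $db = new mysqli($host, $user, $pass, $dbName);
    if (!$db->set_charset('utf8mb4')) {
        throw new RuntimeException('Could not set connection charset: ' . $db->error);
    }
    return $db;
}
```

Using the driver's own mechanism rather than a raw SET NAMES query keeps the driver's internal state (which escaping functions rely on) in step with what MySQL expects.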
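Send the HTTP header Content-Type: text/html; charset=utf-8. You can achieve that either by setting default_charset in php.ini (preferred), or manually using the header() function.
When encoding output with json_encode(), add JSON_UNESCAPED_UNICODE as a second parameter, so that non-ASCII characters are emitted as UTF-8 rather than as \uXXXX escapes.
Input: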
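
(A sketch for the output advice above.) The following shows the header() call and the effect of JSON_UNESCAPED_UNICODE on json_encode(). The sample array is made up for illustration.

```php
<?php
// Declare the encoding before any output is sent. Setting
// default_charset = "UTF-8" in php.ini makes this call unnecessary for HTML.
header('Content-Type: text/html; charset=utf-8');

// Hypothetical sample data.
$data = ['name' => 'Zoë', 'city' => '東京'];

// Default behaviour: non-ASCII characters become \uXXXX escapes.
$escaped = json_encode($data);

// With the flag: characters are emitted as literal UTF-8.
$literal = json_encode($data, JSON_UNESCAPED_UNICODE);

echo $escaped, "\n"; // {"name":"Zo\u00eb","city":"\u6771\u4eac"}
echo $literal, "\n"; // {"name":"Zoë","city":"東京"}
```

Both forms are valid JSON and decode to the same values; the unescaped form is simply smaller and easier to read.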
Verify that every received string is valid UTF-8 before you store it or use it anywhere. PHP's mb_check_encoding() does the trick, but you have to use it religiously. There's really no way around this, as malicious clients can submit data in whatever encoding they want, and I haven't found a trick to get PHP to do this for you reliably.
Other code considerations:
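
(A sketch for the input advice above.) One possible way to apply mb_check_encoding() consistently is to route all request data through a small validator. The function names requireUtf8 and allUtf8 are hypothetical, and the array helper checks values only, not keys.

```php
<?php
// Hypothetical helper: return the string unchanged, or throw if it is not valid UTF-8.
function requireUtf8(string $value): string
{
    if (!mb_check_encoding($value, 'UTF-8')) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $value;
}

// Hypothetical helper: recursively check every string value in an array
// such as $_GET or $_POST. Array keys are not checked here.
function allUtf8(array $input): bool
{
    foreach ($input as $value) {
        if (is_array($value)) {
            if (!allUtf8($value)) {
                return false;
            }
        } elseif (is_string($value) && !mb_check_encoding($value, 'UTF-8')) {
            return false;
        }
    }
    return true;
}
```

A request handler might reject the request outright when allUtf8($_POST) is false, instead of trying to repair the data.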
Obviously, all files you'll be serving (PHP, HTML, JavaScript, etc.) should be encoded in valid UTF-8.
You need to make sure that every time you process a UTF-8 string, you do so safely. This is, unfortunately, the hard part. You'll probably want to make extensive use of PHP's mbstring extension.
PHP's built-in string operations are not by default UTF-8 safe. There are some things you can safely do with normal PHP string operations (like concatenation), but for most things you should use the equivalent mbstring function.
To know what you're doing (read: not mess it up), you really need to understand UTF-8 and how it works at the lowest possible level. Check out any of the links at utf8.com for some good resources to learn everything you need to know.