Always use UTF-8 encoding
P粉548512637
2023-07-24 19:47:11
I'm setting up a new server and want full UTF-8 support in my web application. I've tried this before on existing servers, but I always seemed to end up having to fall back to ISO-8859-1.
Where do I need to set the encoding/charset? I know I need to configure Apache, MySQL and PHP to achieve this. Is there a standard checklist I can follow, or a way to troubleshoot mismatches?
This is a new Linux server running MySQL 5, PHP 5 and Apache 2.
I would like to add to chazomaticus' excellent answer:
Also don't forget the META tag (<meta charset="utf-8"> in HTML5, or the HTML4 or XHTML equivalents).
This may seem trivial, but IE7 has given me problems before.
I did everything right: the database, the database connection and the Content-Type HTTP header were all set to UTF-8, and everything worked fine in all other browsers, but Internet Explorer still insisted on using the "Western European" encoding.
It turned out that the page was missing the META tag. Adding it solved the problem.
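The symptom is easy to reproduce. The Python sketch below (an illustration, not part of the original answer) encodes text as UTF-8 and then decodes those bytes as Windows-1252, assuming that is the code page behind IE's "Western European" setting. The result is the familiar garbled text:

```python
# Illustration: UTF-8 bytes misread as Windows-1252 ("Western European").
# Assumption: the browser fell back to windows-1252 because no charset was declared.

def misread_as_western(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as Windows-1252."""
    utf8_bytes = text.encode("utf-8")
    # errors="replace" substitutes U+FFFD for the few bytes cp1252 leaves undefined
    return utf8_bytes.decode("cp1252", errors="replace")

print(misread_as_western("café"))  # prints "cafÃ©"
```

Plain ASCII survives unchanged, which is why pages can look mostly fine until the first accented character appears.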
Edit:
The W3C actually has a sizeable section dedicated to internationalization (I18N) issues. They have a number of articles related to this issue, covering HTTP, (X)HTML, and CSS:
They recommend declaring the encoding in both the HTTP header and an HTML meta tag (or in the XML declaration for XHTML served as XML).
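A minimal Python sketch of that recommendation: a hypothetical build_html_response() helper (not from the original answers) that declares UTF-8 both in the Content-Type HTTP header and in a meta tag, and encodes the body to match what it declares:

```python
import html

def build_html_response(title: str, body_text: str) -> tuple[dict[str, str], bytes]:
    """Hypothetical helper: return (headers, body) with UTF-8 declared twice."""
    html_doc = (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        '<meta charset="utf-8">\n'                   # in-document declaration
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"<p>{html.escape(body_text)}</p>\n"
        "</body>\n</html>\n"
    )
    body = html_doc.encode("utf-8")                  # bytes must match the declaration
    headers = {
        "Content-Type": "text/html; charset=utf-8",  # HTTP-level declaration
        "Content-Length": str(len(body)),            # a byte count, not a character count
    }
    return headers, body

headers, body = build_html_response("Café", "naïve résumé")
```

Declaring the charset in both places covers clients that ignore the header (or pages saved to disk, where the header is lost).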
Data storage:
Specify the utf8mb4 character set on all tables and text columns in your database. This makes MySQL physically store and retrieve values encoded natively in UTF-8. Note that MySQL will implicitly use utf8mb4 encoding if a utf8mb4_* collation is specified (without any explicit character set).
In older versions of MySQL (
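The practical difference between utf8mb4 and MySQL's older utf8 (an alias for utf8mb3) is the per-character byte limit: utf8mb3 stores at most three bytes per character, so anything outside the Basic Multilingual Plane, such as most emoji, cannot be stored. A small Python sketch, with hypothetical helper names, that checks whether a string would fit in a three-byte-per-character column:

```python
def fits_utf8mb3(text: str) -> bool:
    """True if every character encodes to at most 3 UTF-8 bytes (BMP only)."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)

def max_bytes_per_char(text: str) -> int:
    """Largest UTF-8 length of any single character (0 for an empty string)."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)

print(fits_utf8mb3("naïve 日本語"))  # True: every character is in the BMP
print(fits_utf8mb3("thumbs up 👍"))  # False: U+1F44D needs 4 bytes
```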
Data access:
In your application code (e.g. PHP), whatever database access method you use, you need to set the connection character set to utf8mb4. This way, MySQL does no conversion from its native UTF-8 when it hands data off to your application, and vice versa.
Some drivers provide their own mechanism for configuring the connection character set, which both updates the driver's own internal state and informs MySQL of the encoding to use on the connection; this is usually the preferred approach. In PHP:
If you are using the PDO abstraction layer with PHP ≥ 5.3.6, you can specify the character set by including charset=utf8mb4 in the DSN.
If you're using mysqli, you can call set_charset('utf8mb4') on the connection object.
If you can only use the plain mysql functions but are running PHP ≥ 5.2.3, you can call the mysql_set_charset() function.
If the driver does not provide its own mechanism to set the connection character set, you may need to issue a query to tell MySQL how your application wants the data on the connection to be encoded: SET NAMES 'utf8mb4'.
The same considerations regarding utf8mb4 versus utf8 as above apply here.
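To see why the connection character set matters, here is a rough Python model of what happens to a value on its way into a utf8mb4 column (a simplification, not MySQL's actual code): the server interprets the incoming bytes using the declared connection character set and converts the result to UTF-8 for storage. Python's latin-1 codec stands in for MySQL's latin1, which is really closer to Windows-1252; for the character used here the two agree.

```python
def store_via_connection(client_bytes: bytes, connection_charset: str) -> bytes:
    """Model of the bytes that end up in a utf8mb4 column.

    The server decodes client_bytes using the declared connection charset,
    then re-encodes the resulting characters as UTF-8 for storage.
    """
    text_as_server_sees_it = client_bytes.decode(connection_charset)
    return text_as_server_sees_it.encode("utf-8")

sent = "é".encode("utf-8")  # the application sends UTF-8: b'\xc3\xa9'

# Connection declared as UTF-8: the bytes are stored unchanged.
print(store_via_connection(sent, "utf-8"))    # b'\xc3\xa9'

# Connection declared as latin1: each byte becomes its own character,
# which is then UTF-8 encoded again ("double encoding").
print(store_via_connection(sent, "latin-1"))  # b'\xc3\x83\xc2\xa9'
```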
Output:
Input:
Other code notes:
Obviously, all files you serve (PHP, HTML, JavaScript, etc.) should be encoded in valid UTF-8.
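One way to enforce this is to check files before deploying them. A minimal Python sketch (the function names and the default list of suffixes are assumptions for illustration) that strictly decodes each file and reports the ones that are not valid UTF-8:

```python
from pathlib import Path

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")  # strict mode raises on any invalid sequence
    except UnicodeDecodeError:
        return False
    return True

def find_non_utf8_files(root: str, suffixes=(".php", ".html", ".js", ".css")) -> list[str]:
    """Hypothetical helper: list files under root that are not valid UTF-8."""
    bad = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            if not is_valid_utf8(path.read_bytes()):
                bad.append(str(path))
    return bad
```

A file saved as ISO-8859-1 fails the check as soon as it contains a non-ASCII character, because a lone Latin-1 byte such as 0xE9 is not a valid UTF-8 sequence.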
You need to make sure that every operation you perform on a UTF-8 string is safe. Unfortunately, this is the hardest part, and you may need to make extensive use of PHP's mbstring extension.
PHP's built-in string operations are not UTF-8 safe by default. Some ordinary PHP string operations (such as concatenation) can be used safely, but for most operations you should use the equivalent mbstring functions.
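The distinction is easy to see by treating a UTF-8 string as raw bytes, which is how PHP's plain functions such as strlen() and substr() see it, versus as a sequence of characters, which is what mbstring equivalents such as mb_strlen() and mb_substr() work with. The Python sketch below models the two views; it illustrates the idea rather than reproducing PHP's functions exactly:

```python
text = "naïve café"
raw = text.encode("utf-8")  # the byte view that plain PHP string functions use

# Length: bytes versus characters (compare strlen() with mb_strlen()).
byte_length = len(raw)   # 12: "ï" and "é" take two bytes each
char_length = len(text)  # 10

# Truncation: cutting at a byte offset can split a multi-byte character.
broken = raw[:3]   # b'na\xc3', only the first byte of "ï"
safe = text[:3]    # "naï", cut at a character boundary

# Concatenation is safe: joining valid UTF-8 byte strings stays valid.
joined = raw + " ünïcode".encode("utf-8")
```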
To know what you're doing (i.e., not screw it all up), you really need to understand UTF-8 and how it works at the lowest level. Check out any of the links from utf8.com for some good resources to learn everything you need to know.
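As a small taste of "the lowest level", here is a Python sketch of the UTF-8 encoding rules written out by hand instead of calling str.encode(). Each code point is split into one to four bytes depending on its size, with marker bits in the high positions of each byte (the function names are illustrative):

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by hand."""
    if 0xD800 <= cp <= 0xDFFF or not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a valid Unicode scalar value: {cp:#x}")
    if cp < 0x80:                        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),     # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

def encode_utf8(text: str) -> bytes:
    """Encode a whole string one code point at a time."""
    return b"".join(encode_code_point(ord(ch)) for ch in text)

print(encode_code_point(ord("é")).hex())  # c3a9
```

The continuation bytes always start with the bits 10, which is what lets a decoder (or a validator) detect a sequence that was cut in half.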