Web page encoding refers to the character encoding format that a web page specifies for its content.
GBK is an extension of the national standard GB2312 and remains compatible with it. GBK is a variable-width encoding: ASCII characters, including English letters, take one byte, while Chinese characters take two bytes. To distinguish a Chinese character from ASCII, the highest bit of its first byte is set to 1. GBK covers more than 20,000 Chinese characters and is a national encoding. It is less versatile than UTF-8, but Chinese text stored as UTF-8 occupies more database space than the same text stored as GBK.
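The byte layout described above can be checked with a minimal sketch that uses Python's built-in "gbk" and "gb2312" codecs. The sample characters and variable names are only illustrative assumptions.

```python
# Encode one ASCII letter and one Chinese character with the built-in GBK codec.
ascii_bytes = "A".encode("gbk")
hanzi_bytes = "中".encode("gbk")

print(len(ascii_bytes))  # number of bytes used for an ASCII letter
print(len(hanzi_bytes))  # number of bytes used for a Chinese character

# The first byte of a double-byte GBK character has its highest bit set,
# which lets a decoder tell it apart from single-byte ASCII.
lead_has_high_bit = bool(hanzi_bytes[0] & 0x80)
ascii_has_high_bit = bool(ascii_bytes[0] & 0x80)
print(lead_has_high_bit, ascii_has_high_bit)

# A character that exists in GB2312 gets the same bytes in GBK,
# which is what compatibility with GB2312 means in practice.
same_as_gb2312 = "中".encode("gb2312") == hanzi_bytes
print(same_as_gb2312)
```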
UTF-8 (Unicode Transformation Format, 8-bit) allows a BOM, but the BOM is usually omitted. It is a multi-byte encoding designed to handle international characters. It uses 8 bits (one byte) for English letters and 24 bits (three bytes) for most Chinese characters. UTF-8 can represent the characters used by all countries in the world. It is an international encoding with strong versatility. UTF-8 encoded text can be displayed by any browser that supports the UTF-8 character set, whatever the browser's language version. With UTF-8, Chinese text can also be displayed in an English version of IE, and the user does not need to download IE's Chinese language support package.
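As a rough illustration of these byte counts and of the optional BOM, the following sketch uses Python's built-in "utf-8" and "utf-8-sig" codecs. The sample characters are arbitrary choices, not taken from any particular page.

```python
import codecs

# Compare how many bytes UTF-8 needs for an English letter and a Chinese character.
for ch in ("A", "中"):
    encoded = ch.encode("utf-8")
    print(repr(ch), len(encoded), encoded.hex())

# "utf-8-sig" writes the optional BOM in front of the data; plain "utf-8" does not.
plain = "中".encode("utf-8")
with_bom = "中".encode("utf-8-sig")
print(with_bom.startswith(codecs.BOM_UTF8))

# Decoding with "utf-8-sig" strips the BOM again, so both forms give the same text.
print(with_bom.decode("utf-8-sig") == plain.decode("utf-8"))
```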
Although the UTF-8 version has good international compatibility, Chinese text stored as UTF-8 requires about 50% more database storage space than the same text in the GBK/BIG5 version (three bytes per character instead of two). For that reason it is not recommended for Chinese-only sites and is best reserved for users with specific requirements for international compatibility. To put it simply: for websites with mostly Chinese content, GBK is appropriate because it saves database space. For websites with mostly English content, UTF-8 is appropriate, because ASCII characters take one byte in both encodings and UTF-8 therefore costs no extra space.
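The storage difference can be estimated with a short sketch. The helper encoded_size and the two sample strings are hypothetical, chosen only to show the ratio for Chinese text and the equal size for ASCII text.

```python
def encoded_size(text: str, encoding: str) -> int:
    """Return the number of bytes needed to store text in the given encoding."""
    return len(text.encode(encoding))


chinese_sample = "网页编码"           # four Chinese characters
english_sample = "web page encoding"  # ASCII only

for label, sample in (("Chinese", chinese_sample), ("English", english_sample)):
    gbk_size = encoded_size(sample, "gbk")
    utf8_size = encoded_size(sample, "utf-8")
    # A ratio above 1.0 means UTF-8 needs more space than GBK for this sample.
    print(label, gbk_size, utf8_size, utf8_size / gbk_size)
```

Real pages mix markup, ASCII punctuation and Chinese text, so the overall overhead of UTF-8 is usually somewhere between these two cases.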
How to convert GBK, GB2312, etc. to UTF8? The conversion must go through Unicode as an intermediate step: GBK/GB2312 -> Unicode -> UTF8, and in the other direction UTF8 -> Unicode -> GBK/GB2312. Using "Save As" in Windows Notepad, you can convert between ANSI (which is GBK on Simplified Chinese Windows), Unicode, Unicode big endian and UTF-8.
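The two-step conversion through Unicode can be sketched in Python, where a str object plays the role of the Unicode intermediate. The function names gbk_to_utf8 and utf8_to_gbk are hypothetical helpers, not part of any library.

```python
def gbk_to_utf8(data: bytes) -> bytes:
    """Convert GBK (or GB2312) bytes to UTF-8 bytes via Unicode."""
    text = data.decode("gbk")    # step 1: GBK bytes -> Unicode string
    return text.encode("utf-8")  # step 2: Unicode string -> UTF-8 bytes


def utf8_to_gbk(data: bytes) -> bytes:
    """Convert UTF-8 bytes back to GBK bytes via Unicode."""
    text = data.decode("utf-8")  # step 1: UTF-8 bytes -> Unicode string
    return text.encode("gbk")    # step 2: Unicode string -> GBK bytes


gbk_bytes = "网页编码".encode("gbk")
utf8_bytes = gbk_to_utf8(gbk_bytes)
print(len(gbk_bytes), len(utf8_bytes))
print(utf8_to_gbk(utf8_bytes) == gbk_bytes)
```

Note that utf8_to_gbk raises UnicodeEncodeError if the text contains a character that GBK cannot represent, so the reverse direction is only safe for text within GBK's repertoire.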
How to make the browser correctly identify the web page encoding? Generally, the web page must contain a declaration such as <meta http-equiv="Content-Type" content="text/html; charset=gb2312" />, indicating that the character set encoding of this web page is GB2312 (or UTF-8, with charset=utf-8).
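To see which encoding a page declares, a minimal sketch with the standard-library html.parser module can read the charset from a meta tag. The class CharsetFinder, the function declared_charset and the sample page are assumptions made for illustration. Real browsers apply more elaborate rules.

```python
from html.parser import HTMLParser
from typing import Optional


class CharsetFinder(HTMLParser):
    """Remember the first charset declared in a <meta> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        # Short form: <meta charset="utf-8">
        if attributes.get("charset"):
            self.charset = attributes["charset"].strip().lower()
            return
        # Long form: <meta http-equiv="Content-Type" content="text/html; charset=...">
        if (attributes.get("http-equiv") or "").lower() == "content-type":
            for part in (attributes.get("content") or "").split(";"):
                name, _, value = part.strip().partition("=")
                if name.strip().lower() == "charset" and value:
                    self.charset = value.strip().lower()


def declared_charset(html: str) -> Optional[str]:
    """Return the charset declared in the page, or None if there is none."""
    finder = CharsetFinder()
    finder.feed(html)
    return finder.charset


sample_page = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=gb2312" /></head><body></body></html>'
)
print(declared_charset(sample_page))
```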
Why does a page sometimes appear garbled even though it specifies an encoding? This may be caused by the encoding declared in the page being inconsistent with the encoding of the file itself. More often, the page was opened with the wrong encoding and then saved, or the file was modified directly online with FTP software such as CuteFTP whose encoding was configured incorrectly, so the content was converted wrongly. In this case, open the file in Windows Notepad and use "Save As" to save it in the corresponding encoding to solve the problem.
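The mismatch described above is easy to reproduce. The sketch below decodes UTF-8 bytes as GBK to show garbled output, and adds a hypothetical helper, looks_like_utf8, as a rough check of a file's real encoding before trusting its declaration. Passing a strict UTF-8 decode is only a heuristic, not proof of the encoding.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: return True if the bytes decode cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = "编码"
utf8_bytes = text.encode("utf-8")
gbk_bytes = text.encode("gbk")

# Reading UTF-8 bytes as if they were GBK produces garbled characters.
garbled = utf8_bytes.decode("gbk", errors="replace")
print(garbled == text)

# GBK bytes for Chinese text are normally not valid UTF-8,
# so the check can flag a file whose declaration says UTF-8 but whose bytes are GBK.
print(looks_like_utf8(utf8_bytes), looks_like_utf8(gbk_bytes))
```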
When using IE as a browser on a Windows operating system, the following problem often occurs: when browsing a web page encoded in UTF-8, the browser cannot automatically identify the encoding used for the page, even if the page declares its encoding format in a meta tag, and some pages containing UTF-8 encoded Chinese are rendered blank. Firefox and Safari do not have this problem. This is because, when IE determines the web page encoding, it gives priority to the tags in the HTML and only then to the information in the HTTP header, while the Mozilla family of browsers does the opposite.
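The two priority orders can be modelled with a toy sketch. It is not how any browser is actually implemented. The functions header_charset and resolve_charset and the sample values are hypothetical, and the header is parsed with the standard-library email.message module.

```python
from email.message import Message
from typing import Optional


def header_charset(content_type: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased charset, or None if absent


def resolve_charset(header: Optional[str], meta: Optional[str],
                    prefer_meta: bool) -> Optional[str]:
    """Pick a charset, falling back to the other source when the preferred one is missing."""
    first, second = (meta, header) if prefer_meta else (header, meta)
    return first or second


from_header = header_charset("text/html; charset=UTF-8")
from_meta = "gb2312"  # assumed value read from the page's meta tag

# Meta tag first, as the text describes for IE.
print(resolve_charset(from_header, from_meta, prefer_meta=True))
# HTTP header first, as the text describes for the Mozilla family.
print(resolve_charset(from_header, from_meta, prefer_meta=False))
```

When the HTTP header and the meta tag agree, both orders give the same result, which is one reason to keep the two declarations consistent.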
UTF-8 uses three bytes to represent a typical Chinese character, while GB2312 or BIG5 uses two. When the page is output, due to the above reasons, when the browser parses and outputs the content of