Google's Sitemap service requires that all published sitemaps be encoded in UTF-8. Google does not accept other Unicode encodings such as UTF-16, let alone non-Unicode encodings such as ISO-8859-1. Technically this means that Google is using a non-standard XML parser, since the XML Recommendation specifically requires that "all XML processors must accept the UTF-8 and UTF-16 encodings of Unicode 3.1". But is this really a big deal? UTF-8 is available to everyone. Universality is the first and most compelling reason to choose UTF-8: it can handle every script currently in use in the world. A few gaps remain, but they are increasingly obscure and are gradually being filled in. Scripts that are not yet covered are usually not supported by any other character set either, and even where they are, they cannot be used in XML. At best, such scripts are handled through font hacks layered on a single-byte character set such as Latin-1. Real support for such rare scripts may first come from Unico
1. Details on encoding XML documents using UTF-8
Introduction: Google's Sitemap service requires that all published sitemaps be encoded in UTF-8. Google does not accept other Unicode encodings such as UTF-16, let alone non-Unicode encodings such as ISO-8859-1. Technically this means that Google is using a non-standard XML parser, since the XML Recommendation specifically requires that "all XML processors must accept the UTF-8 and UTF-16 encodings of Unicode 3.1". But is this really a big deal?
2. Detailed introduction to code points and UTF-16 in Java
Introduction: The relationship between Unicode and UTF-8/UTF-16/UTF-32 is the relationship between a character set and its encodings. The concept of a character set actually covers two aspects: the set of characters and the encoding scheme. In the narrow sense, a character set only defines the symbols that belong to it and does not include an encoding scheme. In general usage, however, a character set defines not just a collection of characters but also a binary encoding for each symbol. When we mention GB2312 or ASCII, it implies...
3. New features of Java 8 Update 20 - String deduplication
Introduction: Strings take up a lot of memory in any application. In particular, the char[] arrays that hold the individual UTF-16 characters contribute the most to JVM memory consumption, because each character takes up 2 bytes. It is actually very common for 30% of the memory to be consumed by strings.
Introduction: include, header: A PHP page uses include to pull in header.php, and a blank line appears above the header. This problem bothered me for a long time, and here is how it was solved. The cause was the encoding of the file: the header.php of my page was saved as UTF-8 with BOM. After re-saving the file as UTF-8 without BOM, the blank line above the header disappeared. The UTF-8 BOM is also called the UTF-8 signature. In fact, the BOM has no effect on UTF-8 itself; it exists to support UTF-16 and UTF-32. The purpose of the BOM signature is to tell the editor which encoding the current file uses
Introduction: Method 2 of displaying a web page correctly in any character set (continued). Reposted from: coolcode.cn. A few days ago I wrote an article on how to display web pages correctly in any character set. The explanation was very brief: characters outside the first 128 are represented as numeric character references (NCRs). I did not describe the specific conversion method because at the time it seemed too simple. Later I found that someone had asked about it, so I will explain it in detail here. The first step is to convert the string from the source character set to UTF-16. This is because most characters in UTF-16 (those in the Basic Multilingual Plane) occupy exactly two bytes, which makes them easy to process later,
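The conversion described above can be sketched roughly as follows. This is a minimal illustration, not the original author's code: the function name `to_ncr` and its parameters are hypothetical, and it assumes the iconv extension is available. It converts the input to UTF-16BE, reads it as 16-bit units, and writes every non-ASCII character as a decimal NCR, combining surrogate pairs for characters outside the Basic Multilingual Plane.

```php
<?php
// Hypothetical helper (not from the original article): encode every
// non-ASCII character of $text as a decimal numeric character reference.
function to_ncr(string $text, string $sourceCharset = 'UTF-8'): string
{
    // Step 1: convert from the source character set to UTF-16 (big-endian, no BOM).
    $utf16 = iconv($sourceCharset, 'UTF-16BE', $text);

    // Step 2: read the bytes as unsigned 16-bit big-endian code units.
    $units = array_values(unpack('n*', $utf16));

    $out = '';
    for ($i = 0, $n = count($units); $i < $n; $i++) {
        $cp = $units[$i];

        // A high surrogate followed by a low surrogate encodes one code point
        // above U+FFFF; combine the pair before emitting the reference.
        if ($cp >= 0xD800 && $cp <= 0xDBFF && $i + 1 < $n) {
            $low = $units[++$i];
            $cp = 0x10000 + (($cp - 0xD800) << 10) + ($low - 0xDC00);
        }

        // Step 3: keep the first 128 characters as-is, write the rest as NCRs.
        $out .= $cp < 128 ? chr($cp) : '&#' . $cp . ';';
    }
    return $out;
}

echo to_ncr("a\xE4\xB8\xAD"), "\n"; // "a" followed by U+4E2D
```

Because the output contains only ASCII, the page renders the same way whatever character set the browser assumes.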
6. PHP removes BOM header code
Introduction: PHP code to remove the BOM header. The UTF-8 BOM is also called the UTF-8 signature. In fact, the BOM has no effect on UTF-8 itself; it exists to support UTF-16 and UTF-32. The purpose of the BOM signature is to tell the editor which encoding the current file uses, making the file easier to identify. However, although the BOM is not displayed in the editor, it does produce output, which shows up like an extra blank line. If any of the following happens after you modify a PHP file: * Unable to log in or log out; * A blank line appears at the top of the page; * Page top out
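As a minimal sketch of the idea above: the UTF-8 BOM is the three bytes EF BB BF at the very start of a file, so removing it is a matter of checking for that prefix. The helper name `strip_utf8_bom` is hypothetical and is not taken from the original post.

```php
<?php
// Hypothetical helper: remove a leading UTF-8 BOM (bytes EF BB BF) if present.
function strip_utf8_bom(string $text): string
{
    $bom = "\xEF\xBB\xBF";
    // Compare only the first three bytes; the rest of the text is untouched.
    if (strncmp($text, $bom, 3) === 0) {
        return substr($text, 3);
    }
    return $text;
}

$withBom = "\xEF\xBB\xBFhello";
$clean = strip_utf8_bom($withBom);
echo strlen($withBom), ' -> ', strlen($clean), "\n"; // 8 bytes before, 5 after
```

To repair a file on disk, the same function could be applied to the result of `file_get_contents()` and written back with `file_put_contents()`; back the file up first.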
7. Please help me with a question about getting XML node data in PHP
Introduction: Please help me with a small problem getting XML node data in PHP. I'm not very good at this. I want to get the value of
Introduction: Single byte to wide byte. This post was last edited by sevencolours24 on 2013-02-28 16:05:54. $msg="China" Now I want to send this $msg to another application. How can I convert $msg into UTF-16 encoded wide characters so that the application displays it correctly? When I send it as-is, it arrives as single-byte text.
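One possible approach to the question above, sketched under assumptions: the source string is UTF-8, the receiving application expects little-endian UTF-16 without a BOM (common for Windows wide strings, but not stated in the post), and the iconv extension is available.

```php
<?php
// Assumption: $msg is UTF-8 and the receiver wants UTF-16LE without a BOM.
$msg = "China";

// Naming the byte order explicitly ('UTF-16LE') keeps iconv from
// prepending a BOM, which plain 'UTF-16' would add.
$wide = iconv('UTF-8', 'UTF-16LE', $msg);

echo strlen($msg), ' bytes -> ', strlen($wide), " bytes\n";
echo bin2hex($wide), "\n"; // each ASCII letter becomes two bytes: value, then 00
```

If the receiver expects big-endian data instead, `'UTF-16BE'` can be used in the same call.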
9. Is it feasible to convert UTF-16BE encoding to UTF-8 in PHP
Introduction: Is it feasible to convert Chinese text in UTF-16BE encoding to UTF-8 in PHP? The UTF-16BE data needs to be converted to UTF-8 (converting the UTF-8 Chinese text directly to GBK works, but the letters come out wrong). Is there a way to do this? I searched online and could not find one. ------Solution------ $text = iconv('utf-16be', 'utf-8', $t
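The iconv call quoted in the solution can be tried on a small sample. This is a minimal sketch: the input bytes below are an illustrative assumption (the two characters U+4E2D and U+6587 written out as raw UTF-16BE), not data from the original thread.

```php
<?php
// Illustrative input: U+4E2D U+6587 as raw UTF-16BE bytes (no BOM).
$utf16be = "\x4E\x2D\x65\x87";

// iconv(from, to, string) returns the converted string, or false on failure.
$utf8 = iconv('UTF-16BE', 'UTF-8', $utf16be);

if ($utf8 === false) {
    echo "conversion failed\n";
} else {
    echo bin2hex($utf8), "\n"; // three UTF-8 bytes per character here
}
```

Checking for `false` matters when the input may contain an odd number of bytes or unpaired surrogates, since iconv rejects invalid sequences.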