Solution to garbled characters when splitting GBK Chinese strings in PHP

WBOY
Release: 2016-07-21 14:59:54

For a GBK-encoded string like the following, explode() does not return the correct result:

$result = explode("|", "滕华弢|海清");

The reason is the character "弢" (pronounced tao; it doesn't matter if you don't recognize it, I didn't either). Its GBK encoding is 8f7c, and unfortunately the ASCII value of "|" is also 7c, so explode() treats the second byte of the character as a delimiter.
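To see the collision at the byte level, here is a minimal sketch. It writes the GBK string as raw byte escapes so it behaves the same regardless of the source file's encoding. Only the pair 8f7c comes from the text above; the other double-byte pairs are illustrative assumptions.

```php
<?php
// GBK bytes: two double-byte characters, the second one being 8f 7c,
// then the real delimiter "|" (7c), then two more double-byte characters.
$gbk = "\xbb\xaa\x8f\x7c|\xba\xa3\xc7\xe5";

// explode() works on bytes, so it also splits on the 7c inside 8f 7c.
$parts = explode("|", $gbk);

echo count($parts), " pieces\n";
foreach ($parts as $p) {
    // Show each piece as hex; the first piece ends in a dangling lead byte 8f.
    echo "[", bin2hex($p), "]\n";
}
```

Instead of two pieces, the result has three: a truncated first name, an empty string, and the second name. The dangling 8f byte is what shows up as a garbled character.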

There are many such cases. GBK double-byte codes range from 0x8140 to 0xfefe, so in theory any character whose low byte is 7c has this problem, for example:

倈 (827c), 億 (837c), 秧 (b17c), 鴴 (e57c), and so on. To handle such cases:

1. First, we can transcode the string to UTF-8, explode it, and then convert the pieces back to GBK. This works because no byte of a UTF-8 multi-byte sequence is below 0x80, so it can never collide with "|", but it is the more cumbersome method.
2. Second, we can use a regular expression and, instead of splitting on the delimiter, match the words themselves:

preg_match_all("/([\x81-\xfe][\x40-\xfe])+/", $gbk_str, $matches); // with this pattern, $matches[0] is the array of resulting words
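The first method (transcode, explode, convert back) can be sketched as follows. This assumes the mbstring extension is available and PHP 7 or later. gbk_explode is a hypothetical helper name, not a built-in function, and the sample bytes other than 8f7c are illustrative.

```php
<?php
// Hypothetical helper: split a GBK string on an ASCII delimiter
// by doing the split in UTF-8, where "|" cannot appear inside a character.
function gbk_explode(string $delimiter, string $gbk): array
{
    // GBK -> UTF-8
    $utf8 = mb_convert_encoding($gbk, 'UTF-8', 'GBK');

    // Safe to split now: UTF-8 continuation bytes are all >= 0x80.
    $parts = explode($delimiter, $utf8);

    // UTF-8 -> GBK for each piece, so callers get the original encoding back.
    return array_map(function (string $p): string {
        return mb_convert_encoding($p, 'GBK', 'UTF-8');
    }, $parts);
}

// Same sample bytes as the failing case: the first name ends in 8f 7c.
$names = gbk_explode("|", "\xbb\xaa\x8f\x7c|\xba\xa3\xc7\xe5");
foreach ($names as $n) {
    echo bin2hex($n), "\n";
}
```

Compared with the preg_match_all() pattern, this keeps empty fields and pieces that contain single-byte ASCII characters, since the pattern only matches runs of double-byte characters.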

source:php.cn