Home Backend Development PHP Tutorial 分割gbk中文出现乱码的有关问题解决

分割gbk中文出现乱码的有关问题解决

Jun 13, 2016 pm 12:28 PM
array encoding explode list string

分割gbk中文出现乱码的问题解决

近日遇到一个神奇的字“弢(tao)”。

具体的过程是这样的:

<span style="color: #008080;">1</span> <span style="color: #800080;">$list</span> = <span style="color: #008080;">explode</span>('|', 'abc弢|bc'<span style="color: #000000;">);</span><span style="color: #008080;">2</span> <span style="color: #008080;">var_dump</span>(<span style="color: #800080;">$list</span>);
Copy after login

取得这个分割的结果。

和想象不同,结果居然是这样:

<span style="color: #0000ff;">array</span>(3<span style="color: #000000;">) {  [</span>0]=>  <span style="color: #0000ff;">string</span>(4) "<span style="color: #000000;">abc?  [1]=>  string(0) </span>""<span style="color: #000000;">  [2]=>  string(2) </span>"bc"<span style="color: #000000;">}</span>
Copy after login
Copy after login

出现了乱码,而且莫名其妙的出现了一个空元素。

究其原因,原来这个字“弢”的gbk编码是8f7c,而|的ASCII是7c,这样explode就把弢的第二ASCII作为|切割了。

既然是双字节的问题,我们用mbstring解决好了。

可惜,php并没有mb_explode这种函数,找了找,找到一个mb_split。

<span style="color: #0000ff;">array</span> mb_split ( <span style="color: #0000ff;">string</span> <span style="color: #800080;">$pattern</span> , <span style="color: #0000ff;">string</span> <span style="color: #800080;">$string</span> [, int <span style="color: #800080;">$limit</span> = -1 ] )
Copy after login

没有声明编码的地方。仔细一看,他是通过mb_regex_encoding声明编码的。

于是写出以下的代码:

<span style="color: #008080;">1</span> mb_regex_encoding('gbk'<span style="color: #000000;">);</span><span style="color: #008080;">2</span> <span style="color: #800080;">$list</span> = mb_split('\|', 'abc弢|bc'<span style="color: #000000;">);</span><span style="color: #008080;">3</span> <span style="color: #008080;">var_dump</span>(<span style="color: #800080;">$list</span>);
Copy after login

结果php报错,mb_regex_encoding不认识gbk,囧。

那就使用它认识的:

<span style="color: #008080;">1</span> mb_regex_encoding('gb2312'<span style="color: #000000;">);</span><span style="color: #008080;">2</span> <span style="color: #800080;">$list</span> = mb_split('\|', 'abc弢|bc'<span style="color: #000000;">);</span><span style="color: #008080;">3</span> <span style="color: #008080;">var_dump</span>(<span style="color: #800080;">$list</span>);
Copy after login

结果:

<span style="color: #0000ff;">array</span>(3<span style="color: #000000;">) {  [</span>0]=>  <span style="color: #0000ff;">string</span>(4) "<span style="color: #000000;">abc?  [1]=>  string(0) </span>""<span style="color: #000000;">  [2]=>  string(2) </span>"bc"<span style="color: #000000;">}</span>
Copy after login
Copy after login

发现,这种方法并没有什么用处。、

至于原因?“弢”这个字居然不在GB2312的编码集里面!!!!!但是有这个字的编码集(GBK, GB18030)这个函数都不支持!!!!!

既然这个不好用,也许万能的正则表达式是ok的。于是得到以下代码:

<span style="color: #008080;">1</span> <span style="color: #008080;">var_dump</span>(<span style="color: #008080;">preg_match_all</span>('/([^\|])*/', 'abc弢|bc', <span style="color: #800080;">$matches</span><span style="color: #000000;">));</span><span style="color: #008080;">2</span> <span style="color: #008080;">var_dump</span>(<span style="color: #800080;">$matches</span>);
Copy after login

结果:

int(2<span style="color: #000000;">)</span><span style="color: #0000ff;">array</span>(2<span style="color: #000000;">) {  [</span>0]=>  <span style="color: #0000ff;">array</span>(2<span style="color: #000000;">) {    [</span>0]=>    <span style="color: #0000ff;">string</span>(4) "<span style="color: #000000;">abc?    [1]=>    string(2) </span>"bc"<span style="color: #000000;">  }  [1]=>  array(2) {    [0]=>    string(1) </span>"?<span style="color: #000000;">    [</span>1]=>    <span style="color: #0000ff;">string</span>(1) "c"<span style="color: #000000;">  }}</span>
Copy after login

好吧,我想多了。

现在研究一下,如何用正则描述这个场景。

参考一下,鸟哥大神的博客:分割GBK中文遭遇乱码的解决。遗憾的是,正则能力比较low的我,还是想不出来合适的正则表达式(如果有想出这个正则表达式的大神们,希望可以告诉我)。

没办法,思来想去,只好用substr了:

<span style="color: #008080;"> 1</span> <span style="color: #0000ff;">function</span> mb_explode(<span style="color: #800080;">$delimiter</span>, <span style="color: #800080;">$string</span>, <span style="color: #800080;">$encoding</span> = <span style="color: #0000ff;">null</span><span style="color: #000000;">){</span><span style="color: #008080;"> 2</span>     <span style="color: #800080;">$list</span> = <span style="color: #0000ff;">array</span><span style="color: #000000;">();</span><span style="color: #008080;"> 3</span>     <span style="color: #008080;">is_null</span>(<span style="color: #800080;">$encoding</span>) && <span style="color: #800080;">$encoding</span> =<span style="color: #000000;"> mb_internal_encoding();</span><span style="color: #008080;"> 4</span>     <span style="color: #800080;">$len</span> = mb_strlen(<span style="color: #800080;">$delimiter</span>, <span style="color: #800080;">$encoding</span><span style="color: #000000;">);</span><span style="color: #008080;"> 5</span>     <span style="color: #0000ff;">while</span>(<span style="color: #0000ff;">false</span> !== (<span style="color: #800080;">$idx</span> = mb_strpos(<span style="color: #800080;">$string</span>, <span style="color: #800080;">$delimiter</span>, 0, <span style="color: #800080;">$encoding</span><span style="color: #000000;">))){</span><span style="color: #008080;"> 6</span>         <span style="color: #800080;">$list</span>[] = mb_substr(<span style="color: #800080;">$string</span>, 0, <span style="color: #800080;">$idx</span>, <span style="color: #800080;">$encoding</span><span style="color: #000000;">);</span><span style="color: #008080;"> 7</span>         <span style="color: #800080;">$string</span> = mb_substr(<span style="color: #800080;">$string</span>, <span style="color: #800080;">$idx</span> + <span style="color: #800080;">$len</span>, <span style="color: #0000ff;">null</span>, <span style="color: #800080;">$encoding</span><span style="color: #000000;">);</span><span style="color: #008080;"> 8</span> <span style="color: #000000;">    }   </span><span style="color: #008080;"> 9</span>     <span style="color: #800080;">$list</span>[] = <span style="color: #800080;">$string</span><span style="color: #000000;">;</span><span style="color: #008080;">10</span>     <span style="color: #0000ff;">return</span> <span style="color: #800080;">$list</span><span style="color: #000000;">; </span><span style="color: #008080;">11</span> } 
Copy after login

测试代码:

<span style="color: #008080;">1</span> <span style="color: #800080;">$a</span> = 'abc弢|bc'<span style="color: #000000;">;</span><span style="color: #008080;">2</span> <span style="color: #008080;">3</span> <span style="color: #008080;">var_dump</span>(mb_explode('|', <span style="color: #800080;">$a</span>, 'gbk'<span style="color: #000000;">));</span><span style="color: #008080;">4</span> <span style="color: #008080;">var_dump</span>(mb_explode('bc', <span style="color: #800080;">$a</span>, 'gbk'<span style="color: #000000;">));</span><span style="color: #008080;">5</span> <span style="color: #008080;">var_dump</span>(mb_explode('弢', <span style="color: #800080;">$a</span>, 'gbk'));
Copy after login

结果:

<span style="color: #0000ff;">array</span>(2<span style="color: #000000;">) {  [</span>0]=>  <span style="color: #0000ff;">string</span>(5) "abc弢"<span style="color: #000000;">  [</span>1]=>  <span style="color: #0000ff;">string</span>(2) "bc"<span style="color: #000000;">}</span><span style="color: #0000ff;">array</span>(3<span style="color: #000000;">) {  [</span>0]=>  <span style="color: #0000ff;">string</span>(1) "a"<span style="color: #000000;">  [</span>1]=>  <span style="color: #0000ff;">string</span>(3) "弢|"<span style="color: #000000;">  [</span>2]=>  <span style="color: #0000ff;">string</span>(0) ""<span style="color: #000000;">}</span><span style="color: #0000ff;">array</span>(2<span style="color: #000000;">) {  [</span>0]=>  <span style="color: #0000ff;">string</span>(3) "abc"<span style="color: #000000;">  [</span>1]=>  <span style="color: #0000ff;">string</span>(3) "|bc"<span style="color: #000000;">}</span>
Copy after login

这样就可以得到正确的结果了。

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Convert basic data types to strings using Java's String.valueOf() function Convert basic data types to strings using Java's String.valueOf() function Jul 24, 2023 pm 07:55 PM

Convert basic data types to strings using Java's String.valueOf() function In Java development, when we need to convert basic data types to strings, a common method is to use the valueOf() function of the String class. This function can accept parameters of basic data types and return the corresponding string representation. In this article, we will explore how to use the String.valueOf() function for basic data type conversions and provide some code examples to

How to convert char array to string How to convert char array to string Jun 09, 2023 am 10:04 AM

Method of converting char array to string: It can be achieved by assignment. Use {char a[]=" abc d\0efg ";string s=a;} syntax to let the char array directly assign a value to string, and execute the code to complete the conversion.

Use Java's String.replace() function to replace characters (strings) in a string Use Java's String.replace() function to replace characters (strings) in a string Jul 25, 2023 pm 05:16 PM

Replace characters (strings) in a string using Java's String.replace() function In Java, strings are immutable objects, which means that once a string object is created, its value cannot be modified. However, you may encounter situations where you need to replace certain characters or strings in a string. At this time, we can use the replace() method in Java's String class to implement string replacement. The replace() method of String class has two types:

2w words detailed explanation String, yyds 2w words detailed explanation String, yyds Aug 24, 2023 pm 03:56 PM

Hello everyone, today I will share with you the basic knowledge of Java: String. Needless to say the importance of the String class, it can be said to be the most used class in our back-end development, so it is necessary to talk about it.

How to implement Redis List operation in php How to implement Redis List operation in php May 26, 2023 am 11:51 AM

List operation //Insert a value from the head of the list. $ret=$redis->lPush('city','guangzhou');//Insert a value from the end of the list. $ret=$redis->rPush('city','guangzhou');//Get the elements in the specified range of the list. 0 represents the first element of the list, -1 represents the last element, and -2 represents the penultimate element. $ret=$redis->l

Golang function byte, rune and string type conversion skills Golang function byte, rune and string type conversion skills May 17, 2023 am 08:21 AM

In Golang programming, byte, rune and string types are very basic and common data types. They play an important role in processing data operations such as strings and file streams. When performing these data operations, we usually need to convert them to each other, which requires mastering some conversion skills. This article will introduce the byte, rune and string type conversion techniques of Golang functions, aiming to help readers better understand these data types and be able to apply them skillfully in programming practice.

Sort array using Array.Sort function in C# Sort array using Array.Sort function in C# Nov 18, 2023 am 10:37 AM

Title: Example of using the Array.Sort function to sort an array in C# Text: In C#, array is a commonly used data structure, and it is often necessary to sort the array. C# provides the Array class, which has the Sort method to conveniently sort arrays. This article will demonstrate how to use the Array.Sort function in C# to sort an array and provide specific code examples. First, we need to understand the basic usage of the Array.Sort function. Array.So

Use java's String.length() function to get the length of a string Use java's String.length() function to get the length of a string Jul 25, 2023 am 09:09 AM

Use Java's String.length() function to get the length of a string. In Java programming, string is a very common data type. We often need to get the length of a string, that is, the number of characters in the string. In Java, we can use the length() function of the String class to get the length of a string. Here is a simple example code: publicclassStringLengthExample{publ

See all articles