Fixing garbled characters when truncating Chinese strings in PHP

WBOY
Release: 2016-07-13 10:57:05

Garbled characters after truncating a string usually appear when the string mixes Chinese and English. Truncating pure English text causes no problem, because every letter is a single byte. A Chinese character spans several bytes, so a cut that lands in the middle of one leaves a broken character behind. How many bytes it spans depends on the encoding: in UTF-8 a Chinese character typically occupies three bytes, while in GB2312 it occupies two. Let us look at examples first.
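The byte widths described above can be checked directly. The following is a minimal sketch, not part of the original article; it writes the two characters of "中文" as explicit byte escapes so that the result does not depend on the encoding of the script file itself.

```php
<?php
// "中文" as explicit bytes in each encoding.
$utf8 = "\xE4\xB8\xAD\xE6\x96\x87"; // UTF-8: three bytes per character
$gb   = "\xD6\xD0\xCE\xC4";         // GB2312: two bytes per character

// strlen() counts bytes, not characters.
$utf8Bytes = strlen($utf8); // 6
$gbBytes   = strlen($gb);   // 4

// substr() also works on bytes: taking 4 bytes keeps "中" plus only the
// first byte of "文", which is what shows up as a garbled character.
$broken = substr($utf8, 0, 4);

// An empty pattern with the /u modifier matches only if the subject is valid UTF-8.
$brokenIsValid = preg_match('//u', $broken) === 1; // false
$wholeIsValid  = preg_match('//u', $utf8) === 1;   // true
```

Any truncation routine therefore has to move the cut to a character boundary, which is what the functions in this article do.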

When the string is encoded as GB2312, one Chinese character occupies two bytes:

public static function chinesesubstr($str, $start, $len) { // $str: the string; $start: starting byte offset; $len: number of bytes to take
    $tmpstr = ''; // initialise the result so that .= never acts on an undefined variable
    $strlen = min($start + $len, strlen($str)); // byte offset at which to stop, clamped to the end of the string
    for ($i = $start; $i < $strlen;) {
        if (ord(substr($str, $i, 1)) > 0xa0) { // a byte value above 0xa0 marks the first byte of a Chinese character
            $tmpstr .= substr($str, $i, 2); // take two bytes, which make up one Chinese character
            $i = $i + 2; // advance by 2
        } else {
            $tmpstr .= substr($str, $i, 1); // not a Chinese character: take a single byte
            $i++;
        }
    }
    return $tmpstr; // return the truncated string
}
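A short usage sketch may help show how the offsets behave. The class name `StrUtil` is hypothetical (the article does not say which class the method belongs to), and the method body is a condensed copy of the GB2312 loop above with the result initialised and the end offset clamped. Note that `$start` and `$len` are byte offsets, and that the loop always finishes a character it has started.

```php
<?php
// Hypothetical wrapper class; the article does not name the real one.
final class StrUtil
{
    public static function chinesesubstr(string $str, int $start, int $len): string
    {
        $tmpstr = '';
        $end = min($start + $len, strlen($str));
        for ($i = $start; $i < $end;) {
            // GB2312 lead bytes are 0xA1 and above; ASCII stays below 0x80.
            $step = ord($str[$i]) > 0xa0 ? 2 : 1;
            $tmpstr .= substr($str, $i, $step);
            $i += $step;
        }
        return $tmpstr;
    }
}

$gb = "ab\xD6\xD0\xCE\xC4"; // "ab中文" as GB2312 bytes

$whole = StrUtil::chinesesubstr($gb, 0, 4); // "ab中": the cut falls on a boundary
$over  = StrUtil::chinesesubstr($gb, 0, 3); // also "ab中": byte 3 is pulled in to complete "中"
$tail  = StrUtil::chinesesubstr($gb, 2, 4); // "中文"
```

The caller must still pass a `$start` that sits on a character boundary; starting on the second byte of a Chinese character would misalign everything after it.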

When the string is encoded as UTF-8, one Chinese character occupies three bytes:

public static function chinesesubstr($str, $start, $len) { // $str: the string; $start: starting byte offset; $len: number of bytes to take
    $tmpstr = ''; // initialise the result so that .= never acts on an undefined variable
    $strlen = min($start + $len, strlen($str)); // byte offset at which to stop, clamped to the end of the string
    for ($i = $start; $i < $strlen;) {
        if (ord(substr($str, $i, 1)) > 0xa0) { // a byte value above 0xa0 is treated as the first byte of a Chinese character; this assumes the text holds only ASCII and three-byte characters
            $tmpstr .= substr($str, $i, 3); // take three bytes, which make up one Chinese character
            $i = $i + 3; // advance by 3
        } else {
            $tmpstr .= substr($str, $i, 1); // not a Chinese character: take a single byte
            $i++;
        }
    }
    return $tmpstr; // return the truncated string
}
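The fixed step of three bytes holds for common Chinese characters, but UTF-8 sequences can be one to four bytes long, and the length is encoded in the first byte. The sketch below is an illustration under that assumption rather than the article's method; the helper names `utf8_seq_len` and `utf8_cut` are hypothetical. It reads the step from the lead byte and counts in characters instead of bytes.

```php
<?php
// Hypothetical helper: byte length of the UTF-8 sequence that starts with $lead.
function utf8_seq_len(int $lead): int
{
    if ($lead < 0x80) return 1;  // 0xxxxxxx: ASCII
    if ($lead >= 0xF0) return 4; // 11110xxx: four-byte sequence
    if ($lead >= 0xE0) return 3; // 1110xxxx: three-byte sequence (common Chinese characters)
    if ($lead >= 0xC0) return 2; // 110xxxxx: two-byte sequence
    return 1;                    // stray continuation byte: step over it
}

// Hypothetical helper: return $count characters starting at character index $start.
function utf8_cut(string $str, int $start, int $count): string
{
    $out = '';
    $n = strlen($str);
    $index = 0; // index of the current character
    for ($i = 0; $i < $n && $index < $start + $count; $index++) {
        $len = utf8_seq_len(ord($str[$i]));
        if ($index >= $start) {
            $out .= substr($str, $i, $len); // copy one whole character
        }
        $i += $len;
    }
    return $out;
}

$s   = "a\xE4\xB8\xADb\xE6\x96\x87"; // "a中b文" as UTF-8 bytes
$cut = utf8_cut($s, 1, 2);           // "中b"
```

Because the loop always walks from the start of the string, `$start` can never land in the middle of a character, at the cost of scanning the skipped prefix.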

The two functions above solve the problem, but each one is tied to a single encoding, so the caller has to know in advance which one applies, which is troublesome. Here is a function that handles both UTF-8 and GB2312.

The code is as follows:

/**
 * Chinese-aware truncation function that supports both UTF-8 and GB2312
 * cut_str(string, length to take, start position, encoding);
 * The encoding defaults to UTF-8
 * The start position defaults to 0
 */
function cut_str($string, $sublen, $start = 0, $code = 'UTF-8')
{
    if ($code == 'UTF-8')
    {
        // One alternative per UTF-8 sequence length (1 to 4 bytes), so every match is one whole character
        $pa = "/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|\xe0[\xa0-\xbf][\x80-\xbf]|[\xe1-\xef][\x80-\xbf][\x80-\xbf]|\xf0[\x90-\xbf][\x80-\xbf][\x80-\xbf]|[\xf1-\xf7][\x80-\xbf][\x80-\xbf][\x80-\xbf]/";
        preg_match_all($pa, $string, $t_string);
        // Append an ellipsis when characters remain after the cut
        if (count($t_string[0]) - $start > $sublen) return join('', array_slice($t_string[0], $start, $sublen))."…";
        return join('', array_slice($t_string[0], $start, $sublen));
    }
    else
    {
        // GB2312: positions are converted to bytes on the assumption of two bytes per character
        $start = $start * 2;
        $sublen = $sublen * 2;
        $strlen = strlen($string);
        $tmpstr = '';
        for ($i = 0; $i < $strlen; $i++)
        {
            if ($i >= $start && $i < ($start + $sublen))
            {
                if (ord(substr($string, $i, 1)) > 129)
                {
                    $tmpstr .= substr($string, $i, 2); // first byte of a double-byte character: take both bytes
                }
                else
                {
                    $tmpstr .= substr($string, $i, 1); // single-byte character
                }
            }
            if (ord(substr($string, $i, 1)) > 129) $i++; // skip the second byte of a double-byte character
        }
        if (strlen($tmpstr) < $strlen) $tmpstr .= "…"; // the result is shorter than the input, so mark the cut
        return $tmpstr;
    }
}
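For comparison, PHP's mbstring extension provides `mb_substr()` and `mb_strlen()`, which count in characters for a named encoding. The following is a sketch of an alternative, not the article's function: the wrapper name `cut_str_mb` is hypothetical, it assumes mbstring is loaded, and it appends an ASCII "..." so that the marker is valid in either encoding. `EUC-CN` is the mbstring name for the usual byte encoding of GB2312.

```php
<?php
// Hypothetical wrapper with the same parameter order as cut_str().
// Assumes the mbstring extension is available.
function cut_str_mb(string $string, int $sublen, int $start = 0, string $code = 'UTF-8'): string
{
    // mb_substr() takes $start and $sublen in characters, not bytes.
    $cut = mb_substr($string, $start, $sublen, $code);

    // Mark the cut only when characters remain after it.
    if (mb_strlen($string, $code) - $start > $sublen) {
        $cut .= '...';
    }
    return $cut;
}

$utf8 = "\xE4\xB8\xAD\xE6\x96\x87abc"; // "中文abc" as UTF-8 bytes
$gb   = "\xD6\xD0\xCE\xC4abc";         // the same text as GB2312 bytes

$a = cut_str_mb($utf8, 3);            // "中文a..."
$b = cut_str_mb($gb, 3, 0, 'EUC-CN'); // "中文a..." in GB2312 bytes
$c = cut_str_mb("abc", 5);            // "abc": nothing was cut, so no marker
```

Unlike the byte arithmetic in the GB2312 branch above, this counts ASCII and Chinese characters alike as one character each.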

http://www.bkjia.com/PHPjc/632122.html
source:php.cn