Truncating a Chinese string directly with the PHP function substr can produce garbled characters. The main reason is that substr counts bytes, not characters, so it may "saw" a multi-byte Chinese character in half. Let's see how to solve this problem.
Most of us truncate strings regularly in our own programs, and garbled output when truncating Chinese strings is a common and annoying problem. Below we introduce two methods that prevent garbled characters when truncating Chinese strings.
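The "sawed in half" problem is easy to reproduce. The following is a minimal sketch, assuming the source file is saved as UTF-8 (so each Chinese character takes 3 bytes) and that the mbstring extension is loaded. The variable names are illustrative only.

```php
<?php
// Assumption: this file is UTF-8 encoded, so each Chinese character is 3 bytes.
$text = "中文字符串";

// substr() counts bytes: 4 bytes is one whole character plus one stray byte.
$broken = substr($text, 0, 4);

// mb_substr() counts characters when it is told the encoding.
$intact = mb_substr($text, 0, 4, "UTF-8");

var_dump(strlen($text));                       // length in bytes
var_dump(mb_strlen($text, "UTF-8"));           // length in characters
var_dump(mb_check_encoding($broken, "UTF-8")); // is the byte-cut result valid UTF-8?
var_dump(mb_check_encoding($intact, "UTF-8")); // is the character-cut result valid UTF-8?
```

The byte-cut result ends with an incomplete byte sequence, which is what shows up as a garbled character when it is displayed.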
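The first method is a function you write yourself, which is convenient to reuse. It prefers mb_substr, falls back to iconv_substr, and finally falls back to splitting the string into characters with a regular expression.
Truncating with this function does not produce garbled characters.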
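/**
 * Substring function that supports Chinese strings.
 */
function msubstr($str, $start, $length, $charset = "utf-8", $suffix = true)
{
    // Approximate number of bytes per character for the given charset.
    switch ($charset) {
        case 'utf-8':
        case 'UTF8':
            $char_len = 3;
            break;
        default:
            $char_len = 2;
    }
    // Shorter than the requested length: return the string as is.
    if (strlen($str) <= ($length * $char_len)) {
        return $str;
    }
    if (function_exists("mb_substr")) {
        $slice = mb_substr($str, $start, $length, $charset);
    } else if (function_exists('iconv_substr')) {
        $slice = iconv_substr($str, $start, $length, $charset);
    } else {
        // Fallback: match one character at a time, then slice the array of matches.
        $re['utf-8']  = "/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xff][\x80-\xbf]{3}/";
        $re['gb2312'] = "/[\x01-\x7f]|[\xb0-\xf7][\xa0-\xfe]/";
        $re['gbk']    = "/[\x01-\x7f]|[\x81-\xfe][\x40-\xfe]/";
        $re['big5']   = "/[\x01-\x7f]|[\x81-\xfe]([\x40-\x7e]|[\xa1-\xfe])/";
        // Normalize the name so that 'UTF8' finds the 'utf-8' pattern.
        $key = strtolower($charset) === 'utf8' ? 'utf-8' : strtolower($charset);
        preg_match_all($re[$key], $str, $match);
        $slice = join("", array_slice($match[0], $start, $length));
    }
    // Append an ellipsis marker when $suffix is true ("..." is an assumed marker).
    if ($suffix) return $slice . "...";
    return $slice;
}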
The second method is PHP's built-in mb_substr function.
Specifying the encoding of the string to be truncated is enough to prevent garbled characters.
Description
string mb_substr ( string $str , int $start [, int $length [, string $encoding ]] )

<?php
// Character-based substring built on preg_split with the /u (UTF-8) modifier.
function substr_unicode($str, $s, $l = null) {
    return join("", array_slice(
        preg_split("//u", $str, -1, PREG_SPLIT_NO_EMPTY), $s, $l));
}

$str = "Büyük";
$s = 0; // start from "0" (nth) char
$l = 3; // get "3" chars

echo substr($str, $s, $l) . "\n";    // byte-based
echo mb_substr($str, $s, $l) . "\n"; // character-based
echo substr_unicode($str, $s, $l);   // character-based, without mbstring
?>
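For Chinese text, the same idea can be wrapped in a small helper that passes the encoding explicitly instead of relying on the internal default. This is only a sketch: truncate_chars is a hypothetical name that does not appear in the article, the "..." marker is an arbitrary choice, and the code assumes UTF-8 input with the mbstring extension loaded.

```php
<?php
// Hypothetical helper (not from the article): cut $text to at most $maxChars
// characters and append $marker only when something was actually removed.
function truncate_chars(string $text, int $maxChars, string $marker = "...", string $encoding = "UTF-8"): string
{
    if (mb_strlen($text, $encoding) <= $maxChars) {
        return $text; // short enough, return unchanged
    }
    return mb_substr($text, 0, $maxChars, $encoding) . $marker;
}

echo truncate_chars("防止中文截取乱码", 4), "\n"; // 防止中文...
echo truncate_chars("PHP", 4), "\n";              // PHP
```

Comparing character counts with mb_strlen, rather than byte counts with strlen, means the marker is added only when the string really was shortened.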
Summary: That is all for this article. I hope it is helpful to everyone's learning.