Determining string length in PHP with the strlen() and mb_strlen() functions

巴扎黑

Released: 2016-11-09 14:38:49

strlen()

PHP strlen() function

Definition and usage

The strlen() function returns the length of a string, measured in bytes.

Syntax

strlen(string)

Parameters: string
Description: Required. Specifies the string to check.

The code is as follows

<?php
$str = '中文a字1符';
echo strlen($str);
echo '<br />';
echo mb_strlen($str, 'UTF-8');
// Output:
// 14
// 6
?>

Result analysis: strlen() counts bytes, and each Chinese character occupies 3 bytes in UTF-8, so the length of "中文a字1符" is 3*4+2=14.
When mb_strlen() is told that the encoding is UTF-8, each Chinese character counts as 1, so the length of "中文a字1符" is 6.
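
The 3*4+2 arithmetic can be checked character by character. The following is a minimal sketch, assuming the source file is saved as UTF-8 and the input is valid UTF-8; byte_widths() is a hypothetical helper name, not a PHP built-in.

```php
<?php
// Sketch: list how many bytes each character of a UTF-8 string occupies.
// byte_widths() is a hypothetical helper, not part of PHP.
function byte_widths(string $str): array
{
    // The /u modifier makes PCRE split on UTF-8 characters instead of bytes.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    // strlen() of a single character is its size in bytes.
    return array_map('strlen', $chars);
}

$widths = byte_widths('中文a字1符');
echo implode('+', $widths), ' = ', array_sum($widths), "\n"; // 3+3+1+3+1+3 = 14
echo count($widths), "\n";                                    // 6
```

The sum of the widths is what strlen() reports, and the number of entries is what mb_strlen() reports.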

mb_strlen() function

Note that mb_strlen() is not a PHP core function; it is provided by the mbstring extension. Before using it, make sure the extension is loaded. On Windows, for example, php.ini must contain the line "extension=php_mbstring.dll", and that line must not be commented out. Otherwise, calling mb_strlen() fails with an undefined function error.
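
A script can guard against a missing extension instead of failing. The sketch below assumes UTF-8 input; char_length() is a hypothetical name, and the PCRE fallback is one possible substitute rather than an exact replacement for mb_strlen().

```php
<?php
// Sketch: count characters with mb_strlen() when mbstring is loaded,
// otherwise fall back to a PCRE-based count.
// char_length() is a hypothetical helper, not part of PHP.
function char_length(string $str): int
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($str, 'UTF-8');
    }
    // Fallback: with /u, "." matches one whole UTF-8 character
    // (/s lets it match newlines as well).
    return (int) preg_match_all('/./su', $str);
}

echo char_length('中文a字1符'), "\n"; // 6
```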

The code is as follows

<?php
$str = '中文a字1符';
// Calculation
echo (strlen($str) + mb_strlen($str, 'UTF-8')) / 2;
// Output:
// 10
?>

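For "中文a字1符", strlen($str) returns 14 and mb_strlen($str, 'UTF-8') returns 6. Each Chinese character adds 3 to the first value and 1 to the second, so averaging the two counts it as 2, while each ASCII character counts as 1. The result, 10, is the number of display columns (placeholders) that the string occupies.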
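
The averaging trick only holds when every non-ASCII character takes 3 bytes and two columns. The sketch below wraps it in a hypothetical display_width() helper and compares it with mb_strwidth(), the mbstring function that reports display width directly. It assumes mbstring is loaded and the file is saved as UTF-8.

```php
<?php
// Sketch of the averaging trick. display_width() is a hypothetical helper.
// Assumption: every non-ASCII character is a 3-byte, double-width character.
function display_width(string $str)
{
    return (strlen($str) + mb_strlen($str, 'UTF-8')) / 2;
}

echo display_width('中文a字1符'), "\n";        // 10
echo mb_strwidth('中文a字1符', 'UTF-8'), "\n"; // 10

// "é" is 2 bytes in UTF-8, so the formula no longer yields a whole number.
echo display_width("\xC3\xA9"), "\n";          // 1.5
echo mb_strwidth("\xC3\xA9", 'UTF-8'), "\n";   // 1
```

For text that may contain characters other than ASCII and Chinese, mb_strwidth() is the safer choice.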

The difference between the two

The code is as follows

<?php
// The file must be saved as UTF-8 when testing
$str = '中文a字1符';
echo strlen($str).'<br>';              // 14
echo mb_strlen($str, 'utf8').'<br>';   // 6
echo mb_strlen($str, 'gbk').'<br>';    // 8
echo mb_strlen($str, 'gb2312').'<br>'; // 10
?>

Result analysis: strlen() counts bytes, and each Chinese character occupies 3 bytes in UTF-8, so the length of "中文a字1符" is 3*4+2=14. When mb_strlen() is told that the encoding is UTF-8, each Chinese character counts as 1, so the length of "中文a字1符" is 6. The last two calls pass an encoding that does not match the data: the UTF-8 bytes are misread as GBK or GB2312 sequences, so the results 8 and 10 do not correspond to any real character count.
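
The encoding argument gives a correct count only when it matches the bytes actually stored. As a minimal sketch, assuming mbstring is loaded and the file is saved as UTF-8, the same text can be converted to GBK and measured with the matching encoding:

```php
<?php
// Sketch: the same text stored as GBK bytes (requires mbstring).
$utf8 = '中文a字1符';
$gbk  = mb_convert_encoding($utf8, 'GBK', 'UTF-8');

echo strlen($gbk), "\n";              // 10 (2 bytes * 4 Chinese characters + 2)
echo mb_strlen($gbk, 'GBK'), "\n";    // 6
echo mb_strlen($utf8, 'UTF-8'), "\n"; // 6
```

The byte length changes with the encoding, while the character count stays at 6 as long as the declared encoding is the real one.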

Although the functions above handle simple cases of mixed Chinese and English text, they are often not enough in practice. Some more practical solutions follow.

The code below gets the length of a mixed Chinese and English string in PHP. One Chinese character counts as 1 unit and two English characters count as 1 unit; adjust the weights as needed.

The code is as follows

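/**
 * Get the length of a mixed Chinese/English string in PHP.
 * @param string $str     the string
 * @param string $charset its encoding
 * @return float the length, where 1 Chinese character = 1 unit and 2 English characters = 1 unit
 */
function strLength($str, $charset = 'utf-8') {
    // Work on GB2312 bytes, where every Chinese character is exactly 2 bytes.
    if ($charset == 'utf-8') $str = iconv('utf-8', 'gb2312', $str);
    $num = strlen($str);
    $cnNum = 0;
    for ($i = 0; $i < $num; $i++) {
        // A byte above 127 starts a 2-byte character: count it and skip its second byte.
        if (ord(substr($str, $i, 1)) > 127) {
            $cnNum++;
            $i++;
        }
    }
    $enNum = $num - ($cnNum * 2);
    $number = ($enNum / 2) + $cnNum;
    return ceil($number);
}
// Test: every call prints 15. Passing 'gb2312' assumes this file is saved as GB2312;
// in a file saved as UTF-8, omit the second argument.
$str1 = '测试测试测试测试测试测试测试测';
$str2 = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';
$str3 = 'aa测试aa测试aa测试aa测试aaaaaa';
echo strLength($str1, 'gb2312');
echo strLength($str2, 'gb2312');
echo strLength($str3, 'gb2312');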
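
When mbstring is available, the same weighting (a Chinese character = 1 unit, two English characters = 1 unit) can be expressed as half of the display width. This is an alternative sketch and not the author's method; half_width_units() is a hypothetical name, and the input is assumed to be UTF-8.

```php
<?php
// Sketch: half of the display width, rounded up (requires mbstring).
// half_width_units() is a hypothetical helper; input is assumed to be UTF-8.
function half_width_units(string $str): int
{
    // mb_strwidth() counts a Chinese character as 2 columns and an ASCII character as 1.
    return (int) ceil(mb_strwidth($str, 'UTF-8') / 2);
}

echo half_width_units('测试测试测试测试测试测试测试测'), "\n"; // 15
echo half_width_units('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'), "\n"; // 15
echo half_width_units('aa测试aa测试aa测试aa测试aaaaaa'), "\n"; // 15
```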

Substring (truncation) functions

UTF-8 encoding: in UTF-8, one Chinese character occupies 3 bytes

The code is as follows

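// Cut a UTF-8 string without splitting a 3-byte character.
// $start and $len are byte offsets; every byte above 127 is assumed to begin a 3-byte character.
function msubstr($str, $start, $len) {
 $tmpstr = "";
 $strlen = $start + $len;
 for ($i = 0; $i < $strlen; $i++) {
  if (ord(substr($str, $i, 1)) > 127) {
   // Copy all 3 bytes of the character, unless it lies before $start.
   if ($i >= $start) $tmpstr .= substr($str, $i, 3);
   $i += 2;
  } elseif ($i >= $start) {
   $tmpstr .= substr($str, $i, 1);
  }
 }
 return $tmpstr;
}
echo msubstr("一二三天下致公english", 0, 10); // 一二三天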
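
For comparison, mbstring offers two built-ins for this job: mb_strcut() cuts by bytes without splitting a character, and mb_substr() cuts by characters. The sketch below assumes mbstring is loaded and the file is saved as UTF-8.

```php
<?php
// Sketch: byte-based versus character-based truncation with mbstring.
$str = "一二三天下致公english";

// At most 10 bytes, never splitting a character: 3 characters (9 bytes) fit.
echo mb_strcut($str, 0, 10, 'UTF-8'), "\n"; // 一二三

// Exactly 10 characters, whatever their size in bytes.
echo mb_substr($str, 0, 10, 'UTF-8'), "\n"; // 一二三天下致公eng
```

Note that mb_strcut() stays within the 10-byte limit, whereas msubstr() returns 一二三天 (12 bytes) because it finishes the character it has started.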

GB2312 encoding: in GB2312, one Chinese character occupies 2 bytes

The code is as follows

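<?php
// Cut a GB2312 string without splitting a 2-byte character; $start and $len are byte offsets.
function msubstr($str, $start, $len) {
   $tmpstr = "";
   $strlen = $start + $len;
   // Shorten the cut by 2 bytes when the string contains two or more consecutive digits or whitespace characters.
   if (preg_match('/[\d\s]{2,}/', $str)) { $strlen = $strlen - 2; }
   for ($i = 0; $i < $strlen; $i++) {
       if (ord(substr($str, $i, 1)) > 0xa0) {
           // Lead byte of a 2-byte character: copy both bytes, unless it lies before $start.
           if ($i >= $start) $tmpstr .= substr($str, $i, 2);
           $i++;
       } elseif ($i >= $start) {
           $tmpstr .= substr($str, $i, 1);
       }
   }
   return $tmpstr;
}
?>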
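
The same byte-safe cut can be sketched with mb_strcut() on GB2312 data. The example assumes mbstring is loaded and that this file is saved as UTF-8, so the test string is first converted to GB2312 and the result is converted back for display.

```php
<?php
// Sketch: byte-safe truncation of GB2312 data with mbstring.
$gb = mb_convert_encoding("一二三english", 'GB2312', 'UTF-8');

// 5 bytes would split the third character, so only 4 bytes are kept.
$cut = mb_strcut($gb, 0, 5, 'GB2312');

echo strlen($cut), "\n";                                 // 4
echo mb_convert_encoding($cut, 'UTF-8', 'GB2312'), "\n"; // 一二
```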

A more compatible function

The code is as follows

function cc_msubstr($str, $start, $length, $charset="utf-8", $suffix=true)
{
 // Prefer the multibyte-aware built-ins when they are available.
 // Note: these two branches return without appending the "…" suffix.
 if(function_exists("mb_substr"))
  return mb_substr($str, $start, $length, $charset);
 elseif(function_exists('iconv_substr')) {
  return iconv_substr($str,$start,$length,$charset);
 }
 // Fallback: split the string into characters with a regex for the given charset.
 $re['utf-8']   = "/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xff][\x80-\xbf]{3}/";
 $re['gb2312'] = "/[\x01-\x7f]|[\xb0-\xf7][\xa0-\xfe]/";
 $re['gbk']   = "/[\x01-\x7f]|[\x81-\xfe][\x40-\xfe]/";
 $re['big5']   = "/[\x01-\x7f]|[\x81-\xfe]([\x40-\x7e]|[\xa1-\xfe])/";
 preg_match_all($re[$charset], $str, $match);
 $slice = join("",array_slice($match[0], $start, $length));
 if($suffix) return $slice."…";
 return $slice;
}
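
For UTF-8 input, the regex fallback can be written more compactly with PCRE's /u modifier, which matches whole UTF-8 characters. This is a simplified sketch of the same split-then-slice idea, not a replacement for the multi-charset table; utf8_slice() is a hypothetical name, and the input is assumed to be valid UTF-8.

```php
<?php
// Sketch of the split-then-slice fallback for UTF-8 only.
// utf8_slice() is a hypothetical helper, not part of PHP.
function utf8_slice(string $str, int $start, int $length): string
{
    // With /u, "." matches one whole UTF-8 character (/s includes newlines).
    preg_match_all('/./su', $str, $match);
    // $match[0] holds one entry per character, so array_slice() works on characters.
    return implode('', array_slice($match[0], $start, $length));
}

echo utf8_slice("一二三天下致公english", 2, 4), "\n"; // 三天下致
```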