In UTF-8, each Chinese character takes three bytes
<?php
// File saved as UTF-8
// The test string is 8 Chinese characters plus the letter 'a'
echo strlen('测试文本a测试文本');
echo '-';
echo mb_strlen('测试文本a测试文本', 'utf-8');
?>
Output: 25-9
In GB2312, each Chinese character takes two bytes
<?php
// File saved as GB2312
// The test string is 8 Chinese characters plus the letter 'a'
echo strlen('测试文本a测试文本');
echo '-';
echo mb_strlen('测试文本a测试文本', 'gb2312');
?>
Output: 17-9
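Both results can also be reproduced from a single file, without saving it twice in different encodings. The following is only a sketch: it assumes the mbstring extension is loaded and that the file itself is saved as UTF-8, and it uses mb_convert_encoding to produce the GB2312 bytes.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is enabled.
$utf8 = '测试文本a测试文本';                             // 8 Chinese characters + 'a'
$gb   = mb_convert_encoding($utf8, 'GB2312', 'UTF-8'); // same text as GB2312 bytes

// strlen() counts bytes, mb_strlen() counts characters in the given encoding.
echo strlen($utf8), '-', mb_strlen($utf8, 'UTF-8'), "\n"; // prints 25-9
echo strlen($gb), '-', mb_strlen($gb, 'GB2312'), "\n";    // prints 17-9
```

The character count is 9 in both cases; only the byte count changes with the encoding.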
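In a MySQL database (versions after 5.1), a field of type varchar(10) can hold up to 10 characters (not bytes), whether they are Chinese characters or ASCII letters.
So when checking the length of a string, you have to take the encoding of the document into account: strlen counts bytes, while mb_strlen counts characters in the encoding you pass to it.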
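As an illustration, a length check before inserting into such a field might look like the sketch below. The helper name fitsVarchar is hypothetical, and the code assumes that both the PHP string and the column use UTF-8 and that mbstring is enabled.

```php
<?php
// Hypothetical helper: does $value fit into a VARCHAR($limit) column?
// Assumption: $value is UTF-8 and the column uses a UTF-8 character set.
function fitsVarchar($value, $limit) {
    // Count characters, not bytes, to match how VARCHAR(n) is measured.
    return mb_strlen($value, 'UTF-8') <= $limit;
}

$value = '测试文本a测试文本';           // 9 characters, 25 bytes in UTF-8
var_dump(fitsVarchar($value, 10));  // bool(true)
var_dump(strlen($value) <= 10);     // bool(false): a byte count would wrongly reject it
```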
Below is a simple UTF-8 substring function (it cuts by number of characters, not bytes)
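<?php
/*
* UTF-8 substring, counted in characters rather than bytes
* $str The string to cut
* $start Starting position (0-based, in characters)
* $length Number of characters to return
*/
function cutStr($str, $start, $length) {
    $restr = '';
    $j = 0;                  // index of the current character
    $end = $start + $length; // index of the first character to leave out
    $plen = strlen($str);    // length in bytes
    for ($i = 0; $i < $plen && $j < $end; $j++) {
        // The lead byte tells how many bytes the character occupies (1 to 4)
        $c = ord($str[$i]);
        $n = $c < 0x80 ? 1 : ($c < 0xE0 ? 2 : ($c < 0xF0 ? 3 : 4));
        if ($j >= $start) {
            $restr .= substr($str, $i, $n);
        }
        $i += $n;
    }
    return $restr;
}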
$str = 'China News Service, September 24: The third financial summit of leaders of the Group of Twenty (G20) will be held today in Pittsburgh, USA.';
echo $str;
echo "\n";
echo cutStr($str, 0, 25);
echo "\n";
?>
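When the mbstring extension is available, the built-in mb_substr does the same job as a hand-written function. The sketch below assumes the file is saved as UTF-8; it also shows why the byte-based substr is unsafe for this kind of text.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is enabled.
$s = '测试文本a测试文本';

// mb_substr() takes the start and length in characters.
$part = mb_substr($s, 2, 4, 'UTF-8');
echo $part, "\n"; // prints 文本a测

// substr() works on bytes: 4 bytes is one whole character plus a stray lead byte.
$broken = substr($s, 0, 4);
var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false): not valid UTF-8
```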