If you need PHP to truncate a UTF-8 string by character count without cutting a multi-byte character in half, the function below may serve as a reference.
The code is as follows:
$str = '今天非常Happy,所有决定去KFC吃可乐鸡翅!!!'; // sample mixing 3-byte Chinese characters with 1-byte ASCII
/*
 * $str: the UTF-8 string to truncate
 * $len: the number of characters (not bytes) to keep
 */
function utf8sub($str, $len) {
    if ($len <= 0) {
        return '';
    }
    $offset = 0; // byte offset of the next character's lead byte
    $chars = 0;  // number of characters taken so far
    $res = '';   // the truncated result string
    while ($chars < $len) {
        // Stop once the offset has moved past the last byte of the string.
        // (Comparing the byte value with null would also stop on a NUL byte.)
        if ($offset >= strlen($str)) {
            break;
        }
        // Read the lead byte of the next character as an integer (0-255).
        $high = ord(substr($str, $offset, 1));
        // Count the leading 1 bits of the lead byte to get the sequence length.
        // Branches run from longest to shortest, so each test only needs to
        // match the leading 1 bits. RFC 3629 caps UTF-8 at 4 bytes; the 5- and
        // 6-byte forms exist only in the older definition.
        if (($high >> 2) === 0x3F) {        // 111111xx: 6 bytes (legacy form)
            $count = 6;
        } else if (($high >> 3) === 0x1F) { // 11111xxx: 5 bytes (legacy form)
            $count = 5;
        } else if (($high >> 4) === 0xF) {  // 1111xxxx: 4 bytes
            $count = 4;
        } else if (($high >> 5) === 0x7) {  // 111xxxxx: 3 bytes
            $count = 3;
        } else if (($high >> 6) === 0x3) {  // 11xxxxxx: 2 bytes
            $count = 2;
        } else {                            // 0xxxxxxx (ASCII) or a stray 10xxxxxx continuation byte
            $count = 1;
        }
        $res .= substr($str, $offset, $count); // append the whole character to the result
        $chars += 1;                           // one more character taken
        $offset += $count;                     // move to the lead byte of the next character
    }
    return $res;
}
echo utf8sub($str, 100);
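
The branch chain in utf8sub works out how many bytes a character occupies from the leading 1 bits of its first byte. As a minimal sketch of that step in isolation, the helper below (utf8SeqLen is a hypothetical name, not part of the original code) tests each lead-byte pattern exactly and, as an assumption of this sketch, accepts only the 1 to 4 byte forms allowed by RFC 3629. The sample characters are written with \u{...} escapes so the result does not depend on how the file is saved.

```php
<?php
// Hypothetical helper: returns how many bytes the UTF-8 sequence that
// starts with lead byte $byte (0-255) is expected to occupy.
function utf8SeqLen(int $byte): int {
    if (($byte >> 7) === 0x0)  { return 1; } // 0xxxxxxx (ASCII)
    if (($byte >> 5) === 0x6)  { return 2; } // 110xxxxx
    if (($byte >> 4) === 0xE)  { return 3; } // 1110xxxx
    if (($byte >> 3) === 0x1E) { return 4; } // 11110xxx
    return 1; // continuation or invalid byte: step over it one byte at a time
}

// "A", "é", "今", and an emoji, built from code point escapes (PHP 7+).
$samples = ["A", "\u{E9}", "\u{4ECA}", "\u{1F600}"];
foreach ($samples as $ch) {
    $lead = ord($ch[0]);
    printf("lead=0x%02X len=%d\n", $lead, utf8SeqLen($lead));
}
// Prints:
// lead=0x41 len=1
// lead=0xC3 len=2
// lead=0xE4 len=3
// lead=0xF0 len=4
```

Falling back to a length of 1 for unexpected bytes keeps a caller's loop moving forward on malformed input instead of leaving the length undefined.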
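
For comparison, when PHP's mbstring extension is available, the same character-based truncation can likely be done with the built-in mb_substr instead of walking the bytes by hand. The sketch below assumes mbstring is loaded; utf8Take is a hypothetical wrapper name used only for illustration, and this is a different technique from the manual byte scan above, not a description of it.

```php
<?php
// Assumption: the mbstring extension is loaded.
// utf8Take is a hypothetical wrapper, not part of the original code.
function utf8Take(string $str, int $len): string {
    if ($len <= 0) {
        return '';
    }
    // Start at character 0 and keep $len characters, counted as UTF-8.
    return mb_substr($str, 0, $len, 'UTF-8');
}

$str = "\u{4ECA}\u{5929}Happy";        // "今天Happy"
echo utf8Take($str, 3), "\n";          // 今天H
echo strlen(utf8Take($str, 3)), "\n";  // 7 (3 + 3 + 1 bytes)
```

Passing the encoding explicitly avoids depending on the configured internal encoding.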