"Learning PHP & MYSQL - Character Encoding (Part 1)" introduced the conversion between Unicode and UTF-8 and summarized the UTF-8 encoding rule. Based on that rule, we can write a program that parses UTF-8 encoded text. The following is a PHP implementation:
<?php
/*
Purpose: $str is a UTF-8 encoded string that may mix Chinese and English.
The function below walks the string according to the UTF-8 encoding rules,
so that it is cut on character boundaries and still displays correctly.
*/
$str = 'Today is very happy, so we decided to go to KFC to eat Coke chicken wings!!!';
/*
$str is the string to extract characters from
$len is the number of characters to extract
*/
function utf8sub($str,$len) {
    if($len <= 0){
        return '';
    }
    $total = strlen($str); // Length of $str in bytes
    $offset = 0; // Byte offset of the next character's lead byte
    $chars = 0; // Number of characters extracted so far
    $res = ''; // Result string
    while($chars < $len && $offset < $total){ // Stop after $len characters or at the end of the string
        // Take the lead (first) byte of the next character as an integer from 0 to 255
        $high = ord(substr($str,$offset,1));
        // echo '$high=' . $high . PHP_EOL;
        // For a multi-byte character, the run of leading 1 bits in the lead byte gives the sequence length.
        // Longer prefixes are tested first, so the order of the branches matters.
        // The 5- and 6-byte forms come from the original UTF-8 definition; RFC 3629 limits UTF-8 to 4 bytes.
        if(($high >> 2) === 0x3F){ // Top 6 bits are 111111: take 6 bytes
            $count = 6;
        }else if(($high >> 3) === 0x1F){ // Top 5 bits are 11111: take 5 bytes
            $count = 5;
        }else if(($high >> 4) === 0xF){ // Top 4 bits are 1111: take 4 bytes
            $count = 4;
        }else if(($high >> 5) === 0x7){ // Top 3 bits are 111: take 3 bytes
            $count = 3;
        }else if(($high >> 6) === 0x3){ // Top 2 bits are 11: take 2 bytes
            $count = 2;
        }else if(($high >> 7) === 0x0){ // Top bit is 0: single-byte (ASCII) character, take 1 byte
            $count = 1;
        }else{ // 10xxxxxx is a continuation byte, not a valid lead byte: copy it as a single byte
            $count = 1;
        }
        // echo '$count=' . $count . PHP_EOL;
        $res .= substr($str,$offset,$count); // Take out one whole character and append it to $res
        $chars += 1; // One more character extracted
        $offset += $count; // Move the offset forward by $count bytes, to the next lead byte
    }
    return $res;
}
echo utf8sub($str,100);