A more reliable check function:
function is_utf8($gonten)
{
    // One three-byte sequence: lead byte 228-233 (0xE4-0xE9) followed by
    // two continuation bytes 128-191 (0x80-0xBF).
    $cjk = "[".chr(228)."-".chr(233)."][".chr(128)."-".chr(191)."][".chr(128)."-".chr(191)."]";
    // Match at the start, at the end, or two or more times in a row anywhere.
    if (preg_match("/^".$cjk."/", $gonten) || preg_match("/".$cjk."$/", $gonten) || preg_match("/(".$cjk."){2,}/", $gonten))
    {
        return true;
    }
    else
    {
        return false;
    }
}
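Call is_utf8($gonten) to determine whether the string $gonten is UTF-8 encoded. The function only looks for three-byte sequences whose lead byte is 0xE4 to 0xE9, the range that covers common Chinese characters, so it is a heuristic for Chinese text rather than a general UTF-8 validator.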
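To see what this check accepts and rejects, here is a minimal sketch. The name is_utf8_cjk_heuristic is hypothetical (chosen only so it does not clash with is_utf8), and the pattern is written with \x escapes instead of chr() calls, which should be equivalent. The sample byte strings are assumptions picked for illustration: the UTF-8 and GBK encodings of the same two Chinese characters, plus a plain ASCII word.

```php
<?php
// Hypothetical name; same logic as the check described above.
function is_utf8_cjk_heuristic(string $s): bool
{
    // One three-byte sequence whose lead byte is 0xE4-0xE9.
    $cjk = '[\xE4-\xE9][\x80-\xBF][\x80-\xBF]';

    return preg_match("/^$cjk/", $s) === 1          // at the start
        || preg_match("/$cjk\$/", $s) === 1         // at the end
        || preg_match("/(?:$cjk){2,}/", $s) === 1;  // two or more in a row
}

$utf8  = "\xE4\xB8\xAD\xE6\x96\x87"; // two Chinese characters in UTF-8
$gbk   = "\xD6\xD0\xCE\xC4";         // the same two characters in GBK
$ascii = "hello";                    // plain ASCII, which is also valid UTF-8

var_dump(is_utf8_cjk_heuristic($utf8));  // bool(true)
var_dump(is_utf8_cjk_heuristic($gbk));   // bool(false)
var_dump(is_utf8_cjk_heuristic($ascii)); // bool(false)
```

Note the last result: pure ASCII text is reported as not UTF-8, because the check only reacts to Chinese-range byte sequences. That is fine when the goal is to tell UTF-8 Chinese from GBK Chinese, but not for general validation.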
Another check function circulates widely on the Internet, but its judgment is incomplete. The function is as follows:
function is_utf8($string) {
    return preg_match('%^(?:
          [\x09\x0A\x0D\x20-\x7E]            # ASCII
        | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
        | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
        | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
        | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
    )*$%xs', $string);
}
The function above judges words such as "food" and "food" to be UTF-8 encoded even when they are stored in another encoding, so the former function is recommended.
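The kind of misjudgment described above can be reproduced with a short sketch. The name is_utf8_strict is hypothetical, and the test bytes CA B3 C6 B7 are an assumption chosen for illustration: both pairs fall inside GBK's two-byte range, yet each pair is also a well-formed two-byte UTF-8 sequence, so a pure validity check cannot tell the two encodings apart.

```php
<?php
// Hypothetical name; the pattern is the widely circulated validity regex.
function is_utf8_strict(string $s): bool
{
    return preg_match('%^(?:
          [\x09\x0A\x0D\x20-\x7E]            # ASCII
        | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
        | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
        | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
        | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
    )*$%xs', $s) === 1;
}

// Assumed sample: four bytes that are plausible GBK text (two byte pairs).
$bytes = "\xCA\xB3\xC6\xB7";

// Each pair is lead 0xC2-0xDF + continuation 0x80-0xBF, so it validates.
var_dump(is_utf8_strict($bytes));          // bool(true)

// PCRE's own UTF-8 check (the /u modifier) agrees that the bytes are valid.
var_dump(preg_match('//u', $bytes) === 1); // bool(true)

// The bytes contain no three-byte sequence with a lead byte in 0xE4-0xE9,
// which is what the first function looks for.
var_dump(preg_match('/[\xE4-\xE9][\x80-\xBF]{2}/', $bytes) === 1); // bool(false)
```

So the validator is not wrong about the bytes being well-formed UTF-8; the problem is that being well-formed does not prove the text was written as UTF-8. A check tuned to Chinese byte ranges avoids this particular false positive, at the cost of rejecting non-Chinese UTF-8 text.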