Encoding table
Double-byte character encoding ranges
1. GBK (GB2312/GB18030)
\x00-\xff GBK double-byte encoding range
\x20-\x7f ASCII
\xa1-\xff Chinese (GB2312)
\x80-\xff Chinese (GBK)
2. UTF-8 (Unicode)
\u4e00-\u9fa5 (Chinese)
\u3130-\u318f (Korean)
\uac00-\ud7a3 (Korean)
\u0800-\u4e00 (Japanese)
<?php
$str = "中国"; // "China" written in Chinese characters
echo $str;
echo "\n";
// if (preg_match("/^[".chr(0xa1)."-".chr(0xff)."]+$/", $str)) { // Works only for gb2312
// Passes when every byte is in 0x7f-0xff, i.e. the string contains no plain ASCII
if (preg_match("/^[\x7f-\xff]+$/", $str)) { // Compatible with gb2312 and utf-8
    echo "Correct input";
} else {
    echo "Wrong input";
}
?>
Deciding whether a string is Chinese actually involves a fair amount of background knowledge. The underlying byte-level encoding differs between utf-8, gbk, and gb18030. I once looked into how to tell which script a given character belongs to, and there are a great many details involved.
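To illustrate why the encoding matters, here is a rough sketch of guessing which encoding a byte string uses before applying any range test. It assumes PHP 7 or later; `guessEncoding` is a hypothetical name, and the GBK branch only checks byte structure (lead byte 0x81-0xfe, trail byte 0x40-0xfe except 0x7f), not whether each pair is an assigned character. A short byte sequence can be valid in more than one encoding, so the result is a heuristic, not a proof.

```php
<?php
// Hypothetical helper: guess the encoding of a raw byte string.
function guessEncoding(string $bytes): string
{
    // Only 7-bit bytes: identical in ASCII, UTF-8, and GBK.
    if (preg_match('/^[\x00-\x7f]*$/D', $bytes) === 1) {
        return 'ASCII';
    }
    // An empty pattern with /u succeeds only if the subject is valid UTF-8.
    if (preg_match('//u', $bytes) === 1) {
        return 'UTF-8';
    }
    // Structural GBK check: single ASCII bytes or lead/trail byte pairs.
    if (preg_match('/^(?:[\x00-\x7f]|[\x81-\xfe][\x40-\x7e\x80-\xfe])+$/D', $bytes) === 1) {
        return 'GBK';
    }
    return 'unknown';
}

echo guessEncoding("China"), "\n";                    // ASCII
echo guessEncoding("\xe4\xb8\xad\xe5\x9b\xbd"), "\n"; // UTF-8 (bytes of 中国)
echo guessEncoding("\xd6\xd0\xb9\xfa"), "\n";         // GBK (bytes of 中国)
```

UTF-8 is tested before GBK because valid UTF-8 text often also passes the looser GBK structural check, while the reverse is rarely true.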