The code below determines whether the input contains illegal characters:
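$str = "编程";
// UTF-8 regular expression for Chinese characters, letters, digits, and underscore:
// if (!preg_match("/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u", $str))
// UTF-8 regular expression for Chinese characters only:
if (!preg_match("/^[\x{4e00}-\x{9fa5}]+$/u", $str))
{
    echo "<font color=red>Your input [".$str."] contains illegal characters</font>";
}
else
{
    echo "<font color=green>Your input [".$str."] is completely valid and passes!</font>";
}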
-----------------------
UTF-8 matches:
In JavaScript, it is very simple to determine whether a string is Chinese.
For example:
var str = "php编程";
if (/^[\u4e00-\u9fa5]+$/.test(str)) {
    alert("The string is all in Chinese");
} else {
    alert("This string is not all in Chinese");
}
In PHP, \x is used to represent hexadecimal data.
So, transform it into the following code:
$str = "php编程";
// First attempt: \x without braces (this turns out to be wrong)
if (preg_match("/^[\x4e00-\x9fa5]+$/", $str))
{
    print("This string is all in Chinese");
}
else
{
    print("This string is not all Chinese");
}
No error is reported, and the result looks correct. However, if $str is replaced with the two characters "编程" ("programming"), the result is still "This string is not all Chinese". So this check is not actually accurate.
Important:
I looked this up in "Mastering Regular Expressions" and worked out the following fuller explanation of [\x4e00-\x9fa5] myself.
In PHP regular expressions, [\x4e00-\x9fa5] is a character class (a set of characters), and \x followed by hex digits denotes a character by its hexadecimal value. Note that without curly braces \x takes at most two hex digits; a four-digit value must be written with curly braces, as in \x{4e00}.
Also, any value larger than \x{FF} must be used together with the u modifier, otherwise the pattern fails to compile.
Online you can mostly find only the pattern for matching full-width (multibyte) characters, /^[\x80-\xff]*$/, which needs no braces because each value has only two digits. [\u4e00-\u9fa5] matches Chinese in JavaScript, but PHP does not support \u. So why does [\x4e00-\x9fa5] in PHP behave differently from the \u4e00-\u9fa5 range used in JS, given that \x also denotes hexadecimal data? Because without braces only the first two digits belong to the escape: \x4e00 is read as \x4e followed by the literal characters "00".
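The parsing rules above can be checked directly. The following is a minimal sketch, assuming the file is saved as UTF-8 and a reasonably recent PHP version; the variable names $flawed and $fixed are illustrative only and not part of the original code.

```php
<?php
// In a double-quoted PHP string, \x consumes at most two hex digits,
// so "\x4e00" is the byte 0x4E ("N") followed by the literal text "00".
echo bin2hex("\x4e00"), "\n";            // 4e3030

// The class [\x4e00-\x9fa5] is therefore "N", "0", the byte range
// 0x30-0x9F, "a" and "5". It has nothing to do with Chinese characters.
$flawed = "/^[\x4e00-\x9fa5]+$/";
var_dump(preg_match($flawed, "编程"));   // int(0): the UTF-8 lead byte 0xE7 is outside the class
var_dump(preg_match($flawed, "php"));    // int(1): ASCII letters fall inside 0x30-0x9F

// With braces and the u modifier, the class is the code point range U+4E00..U+9FA5.
$fixed = '/^[\x{4e00}-\x{9fa5}]+$/u';
var_dump(preg_match($fixed, "编程"));    // int(1)
var_dump(preg_match($fixed, "php"));     // int(0)

// Without u, values above \x{FF} are rejected when the pattern is compiled:
// preg_match() emits a warning (suppressed here with @) and returns false.
var_dump(@preg_match('/^[\x{4e00}-\x{9fa5}]+$/', "编程"));  // bool(false)
```

Note that the flawed pattern reports "php" as all Chinese, so it is wrong in both directions, not just too strict.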
So I changed the code to the following and found that it is indeed accurate:
$str = "php编程";
// Braces around the code points plus the u modifier (UTF-8 mode)
if (preg_match("/^[\x{4e00}-\x{9fa5}]+$/u", $str))
{
    print("This string is all in Chinese");
}
else
{
    print("This string is not all Chinese");
}
So the final correct expression for matching Chinese characters under UTF-8 encoding in PHP is /^[\x{4e00}-\x{9fa5}]+$/u. With reference to the above, I wrote the following test code (copy it and save it as a .php file):
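<?php
// Default to an empty string when the parameter is missing
$action = isset($_GET['action']) ? trim($_GET['action']) : '';
if ($action == "sub")
{
    $str = isset($_POST['dir']) ? $_POST['dir'] : '';
    // GB2312 regular expression for Chinese characters, letters, digits, and underscore:
    // if (!preg_match("/^[".chr(0xa1)."-".chr(0xff)."A-Za-z0-9_]+$/", $str))
    // UTF-8 regular expression for Chinese characters, letters, digits, and underscore:
    if (!preg_match("/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u", $str))
    {
        // Escape the rejected input before echoing it back into the page
        echo "<font color=red>Your input [".htmlspecialchars($str)."] contains illegal characters</font>";
    }
    else
    {
        echo "<font color=green>Your input [".$str."] is completely valid and passes!</font>";
    }
}
?>
<form method="POST" action="?action=sub">
Enter characters (digits, letters, Chinese characters, underscore):
<input type="text" name="dir" value="">
<input type="submit" value="Submit">
</form>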
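To try the same check without a web form, it can be wrapped in a function and run from the command line. This is only a sketch: is_valid_name is a hypothetical helper name that does not appear in the original script, and the input is assumed to be UTF-8.

```php
<?php
/**
 * Hypothetical helper (not part of the original script): returns true when
 * $input is non-empty and consists only of Chinese characters in
 * U+4E00..U+9FA5, ASCII letters, digits, and underscores.
 * Assumes $input is UTF-8 encoded.
 */
function is_valid_name($input)
{
    // preg_match() returns 1 on a match, 0 on no match, and false on error
    // (for example when $input is not valid UTF-8), so compare against 1.
    return preg_match('/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u', $input) === 1;
}

foreach (array("php编程", "user_01", "编程!", "") as $candidate) {
    echo "[", $candidate, "] ", is_valid_name($candidate) ? "valid" : "invalid", "\n";
}
```

Comparing with === 1 treats invalid UTF-8 input as a failure, whereas the !preg_match(...) form in the script above cannot tell "no match" from "error". Also note that $ matches before a final newline as well, so adding the D modifier may be worth considering if a trailing newline must be rejected.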
GBK:
preg_match("/^[".chr(0xa1)."-".chr(0xff)."A-Za-z0-9_]+$/", $str); // GB2312 regular expression for Chinese characters, letters, digits, and underscore
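The byte-range pattern can be tried on a GBK-encoded string as follows. This is a rough sketch: the bytes in $gbk are assumed to be "编程" encoded in GBK, written as escapes so the example does not depend on the encoding of the source file, and the variable names are illustrative.

```php
<?php
// Assumed to be "编程" in GBK: two bytes per character.
$gbk = "\xB1\xE0\xB3\xCC";

// Every byte must be in 0xA1-0xFF or be an ASCII letter, digit, or underscore.
// No u modifier here: the subject is GBK bytes, not UTF-8.
$pattern = "/^[" . chr(0xa1) . "-" . chr(0xff) . "A-Za-z0-9_]+$/";

var_dump(preg_match($pattern, $gbk));           // int(1)
var_dump(preg_match($pattern, "php_" . $gbk));  // int(1)
var_dump(preg_match($pattern, $gbk . "!"));     // int(0)
```

This check works byte by byte, so it does not verify that the high bytes form valid two-byte pairs, and GBK characters whose second byte lies below 0xA1 are not covered by the 0xA1-0xFF range.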
The above content is all about how to match Chinese characters with UTF-8 regular expression in PHP. I hope you like it.