How to match Chinese characters with a UTF-8 regular expression in PHP


Released: 2016-07-13 09:45:07


How does a UTF-8 regular expression match Chinese characters?

The code below checks whether the input contains illegal characters, that is, anything other than Chinese characters:

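$str = "编程";
// UTF-8 regular expression for Chinese characters, letters, digits, and underscores:
// if (!preg_match("/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u", $str))
// UTF-8 regular expression for Chinese characters only:
if (!preg_match("/^[\x{4e00}-\x{9fa5}]+$/u", $str))
{
    echo "<font color=red>Your input [" . $str . "] contains illegal characters</font>";
}
else
{
    echo "<font color=green>Your input [" . $str . "] is valid and passes the check!</font>";
}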


-----------------------

Matching under UTF-8:
In JavaScript, it is very simple to check whether a string consists entirely of Chinese characters.

For example:

var str = "php编程";
if (/^[\u4e00-\u9fa5]+$/.test(str)) {
    alert("The string is all in Chinese");
} else {
    alert("This string is not all in Chinese");
}

In PHP, \x is used to represent hexadecimal data.

So a first attempt is to transform the JavaScript version into the following code:

$str = "php编程";
if (preg_match("/^[\x4e00-\x9fa5]+$/", $str))
{
    print("This string is all in Chinese");
}
else
{
    print("This string is not all in Chinese");
}

No error is reported and the result looks correct. However, if $str is replaced with the all-Chinese string "编程", the output is still "This string is not all in Chinese", so the check is not actually accurate.
Important:

I looked this up in "Mastering Regular Expressions" and, adding my own explanation, found the following about [\x4e00-\x9fa5].
In PHP's regular expressions, [\x4e00-\x9fa5] is a character class, and \x followed by hex digits denotes a character by its hexadecimal code. Note that the bare form accepts only one or two hex digits; a four-digit value must be wrapped in curly braces, as in \x{4e00}. Without braces, \x4e00 is read as \x4e followed by the literal characters 00.
In addition, any value larger than \x{FF} must be used together with the u modifier; otherwise the pattern fails to compile.
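The role of the u modifier can be checked directly. The following is a minimal sketch, assuming the file is saved as UTF-8; the variable names are illustrative and not from the original article. It compiles the same braced pattern with and without the modifier.

```php
<?php
// Assumes this file is saved as UTF-8.
$text = '编程';

// Braced escapes with the u modifier: pattern and subject are treated as UTF-8.
$withU = preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $text);

// Same pattern without the u modifier: \x{4e00} is larger than 0xFF, so the
// pattern does not compile. preg_match() emits a warning (suppressed here
// with @) and returns false.
$withoutU = @preg_match('/^[\x{4e00}-\x{9fa5}]+$/', $text);

var_dump($withU);    // int(1)
var_dump($withoutU); // bool(false)
```

Because a failed compilation returns false rather than 0, comparing the result with `=== 1` is a safer test than a plain truthiness check.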

Most of what can be found online is a byte-range pattern for matching full-width (multi-byte) characters, /^[\x80-\xff]*$/. In JavaScript, [\u4e00-\u9fa5] matches Chinese characters without any braces, but PHP's regular expressions do not support the \u escape. And although \x also stands for hexadecimal data, the range \x4e00-\x9fa5 does not behave like the JavaScript one, because without braces each \x consumes only two hex digits.
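To make the difference concrete, the unbraced pattern can be probed with a couple of inputs. This is an illustrative sketch with sample strings of my own choosing, assuming the file is saved as UTF-8. Since each \x takes only two hex digits, the class works out to roughly the byte range 0x30 to 0x9F.

```php
<?php
// The unbraced pattern from the first attempt. \x4e is read as "N" and \x9f
// as the byte 0x9F, so the class contains: N, 0, the range 0-\x9f, a, 5.
// In effect it accepts any byte from 0x30 through 0x9F.
$unbraced = '/^[\x4e00-\x9fa5]+$/';

// ASCII letters fall inside 0x30-0x9F, so a purely ASCII string matches.
$ascii = preg_match($unbraced, 'php');

// Both characters of "编程" start with the UTF-8 lead byte 0xE7, which is
// outside the range, so an all-Chinese string is rejected.
$chinese = preg_match($unbraced, '编程');

var_dump($ascii);   // int(1)
var_dump($chinese); // int(0)
```

This explains the earlier observation: the pattern rejects "编程" not because it detects non-Chinese text, but because it never matches Chinese bytes at all.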

So I changed the code to the version below and found that it is indeed accurate:

$str = "php编程";
if (preg_match("/^[\x{4e00}-\x{9fa5}]+$/u", $str))
{
    print("This string is all in Chinese");
}
else
{
    print("This string is not all in Chinese");
}
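A quick way to confirm the behaviour is to run the braced pattern over a few inputs. The sketch below wraps it in a helper; the name `is_all_chinese` is hypothetical and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper name; the pattern is the braced one with the u modifier.
function is_all_chinese(string $s): bool
{
    // "=== 1" also treats a false return (e.g. invalid UTF-8) as "no match".
    return preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $s) === 1;
}

foreach (['编程', 'php编程', 'php', ''] as $sample) {
    echo "'" . $sample . "' => " . var_export(is_all_chinese($sample), true) . PHP_EOL;
}
```

Only the first sample is accepted. Note that `$` also matches just before a trailing newline, so input that must not end in a newline may need the D modifier as well.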

So the final correct expression for matching Chinese characters in UTF-8 encoded text in PHP is /^[\x{4e00}-\x{9fa5}]+$/u. With reference to the article above, I wrote the following test code (copy it and save it as a .php file):

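<?php
// Read the action from the query string; default to '' when it is absent.
$action = isset($_GET['action']) ? trim($_GET['action']) : '';

if ($action == "sub") {
    $str = isset($_POST['dir']) ? $_POST['dir'] : '';

    // GB2312 regular expression for Chinese characters, letters, digits, and underscores:
    // if (!preg_match("/^[".chr(0xa1)."-".chr(0xff)."A-Za-z0-9_]+$/", $str))

    // UTF-8 regular expression for Chinese characters, letters, digits, and underscores:
    if (!preg_match("/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u", $str))
    {
        // Escape the rejected input before echoing it back into the page.
        echo "<font color=red>Your input [" . htmlspecialchars($str) . "] contains illegal characters</font>";
    }
    else
    {
        echo "<font color=green>Your input [" . $str . "] is valid and passes the check!</font>";
    }
}
?>
<form method="POST" action="?action=sub">
    Enter characters (digits, letters, Chinese characters, underscores):
    <input type="text" name="dir" value="">
    <input type="submit" value="Submit">
</form>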
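The check in the script above can also be pulled into a small function so that it is testable without a form. This is only a sketch; the function name `is_valid_input` is hypothetical, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical function name; it wraps the UTF-8 check used in the test script.
function is_valid_input(string $s): bool
{
    // Chinese characters (U+4E00-U+9FA5), ASCII letters, digits, underscore.
    // Under the u modifier preg_match() returns false for invalid UTF-8,
    // so comparing with 1 treats malformed input as invalid too.
    return preg_match('/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u', $s) === 1;
}

var_dump(is_valid_input('php_编程_2016')); // bool(true)
var_dump(is_valid_input('php 编程'));      // bool(false): contains a space
var_dump(is_valid_input("\xff"));          // bool(false): not valid UTF-8
```

Returning a strict boolean keeps the calling code simple: the form handler only needs to branch on the result and escape the input before printing it.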


GBK:

preg_match("/^[".chr(0xa1)."-".chr(0xff)."A-Za-z0-9_]+$/", $str); // GB2312 regular expression for Chinese characters, letters, digits, and underscores
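The byte-range pattern can be tried out without a GBK-encoded source file by writing the input as hex escapes. In this sketch the bytes B1 E0 B3 CC are assumed to be the GB2312/GBK encoding of "编程"; the variable names are illustrative.

```php
<?php
// "编程" as GB2312/GBK bytes (assumed values: B1 E0 B3 CC), written as hex
// escapes so the example does not depend on this file's own encoding.
$gbk = "\xB1\xE0\xB3\xCC";

// Byte-range pattern: every byte must be 0xA1-0xFF or an ASCII letter,
// digit, or underscore. There is no u modifier, so matching is per byte.
$pattern = "/^[" . chr(0xa1) . "-" . chr(0xff) . "A-Za-z0-9_]+$/";

var_dump(preg_match($pattern, $gbk));          // int(1)
var_dump(preg_match($pattern, "php_" . $gbk)); // int(1)
var_dump(preg_match($pattern, "php " . $gbk)); // int(0): the space is rejected
```

This is a byte-level check only: it does not verify that the high bytes pair up into valid characters, and GBK extension characters whose second byte falls below 0xA1 are not covered by the range.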

That is all on how to match Chinese characters with a UTF-8 regular expression in PHP. I hope you find it useful.