
How to match Chinese characters with a UTF-8 regular expression in PHP

WBOY
Release: 2016-07-13 09:45:07

How do you match Chinese characters with a regular expression when the text is UTF-8 encoded? The code below checks whether the input contains illegal characters, that is, anything other than Chinese characters:

$str = "编程";
// UTF-8 pattern for Chinese characters, letters, digits and underscore:
// if (!preg_match("/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u", $str))
// UTF-8 pattern for Chinese characters only:
if (!preg_match("/^[\x{4e00}-\x{9fa5}]+$/u", $str))
{
    echo "<font color=red>Your input [" . $str . "] contains illegal characters</font>";
}
else
{
    echo "<font color=green>Your input [" . $str . "] is valid</font>";
}
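As a minimal sketch, the same check can be wrapped in a small helper. The function name isAllChinese is hypothetical (it does not appear in the original), and the source file is assumed to be saved as UTF-8 so that the string literals are UTF-8 encoded.

```php
<?php
// Hypothetical helper, not part of the original article.
// Returns true only if $str is non-empty and every character lies in
// the range U+4E00..U+9FA5 used above.
function isAllChinese(string $str): bool
{
    // Single quotes pass \x{...} through to PCRE untouched; the u modifier
    // makes PCRE treat both the pattern and the subject as UTF-8.
    return preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $str) === 1;
}

var_dump(isAllChinese("编程"));    // bool(true)
var_dump(isAllChinese("php编程")); // bool(false)
var_dump(isAllChinese(""));        // bool(false): + needs at least one character
```

Comparing the result with === 1 keeps a failed match (0) and a pattern error (false) both on the "not valid" side.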

-----------------------

Matching under UTF-8:
In JavaScript, checking whether a string consists entirely of Chinese characters is simple.

For example:

The code is as follows:
var str = "php编程";
if (/^[\u4e00-\u9fa5]+$/.test(str)) {
    alert("The string is all in Chinese");
} else {
    alert("This string is not all in Chinese");
}

In PHP, \x is used to write a hexadecimal value.

So a first attempt is to translate the JavaScript into the following code:

The code is as follows:
$str = "php编程";
if (preg_match("/^[\x4e00-\x9fa5]+$/", $str))
{
    print("This string is all in Chinese");
}
else
{
    print("This string is not all in Chinese");
}

No error is reported and the result appears to be correct. However, if $str is changed to "编程", the output is still "This string is not all in Chinese", so this test is not actually accurate.
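To see why the result only looks correct, it helps to check what the character class actually accepts. The sketch below (assuming a UTF-8 source file) suggests that [\x4e00-\x9fa5] accepts plain ASCII text and rejects real Chinese text, because \x without braces consumes only two hex digits.

```php
<?php
// Without braces, \x takes at most two hex digits, so the class
// [\x4e00-\x9fa5] is read as: \x4e ("N"), "0", the byte range "0"..\x9f,
// "a" and "5". It has nothing to do with U+4E00..U+9FA5.
$naive = '/^[\x4e00-\x9fa5]+$/';

var_dump(preg_match($naive, "php"));     // int(1): ASCII letters fall inside "0"..\x9f
var_dump(preg_match($naive, "编程"));    // int(0): the UTF-8 lead bytes are above \x9f
var_dump(preg_match($naive, "php编程")); // int(0): rejected, but for the wrong reason
```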
Important:

I looked this up in "Mastering Regular Expressions" and worked out the following explanation of [\x4e00-\x9fa5] myself.
In PHP's regular expressions, [\x4e00-\x9fa5] is a character class, and \x introduces a hexadecimal character code. Without curly braces, \x reads at most two hex digits, so \x4e00 means the character \x4e followed by the literal characters "00". A code with more than two hex digits, such as a four-digit one, must be wrapped in curly braces: \x{4e00}.
In addition, any value larger than \x{FF} must be used together with the u modifier; otherwise the pattern fails to compile and an error is reported.
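The two rules above can be checked directly. In this sketch (assuming a UTF-8 source file), the same braced pattern is compiled with and without the u modifier; the @ operator only silences the compile warning so that the return value can be inspected.

```php
<?php
$subject = "编";

// Without u, a code point above \x{FF} is rejected when the pattern is
// compiled: preg_match() raises a warning and returns false.
$withoutU = @preg_match('/^[\x{4e00}-\x{9fa5}]+$/', $subject);
var_dump($withoutU); // bool(false)

// With u, pattern and subject are treated as UTF-8 and the match succeeds.
$withU = preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $subject);
var_dump($withU); // int(1)
```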

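Online you mostly find only a pattern for matching full-width (multibyte) characters: /^[\x80-\xff]*$/. In JavaScript, [\u4e00-\u9fa5] matches Chinese characters without any braces, but PHP's regular expressions do not support the \u escape. That raises the question: since \x also denotes a hexadecimal value, why does the range \x4e00-\x9fa5 in PHP behave differently from the range \u4e00-\u9fa5 in JavaScript? The answer is the two-digit limit described above.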

So I changed the code to the following and found that it is indeed accurate:

The code is as follows:
$str = "php编程";
if (preg_match("/^[\x{4e00}-\x{9fa5}]+$/u", $str))
{
    print("This string is all in Chinese");
}
else
{
    print("This string is not all in Chinese");
}
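The range U+4E00..U+9FA5 covers the most common Chinese characters but not every CJK ideograph. As an alternative sketch (a different technique from the one used in this article), PCRE's Unicode script property \p{Han} can be combined with the u modifier. The "\u{3400}" string escape assumes PHP 7.0 or later, and the file is assumed to be saved as UTF-8.

```php
<?php
$range  = '/^[\x{4e00}-\x{9fa5}]+$/u';
$script = '/^\p{Han}+$/u';

// Both patterns agree on common characters.
var_dump(preg_match($range, "编程"));  // int(1)
var_dump(preg_match($script, "编程")); // int(1)

// U+3400 is a CJK ideograph from Extension A, outside the fixed range.
$rare = "\u{3400}";
var_dump(preg_match($range, $rare));   // int(0)
var_dump(preg_match($script, $rare));  // int(1)

// Mixed input is rejected either way.
var_dump(preg_match($script, "php编程")); // int(0)
```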

So the correct regular expression for matching Chinese characters under UTF-8 encoding in PHP is /^[\x{4e00}-\x{9fa5}]+$/u. Based on the article above, I wrote the following test code (save it as a .php file):

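<?php
$action = trim($_GET['action'] ?? '');
if ($action == "sub") {
    $str = $_POST['dir'] ?? '';
    // GB2312 pattern for Chinese characters, letters, digits and underscore:
    // if (!preg_match("/^[" . chr(0xa1) . "-" . chr(0xff) . "A-Za-z0-9_]+$/", $str))
    // UTF-8 pattern for Chinese characters, letters, digits and underscore:
    if (!preg_match("/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u", $str)) {
        // Escape the input before echoing it back into the HTML page.
        echo "<font color=red>Your input [" . htmlspecialchars($str) . "] contains illegal characters</font>";
    } else {
        echo "<font color=green>Your input [" . htmlspecialchars($str) . "] is valid</font>";
    }
}
?>
<form method="POST" action="?action=sub"> Enter characters (digits, letters, Chinese characters, underscore):
 <input type="text" name="dir" value="">
 <input type="submit" value="Submit">
</form>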
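The validation used by the form handler can also be tried without a browser. A minimal sketch, where the function name isValidInput and the sample strings are hypothetical and the source file is assumed to be UTF-8:

```php
<?php
// Hypothetical helper mirroring the check in the form handler above:
// Chinese characters (U+4E00..U+9FA5), ASCII letters, digits, underscore.
function isValidInput(string $str): bool
{
    return preg_match('/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u', $str) === 1;
}

$samples = [
    "php编程",     // letters plus Chinese characters
    "user_01",     // letters, underscore, digits
    "hello world", // contains a space
    "编程!",       // contains punctuation
    "",            // empty input
];

foreach ($samples as $s) {
    echo "[$s] => ", isValidInput($s) ? "valid" : "invalid", PHP_EOL;
}
// [php编程] => valid
// [user_01] => valid
// [hello world] => invalid
// [编程!] => invalid
// [] => invalid
```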

GBK:

The code is as follows:
preg_match("/^[" . chr(0xa1) . "-" . chr(0xff) . "A-Za-z0-9_]+$/", $str); // GB2312 pattern for Chinese characters, letters, digits and underscore
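A rough way to try the byte-range pattern from a UTF-8 script is to convert a sample string to GBK first. This sketch assumes the mbstring extension is available and that the file is saved as UTF-8. The pattern works on raw bytes, so it is a coarse check rather than a precise test for Chinese characters.

```php
<?php
// Byte-oriented pattern: no u modifier, because GBK input is not UTF-8.
$pattern = "/^[" . chr(0xa1) . "-" . chr(0xff) . "A-Za-z0-9_]+$/";

// Assumes mbstring is installed; converts the UTF-8 literal to GBK bytes.
$gbk = mb_convert_encoding("php编程", "GBK", "UTF-8");

var_dump(preg_match($pattern, $gbk));  // int(1)
var_dump(preg_match($pattern, "a b")); // int(0): a space is not allowed
```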

That covers how to match Chinese characters with a UTF-8 regular expression in PHP. I hope you find it useful.

source:php.cn