
Solutions to several problems with PHP regular expression matching Chinese

WBOY

Release: 2016-07-25 08:59:13


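<?php
// The string contains an apostrophe, so it is quoted with double quotes.
$str = "People's Republic of China 123456789abcdefg";
// \u is not a valid PCRE escape, so this call triggers a compilation warning.
echo preg_match("/^[\u4e00-\u9fa5_a-zA-Z0-9]{3,15}$/", $str);
?>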


Running the code above produces: Warning: preg_match(): Compilation failed: PCRE does not support \L, \l, \N, \P, \p, \U, \u, or \X at offset 3 in F:wwwrootphptest.php on line 2

The reason: PCRE, the engine behind PHP's preg_* functions, does not support the Perl escape sequences listed in the message. In particular, \u is not available, so the JavaScript-style \u4e00 notation cannot be used in a PHP pattern.

In UTF-8 mode, "\x{...}" is allowed, and the content of the curly brackets is a string of hexadecimal digits giving the code point.

The original hexadecimal escape sequence \xhh matches a two-byte UTF-8 character if its value is greater than 127. Solution:

preg_match("/^[\x80-\xff_a-zA-Z0-9]{3,15}$/", $str);
preg_match('/[\x{2460}-\x{2468}]/u', $str);


The second pattern is said to match Chinese characters by their internal code. Testing it as provided:

<?php
$str = "php programming";
// "\x{" is not a PHP string escape, so PCRE receives \x{2460} unchanged.
if (preg_match("/^[\x{2460}-\x{2468}]+$/u", $str)) {
    print("The string is all Chinese");
} else {
    print("The string is not all Chinese");
}
?>
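
Note that U+2460 through U+2468 are the circled digits ① to ⑨ rather than Chinese characters, so this pattern cannot be expected to recognize Chinese text. A minimal sketch to confirm this, assuming the file is saved as UTF-8 (the function name `matches_circled_digits` is made up for illustration):

```php
<?php
// Hypothetical helper: true if $s consists only of U+2460..U+2468 (① to ⑨).
function matches_circled_digits(string $s): bool
{
    // Single quotes keep \x{...} intact for PCRE; /u turns on UTF-8 mode.
    return preg_match('/^[\x{2460}-\x{2468}]+$/u', $s) === 1;
}

var_dump(matches_circled_digits("①⑨"));   // bool(true)
var_dump(matches_circled_digits("中文"));  // bool(false)
```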


This time the code runs without a warning, but the check for Chinese gives the wrong answer. And if \x introduces hexadecimal data, why does the range differ from the 4e00-9fa5 range used in JavaScript (\u4e00-\u9fa5)? So the code was modified as follows:

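<?php
$str = "php programming";
// Without braces, PHP expands \x4e and \x9f inside the double quotes,
// so this pattern triggers a compilation warning.
if (preg_match("/^[\x4e00-\x9fa5]+$/u", $str)) {
    print("This string is all Chinese");
} else {
    print("This string is not all Chinese");
}
?>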


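This produces another warning: Warning: preg_match() [function.preg-match]: Compilation failed: invalid UTF-8 string at offset 6 in test.php on line 3. The cause is that, inside a double-quoted string, PHP itself consumes \x4e and \x9f as one-byte escapes, so the pattern PCRE receives contains a stray 0x9f byte that is not valid UTF-8. After wrapping "4e00" and "9fa5" in "{" and "}" respectively, the code ran again and this time gave the correct result: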

<?php
$str = "php programming";
// Braced code points plus the /u modifier: the range U+4E00..U+9FA5.
if (preg_match("/^[\x{4e00}-\x{9fa5}]+$/u", $str)) {
    print("This string is all Chinese");
} else {
    print("This string is not all Chinese");
}
?>
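
To see what PCRE actually receives in the braced and unbraced cases, the pattern fragments can be inspected as plain strings. The following is only an illustrative sketch; the variable names are made up:

```php
<?php
// In double quotes PHP expands "\x4e" to the byte 0x4E ("N");
// the trailing "00" stays as two literal characters.
$unbraced = "\x4e00";
echo bin2hex($unbraced), "\n";   // 4e3030

// "\x{" is not a PHP string escape, so the backslash survives
// and PCRE receives the text \x{4e00} unchanged.
$braced = "\x{4e00}";
echo $braced, "\n";              // \x{4e00}

// Single quotes never expand \x, which makes patterns easier to reason about.
$single = '\x{4e00}';
var_dump($braced === $single);   // bool(true)
```

Writing patterns in single quotes avoids this class of surprise, because PHP then passes every \x sequence through to PCRE untouched.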


So the correct expression for matching Chinese characters with regular expressions under UTF-8 encoding in PHP is: /^[\x{4e00}-\x{9fa5}]+$/u
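
As a rough sketch, that expression could be wrapped in a small helper. The function names below are hypothetical and the file is assumed to be saved as UTF-8. The second function uses the Unicode script property \p{Han}, a different technique from the one in this article, which also accepts ideographs outside U+4E00..U+9FA5:

```php
<?php
// Hypothetical helper: true if $s is non-empty and every character
// lies in U+4E00..U+9FA5.
function is_all_chinese(string $s): bool
{
    return preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $s) === 1;
}

// Alternative (not from the article): match by Unicode script instead of
// a fixed code point range.
function is_all_han(string $s): bool
{
    return preg_match('/^\p{Han}+$/u', $s) === 1;
}

var_dump(is_all_chinese("中文"));             // bool(true)
var_dump(is_all_chinese("php programming"));  // bool(false)
var_dump(is_all_han("中文"));                 // bool(true)
```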

The final version of the implementation code:

<?php
//if (preg_match("/^[".chr(0xa1)."-".chr(0xff)."]+$/", $str)) { // Can only be used with GB2312
if (preg_match("/^[\x7f-\xff]+$/", $str)) { // Byte range, compatible with GB2312 and UTF-8
    echo "Correct input";
} else {
    echo "Wrong input";
}
?>


Example 2:

<?php
// Default to an empty string when the parameter is missing.
$action = trim($_GET['action'] ?? '');
if ($action == "sub")
{
    $str = $_POST['dir'] ?? '';
    // GB2312: Chinese characters, letters, digits, and underscores
    //if(!preg_match("/^[".chr(0xa1)."-".chr(0xff)."A-Za-z0-9_]+$/",$str))
    // UTF-8: Chinese characters, letters, digits, and underscores
    if(!preg_match("/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u",$str))
    {
        // Escape the rejected input before echoing it back into the page.
        echo "The [".htmlspecialchars($str, ENT_QUOTES, 'UTF-8')."] you entered contains illegal characters";
    }
    else
    {
        echo "The [".$str."] you entered is completely legal and passed!";
    }
}
?>
Input characters (numbers, letters, Chinese characters, underscores):
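
One detail worth keeping in mind with the /u modifier: if the submitted text is not valid UTF-8, preg_match() returns false rather than 0. The sketch below wraps the same rule in a function (the name `is_valid_name` is an assumption for illustration) and treats that case as a rejection:

```php
<?php
// Hypothetical validator: Chinese characters, letters, digits, underscores.
function is_valid_name(string $s): bool
{
    $result = preg_match('/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u', $s);
    if ($result === false) {
        // With /u, a subject that is not valid UTF-8 makes preg_match() fail.
        return false;
    }
    return $result === 1;
}

var_dump(is_valid_name("用户_01"));  // bool(true)
var_dump(is_valid_name("a b"));      // bool(false): a space is not allowed
var_dump(is_valid_name("\xff"));     // bool(false): not valid UTF-8
```

After a failed call, preg_last_error() can be compared with PREG_BAD_UTF8_ERROR to tell an encoding problem apart from an ordinary mismatch.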


Appendix: double-byte character encoding ranges in PHP

1. GBK (GB2312/GB18030)

\x00-\xff GBK double-byte encoding range
\x20-\x7f ASCII
\xa1-\xff Chinese (GB2312)
\x80-\xff Chinese (GBK)

2. UTF-8 (Unicode)

\u4e00-\u9fa5 (Chinese)
\x3130-\x318F (Korean)
\xAC00-\xD7A3 (Korean)
\u0800-\u4e00 (Japanese)
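
To use any of the Unicode ranges listed above in a PHP pattern, the code points need the \x{...} form together with the /u modifier. A minimal sketch for the second Korean range, assuming the file is saved as UTF-8 (the function name is hypothetical):

```php
<?php
// Hypothetical helper: true if $s consists only of Hangul syllables
// in the range U+AC00..U+D7A3.
function is_all_hangul_syllables(string $s): bool
{
    return preg_match('/^[\x{AC00}-\x{D7A3}]+$/u', $s) === 1;
}

var_dump(is_all_hangul_syllables("한국어"));  // bool(true)
var_dump(is_all_hangul_syllables("中文"));    // bool(false)
```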

That is all for this introduction. I hope it helps you understand how to match Chinese characters with regular expressions in PHP.