PHP—PCRE regular expression escape sequence (backslash)

伊谢尔伦
Release: 2016-11-21 17:26:38

The backslash has several uses. First, when it is followed by a non-alphanumeric character, it removes any special meaning that character may have. This use of the backslash as an escape character applies both inside and outside character classes.

For example, to match a "*" character, write "\*" in the pattern. This applies whether or not the following character would otherwise be interpreted as a metacharacter, so it is always safe to precede a non-alphanumeric character with "\" to specify that it stands for itself. In particular, to match a backslash, write "\\" in the pattern.
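As a minimal sketch of this first use, the snippet below escapes an asterisk by hand and then lets preg_quote() do the escaping. preg_quote() is a standard PHP function that the text above does not mention; the subject strings are made up for illustration.

```php
<?php
// An unescaped "*" is a quantifier; "\*" matches the asterisk itself.
$hasStar = preg_match('/\*/', '2 * 3');                 // 1

// Escaping a non-alphanumeric character with no special meaning is harmless.
$hasAt = preg_match('/\@/', 'user@example.com');        // 1

// preg_quote() inserts the backslashes for an arbitrary literal string.
// The second argument names the delimiter so that it is escaped too.
$quoted  = preg_quote('1+1=2?', '/');
$isExact = preg_match('/^' . $quoted . '$/', '1+1=2?'); // 1
```

Single-quoted PHP strings are used for the patterns so that PHP passes sequences such as \* to the regular expression engine unchanged.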

Note:

The backslash also has special meaning in single-quoted and double-quoted PHP strings, so to match a single backslash with the regular expression \\, the PHP code must contain "\\\\" or '\\\\'. Translator's note: take '/\\/' as an example. PHP first processes it as a string literal and unescapes the backslash, so the regular expression engine receives /\/. The engine then treats the remaining backslash as an escape for the closing delimiter /, and the pattern fails with an error. Four backslashes in the PHP source are therefore needed to match one backslash.
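The double layer of escaping can be checked with a short sketch. The subject strings are invented for illustration; the failing two-backslash pattern is only described in a comment rather than executed.

```php
<?php
// In a single-quoted string "\t" is not an escape, so this is C:, a backslash, "temp".
$subject = 'C:\temp';

// PHP source '/\\\\/' is the string /\\/ , which PCRE reads as one literal backslash.
$found = preg_match('/\\\\/', $subject);          // 1

// 'a\\b\\c' is the 5-character string a\b\c, which contains two backslashes.
$count = preg_match_all('/\\\\/', 'a\\b\\c');     // 2

// By contrast, '/\\/' reaches PCRE as /\/ : the backslash escapes the closing
// delimiter, so the pattern has no ending delimiter and cannot be compiled.
```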

If a pattern is compiled with the PCRE_EXTENDED option (the x modifier), whitespace in the pattern (other than in a character class) and all characters from an unescaped # outside a character class to the end of the line are ignored. To include whitespace or # as part of the pattern in this mode, escape it with a backslash.
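A small sketch of extended mode, assuming the x modifier is used to request PCRE_EXTENDED. The pattern and subject are illustrative only.

```php
<?php
// With the x modifier, layout whitespace and "# ..." comments are ignored.
$pattern = '/
    \d+     # one or more digits
    \       # an escaped space is matched literally
    \#      # an escaped hash is matched literally
    \d+     # one or more digits
/x';

preg_match($pattern, 'item 12 #34', $m);
// $m[0] is "12 #34"
```

Note that the comments inside the pattern must not contain the delimiter character, because PHP looks for the closing delimiter before PCRE ever sees the pattern.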

The second use of the backslash provides a way of encoding non-printing characters in patterns in a visible manner. There is no restriction on the appearance of non-printing characters themselves, apart from the binary zero that terminates a pattern, but when a pattern is prepared in a text editor it is usually easier to use one of the following escape sequences than the binary character it represents:

\a        Alarm, that is, the BEL character (hex 07)

\cx       "control-x", where x is any character

\e        Escape (hex 1B)

\f        Form feed (hex 0C)

\n        Newline (hex 0A)

\p{xx}    A character with the xx property

\P{xx}    A character without the xx property

\r        Carriage return (hex 0D)

\t        Horizontal tab (hex 09)

\xhh      Character with hexadecimal code hh

\ddd      Character with octal code ddd, or a backreference

The precise effect of \cx is as follows: if x is a lowercase letter, it is converted to uppercase. Then bit 6 of the character (hex 40, counting the rightmost bit as bit 0) is inverted. Thus \cz becomes hex 1A, \c{ becomes hex 3B, and \c; becomes hex 7B.
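The rule for \cx can be reproduced with plain arithmetic. The helper name controlCode below is hypothetical and exists only for this illustration; it is not part of PHP or PCRE.

```php
<?php
// Hypothetical helper: computes the character code that \cx stands for.
function controlCode(string $x): int
{
    // Uppercase a lowercase letter, then flip bit 6 (hex 40).
    return ord(strtoupper($x)) ^ 0x40;
}

$codes = [
    controlCode('z'),   // 0x1A
    controlCode('{'),   // 0x3B
    controlCode(';'),   // 0x7B
];

// The computed code agrees with what the regular expression engine matches.
$agrees = preg_match('/\cz/', chr(controlCode('z')));   // 1
```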

After "\x", up to two hexadecimal digits are read (letters can be in upper or lower case). In UTF-8 mode, "\x{...}" is allowed, where the content of the braces is a string of hexadecimal digits. It is interpreted as the UTF-8 character whose code number is the given hexadecimal number. The original hexadecimal escape sequence, \xhh, matches a two-byte UTF-8 character if its value is greater than 127.
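The following sketch contrasts \xhh with and without UTF-8 mode, assuming the u modifier is used to enable it. The "\u{...}" string syntax requires PHP 7 or later.

```php
<?php
// \x41 is the character with hex code 41, the letter "A".
$ascii = preg_match('/\x41/', 'A');                 // 1

// In UTF-8 mode, \x{...} accepts code points beyond two hex digits.
$euro = preg_match('/\x{20AC}/u', "\u{20AC}");      // 1

// In UTF-8 mode, \xE9 (greater than 127) matches the two-byte character U+00E9.
$eAcute = preg_match('/^\xE9$/u', "\u{E9}");        // 1

// Without UTF-8 mode, \xE9 is the single byte E9, which is not in the subject.
$asByte = preg_match('/\xE9/', "\u{E9}");           // 0
```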

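The third use of the backslash is for specifying generic character types:

\d        Any decimal digit

\D        Any character that is not a decimal digit

\h        Any horizontal whitespace character (since PHP 5.2.4)

\H        Any non-horizontal whitespace character (since PHP 5.2.4)

\s        Any whitespace character

\S        Any non-whitespace character

\v        Any vertical whitespace character (since PHP 5.2.4)

\V        Any non-vertical whitespace character (since PHP 5.2.4)

\w        Any word character

\W        Any non-word character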

Each pair of escape sequences above partitions the complete set of characters into two disjoint sets. Any given character matches one, and only one, of each pair.

A word character is any letter, digit, or the underscore character, that is, any character that can be part of a Perl word. The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place. For example, in the French (fr) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w.

These character type sequences can appear both inside and outside character classes. Each one matches a single character of the appropriate type. If the current matching point is at the end of the subject string, all of them fail, because there is no character left to match.
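A short sketch of the generic character types in use. The subject strings are invented, and the default (non-UTF-8, "C" locale) behaviour is assumed.

```php
<?php
// \d+ collects runs of decimal digits.
preg_match_all('/\d+/', 'a1b22c333', $digits);
// $digits[0] is ['1', '22', '333']

// \s+ splits on any run of whitespace (space, tab, newline).
$words = preg_split('/\s+/', "alpha beta\tgamma\ndelta");

// Each character matches exactly one sequence of a pair such as \w and \W.
$violations = 0;
foreach (str_split("a1 _-\n") as $ch) {
    if (preg_match('/\w/', $ch) === preg_match('/\W/', $ch)) {
        $violations++;
    }
}

// The sequences also work inside a character class.
$inClass = preg_match('/^[\d\s]+$/', '12 34');      // 1

// At the end of the subject there is no character left, so \w fails.
$atEnd = preg_match('/abc\w/', 'abc');              // 0
```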

The fourth use of the backslash is for certain simple assertions. An assertion specifies a condition that has to be met at a particular point in a match, without consuming any characters from the subject string. More complicated assertions are written with subpatterns. The backslashed assertions are:

\b        Word boundary

\B        Not a word boundary

\A        Start of the subject (independent of multiline mode)

\Z        End of the subject, or the newline at the end of the subject (independent of multiline mode)

\z        End of the subject (independent of multiline mode)

\G        First matching position in the subject

These assertions cannot appear in character classes (but note that "\b" has a different meaning inside a character class, where it represents the backspace character).

A word boundary is a position in the subject string where the current character and the previous character do not both match \w or both match \W (that is, one matches \w and the other matches \W), or the start or end of the string if the first or last character matches \w, respectively.
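The word boundary rules can be illustrated with a few matches. The subject string is made up for this sketch.

```php
<?php
$text = 'cat concat cat, category';

// \bcat\b matches "cat" only as a whole word: the first word and "cat,".
$whole = preg_match_all('/\bcat\b/', $text);        // 2

// \Bcat\b matches "cat" only at the end of a longer word: "concat".
$suffix = preg_match_all('/\Bcat\b/', $text);       // 1

// Inside a character class, \b is the backspace character (hex 08).
$backspace = preg_match('/[\b]/', "a\x08b");        // 1
```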

The \A, \Z, and \z assertions differ from the traditional ^ and $ (see below) in that they only ever match at the very start and end of the subject string, regardless of pattern modifiers. They are not affected by the PCRE_MULTILINE or PCRE_DOLLAR_ENDONLY options. The difference between \Z and \z is that \Z matches before a newline that is the last character of the string as well as at the end of the string, whereas \z matches only at the very end.
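A minimal sketch comparing these anchors with ^ and $ under the m (PCRE_MULTILINE) modifier. The two-line subject is illustrative.

```php
<?php
$subject = "line1\nline2\n";

// In multiline mode, ^ and $ match at every line start and line end.
$caret = preg_match('/^line2$/m', $subject);        // 1

// \A still matches only at the very start of the subject.
$startA = preg_match('/\Aline2/m', $subject);       // 0

// \Z matches before the final newline; \z does not.
$upperZ = preg_match('/line2\Z/', $subject);        // 1
$lowerZ = preg_match('/line2\z/', $subject);        // 0

// \z matches once the trailing newline is consumed.
$lowerZNewline = preg_match('/line2\n\z/', $subject);   // 1
```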

The \G assertion is true only when the current matching position is at the start point of the match, as specified by the $offset argument of preg_match(). It differs from \A when the value of $offset is non-zero. Translator's note: another difference from \A shows up with preg_match_all(). On each iteration \G asserts that the match begins at the position where matching started, that is, where the previous match ended, whereas \A always asserts that the match begins at the start of the subject string.
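Both differences between \G and \A can be seen in a short sketch. The subjects and offset are chosen only for illustration.

```php
<?php
// With $offset = 2, matching starts at "abc"; \G is satisfied there, \A is not.
$withG = preg_match('/\Gabc/', 'xxabc', $m, 0, 2);      // 1
$withA = preg_match('/\Aabc/', 'xxabc', $m, 0, 2);      // 0

// With preg_match_all(), \G chains matches: each must begin where the previous
// one ended, so the run stops at "a".
preg_match_all('/\G\d/', '123a45', $chained);
// $chained[0] is ['1', '2', '3']

// \A can only ever match once, at the start of the subject.
preg_match_all('/\A\d/', '123a45', $anchored);
// $anchored[0] is ['1']
```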

Since PHP 4.3.3, \Q and \E can be used to ignore regular expression metacharacters in a pattern. For example, \w+\Q.$.\E$ matches one or more word characters, followed by the literal text ".$.", anchored at the end of the string.
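The example pattern can be tried directly. The subject strings below are invented for this sketch.

```php
<?php
$pattern = '/\w+\Q.$.\E$/';

// Between \Q and \E, ".", "$" and "." are literal characters.
$literal = preg_match($pattern, 'price.$.');        // 1

// The dots are not wildcards here, so other characters in their place fail.
$wildcard = preg_match($pattern, 'priceA$B');       // 0
```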

Since PHP 5.2.4, \K can be used to reset the start of the match. For example, the pattern foo\Kbar matches "foobar" but reports that it has matched "bar". The use of \K does not interfere with captured subgroups: when the pattern (foo)\Kbar matches "foobar", the first subgroup is still set to "foo". Translator's note: \K has the same effect whether it is placed inside or outside a subgroup.
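A brief sketch of \K, including a replacement that relies on it. The use with preg_replace() is an added illustration, not something the text above describes.

```php
<?php
// "foo" must be present, but only "bar" is reported as the match.
preg_match('/foo\Kbar/', 'foobar', $plain);
// $plain is ['bar']

// The capture group before \K is still filled in.
preg_match('/(foo)\Kbar/', 'foobar', $grouped);
// $grouped is ['bar', 'foo']

// Because only "bar" counts as matched, only "bar" is replaced.
$replaced = preg_replace('/foo\Kbar/', 'BAZ', 'foobar');    // "fooBAZ"
```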


source:php.cn