This article is a detailed summary of the regular expression knowledge that commonly comes up in PHP interviews. It is intended as a reference for anyone preparing for one.
1. Introduction
1. What is a regular expression
A regular expression is a pattern that describes a set of strings.
It uses a single string to describe, and match against, every string that follows a given syntax rule.
Regular expressions can be tedious to write, but they are powerful, and once learned they make text-processing work much faster. Reading this summary carefully and referring back to it while writing patterns is enough to become proficient.
Many programming languages support string operations using regular expressions.
2. The role of regular expressions
Split, search, match, and replace strings
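The four roles listed above map onto PHP's preg_ functions roughly as follows. This is a minimal sketch; the sample strings are made up for illustration.

```php
<?php
// Hypothetical sample data, used only for illustration.
$list = "apple, banana;cherry";

// Split: break the string on a comma or semicolon plus optional whitespace.
$parts = preg_split('/[,;]\s*/', $list);

// Search: keep only the array elements that contain "an" (keys are preserved).
$withAn = preg_grep('/an/', $parts);

// Match: test whether a string consists of exactly four digits.
$isYear = preg_match('/^\d{4}$/', '2019') === 1;

// Replace: collapse each run of whitespace into a single space.
$clean = preg_replace('/\s+/', ' ', "PHP   regular\texpressions");
```

Note that preg_match() returns 1 on a match, 0 on no match, and false on error, so the strict comparison with 1 is the safest test.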
3. Regular expressions in PHP
PHP has had two families of regular expression functions. Their functionality is similar, but their execution efficiency differs slightly:
One family is provided by the PCRE (Perl Compatible Regular Expressions) library. Its functions are named with the prefix "preg_".
The other family is provided by the POSIX (Portable Operating System Interface) extension. Its functions are named with the prefix "ereg". This extension was deprecated in PHP 5.3 and removed in PHP 7.0.
PCRE takes its syntax from the Perl language, and Perl is one of the most powerful languages for string operations.
PCRE syntax supports more features and is more powerful than POSIX syntax. Therefore, this article mainly covers regular expressions in PCRE syntax.
4. The composition of regular expressions
In PHP, a regular expression is divided into three parts: delimiters, the expression, and pattern modifiers.
Delimiter
The delimiter can be any ASCII character except letters, digits, the backslash (\), and whitespace characters.
The most commonly used delimiters are the forward slash (/), the hash sign (#), and the tilde (~).
Expression
The expression consists of special characters and literal strings. It is the main part and determines the matching rules of the regular expression.
Pattern modifier
Pattern modifiers turn certain matching features or modes on and off.
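To make the three parts concrete, the sketch below assembles a pattern string from its pieces. The variable names and the sample pattern are chosen for illustration only.

```php
<?php
// The three parts of a PCRE pattern string in PHP (illustrative values).
$delimiter  = '/';         // start and end delimiter
$expression = '^php\d*$';  // the matching rule itself
$modifiers  = 'i';         // pattern modifier: case-insensitive matching

// Assemble them: delimiter + expression + delimiter + modifiers.
$pattern = $delimiter . $expression . $delimiter . $modifiers;

// "PHP7" matches even though the expression is written in lowercase.
$matched = preg_match($pattern, 'PHP7');
```

In practice the pattern is usually written as one literal, here '/^php\d*$/i'; splitting it up only serves to show where each part begins and ends.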
2. Delimiter
1. Selection of delimiter
When using the PCRE functions, the regular expression must be enclosed in delimiters.
The delimiter can be any ASCII character except letters, digits, the backslash (\), and whitespace characters.
The most commonly used delimiters are the forward slash (/), the hash sign (#), and the tilde (~).
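/foo bar/ (valid)
#^[^0-9]$# (valid)
+php+ (valid)
%[a-zA-Z0-9_-]% (valid)
#[a-zA-Z0-9_-]/ (invalid: the delimiters on the two sides differ)
a[a-zA-Z0-9_-]a (invalid: a delimiter cannot be a letter)
\[a-zA-Z0-9_-]\ (invalid: a delimiter cannot be a backslash)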
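A quick way to see the difference between valid and invalid delimiters is to pass them to preg_match(). The sketch below assumes default error handling, where an invalid delimiter raises a warning (suppressed here with @) and the function returns false; the subject strings are made up.

```php
<?php
// Valid: the same non-alphanumeric, non-backslash character on both sides.
$okSlash   = preg_match('/foo bar/', 'a foo bar b');
$okPlus    = preg_match('+php+', 'I like php');
$okPercent = preg_match('%[a-zA-Z0-9_-]%', '!a!');

// Invalid: a letter used as the delimiter. PHP emits a warning and
// preg_match() returns false rather than 0 or 1.
$bad = @preg_match('a[a-zA-Z0-9_-]a', 'abc');
```

Because false and 0 are both falsy, code that needs to tell "no match" from "broken pattern" should compare the result with === false.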
In addition to the delimiters mentioned above, you can also use bracket-style delimiters. The left bracket and the right bracket serve as the start and end delimiters respectively.
{this is a pattern}
If the delimiter appears inside the regular expression, it must be escaped with a backslash (\).
If delimiters often appear within regular expressions, it is best to use other delimiters to improve readability.
/http:\/\//
#http://#
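The two patterns above behave identically; only their readability differs. The following sketch checks this against a hypothetical URL and adds a bracket-style variant for comparison.

```php
<?php
$url = 'http://example.com/index.php'; // hypothetical URL for illustration

// With "/" as the delimiter, every slash in the pattern must be escaped.
$withSlash = preg_match('/http:\/\//', $url);

// With "#" as the delimiter, the slashes can be written as-is.
$withHash = preg_match('#http://#', $url);

// Bracket-style delimiters: "{" opens the pattern and "}" closes it.
$withBraces = preg_match('{http://}', $url);
```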
When you need to put a string into a regular expression, you can use the preg_quote() function to escape it. Its second parameter (optional) can be used to specify the delimiter that needs to be escaped.
// In this example, preg_quote($word, '/') keeps the asterisks and forward slashes literal, so they do not take on their special meaning in the regular expression.
$textBody = "This book is */very/* difficult to find.";
$word = "*/very/*";
$reg = "/" . preg_quote($word, '/') . "/";
echo $reg; // prints '/\*\/very\/\*/'
echo preg_replace($reg, "<i>" . $word . "</i>", $textBody); // prints 'This book is <i>*/very/*</i> difficult to find.'
You can add pattern modifiers after the end delimiter to affect the matching effect.
The following example is a case-insensitive match:
#[a-z]#i
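The effect of the i modifier can be checked by running the same expression with and without it. A minimal sketch with a made-up subject string:

```php
<?php
// Without a modifier, [a-z] only matches lowercase letters.
$sensitive = preg_match('#[a-z]#', 'PHP');

// With the i modifier after the end delimiter, case is ignored.
$insensitive = preg_match('#[a-z]#i', 'PHP');
```

Here $sensitive is 0 because "PHP" contains no lowercase letter, while $insensitive is 1.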
3. Metacharacters
1. Escape character
Character | Description |
---|---|
\ | Marks the next character as a special character, a literal character, or a backreference. For example, 'n' matches the character "n", while '\n' matches a newline character. The sequence '\\' matches "\", and '\(' matches "(". |
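The escape rules can be tried out directly. The sketch below writes the patterns in single-quoted PHP strings, which pass a backslash through to PCRE unchanged except before another backslash or a single quote; the subject strings are made up.

```php
<?php
// A plain "n" is the letter n, so it does not match a newline character.
$letterN = preg_match('/n/', "\n");

// "\n" in the pattern is the newline character.
$newline = preg_match('/\n/', "\n");

// "\(" matches a literal opening parenthesis.
$paren = preg_match('/\(/', 'f(x)');

// The regex \\ matches one literal backslash. In PHP source it is typed
// as four backslashes, because PHP itself halves them first.
$backslash = preg_match('/\\\\/', 'C:\\dir');
```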
Character | Description |
---|---|
^ | Matches the beginning of the input string (in multi-line mode, also the beginning of each line). |
$ | Matches the end of the input string (in multi-line mode, also the end of each line). |
\b | Matches a word boundary, that is, the position between a word character and a non-word character such as a space, or the start or end of the string next to a word character. |
\B | Matches a non-word boundary. |
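The sketch below illustrates the anchors. It assumes the m pattern modifier as the way to switch on the multi-line mode mentioned in the table; the sample text is made up.

```php
<?php
$text = "first line\nsecond line";

// By default ^ only matches at the very start of the subject.
$startDefault = preg_match('/^second/', $text);

// With the m modifier, ^ also matches right after a newline,
// and $ also matches right before one.
$startMulti = preg_match('/^second/m', $text);
$endMulti   = preg_match('/first line$/m', $text);

// \b requires a word boundary on each side of "line".
$wholeWord  = preg_match('/\bline\b/', 'first line');
$insideWord = preg_match('/\bline\b/', 'outlines');

// \B requires the opposite: "line" must sit inside a longer word.
$nonBoundary = preg_match('/\Bline\B/', 'outlines');
```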
Character | Description |
---|---|
* | Matches the preceding subexpression zero or more times. For example, 'zo*' can match "z" and "zoo". Equivalent to {0,}. |
+ | Matches the preceding subexpression one or more times. For example, 'zo+' can match "zo" and "zoo", but not "z". Equivalent to {1,}. |
? | As a quantifier, matches the preceding subexpression zero or one time. For example, 'do(es)?' can match "do" or "does". Equivalent to {0,1}. |
{n} | n is a non-negative integer. Matches exactly n times. For example, 'o{2}' cannot match the 'o' in "Bob", but it can match the two o's in "food". |
{n,} | n is a non-negative integer. Matches at least n times. For example, 'o{2,}' cannot match the 'o' in "Bob", but it can match all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'. |
{n,m} | m and n are both non-negative integers, where n <= m. Matches at least n times and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the two numbers. |
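The examples from the quantifier table can be verified with a small helper. firstMatch() is a hypothetical function written for this sketch, not part of PHP; it returns the first matched text or null.

```php
<?php
// Hypothetical helper for this example: first matched text, or null if none.
function firstMatch(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

$star     = firstMatch('/zo*/', 'z');            // zero o's are allowed
$plus     = firstMatch('/zo+/', 'z');            // at least one o is required
$optional = firstMatch('/do(es)?/', 'does');     // "es" may appear once or not at all
$exact    = firstMatch('/o{2}/', 'Bob');         // needs two o's in a row
$atLeast  = firstMatch('/o{2,}/', 'foooood');    // two or more, as many as possible
$range    = firstMatch('/o{1,3}/', 'fooooood');  // greedy, but capped at three
```

The results are "z", null, "does", null, "ooooo", and "ooo" respectively.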
Character | Description |
---|---|
\d | Matches a digit. Equivalent to [0-9]. |
\D | Matches a non-digit character. Equivalent to [^0-9]. |
\w | Matches a letter, digit, or underscore. Equivalent to [A-Za-z0-9_]. |
\W | Matches any character that is not a letter, digit, or underscore. Equivalent to [^A-Za-z0-9_]. |
\s | Matches any whitespace character, including spaces, tabs, and form feeds. Equivalent to [ \f\n\r\t\v]. |
\S | Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v]. |
. | Matches any single character except a newline (\n) by default. To match any character including '\n', use a pattern such as '[\s\S]' or add the s pattern modifier. |
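A few typical uses of these character classes are sketched below. The input strings are made up, and the s modifier is shown as one way to let the dot match a newline.

```php
<?php
// \D: strip every non-digit character.
$digits = preg_replace('/\D/', '', 'Tel: 555-0199');

// \W: split on runs of non-word characters, dropping empty pieces.
$words = preg_split('/\W+/', 'foo_bar, baz!', -1, PREG_SPLIT_NO_EMPTY);

// \s: collapse any run of whitespace into one space.
$squeezed = preg_replace('/\s+/', ' ', "a \t\n b");

// The dot does not match "\n" by default, but does with the s modifier.
$dotDefault = preg_match('/a.b/', "a\nb");
$dotAll     = preg_match('/a.b/s', "a\nb");
```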
Character | Description |
---|---|
\n | Matches a newline character. Equivalent to \x0a and \cJ. |
\r | Matches a carriage return character. Equivalent to \x0d and \cM. |
\t | Matches a tab character. Equivalent to \x09 and \cI. |
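The equivalences in the table can be checked as follows, together with a common practical use: normalizing line endings. This is only a sketch with made-up input.

```php
<?php
// The hexadecimal and control-character forms match the same characters.
$lfHex  = preg_match('/\x0a/', "\n");
$lfCtrl = preg_match('/\cJ/', "\n");
$crCtrl = preg_match('/\cM/', "\r");
$tabHex = preg_match('/\x09/', "\t");

// Convert "\r\n" and lone "\r" line endings to "\n".
$unix = preg_replace('/\r\n?/', "\n", "a\r\nb\rc");
```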
Character | Description |
---|---|
| | The vertical bar separates alternatives. For example, 'z|food' can match "z" or "food". '(z|f|g)ood' matches "zood", "food", or "good". |
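The difference between a bare alternation and a grouped one is easy to miss, so here is a short sketch with made-up words.

```php
<?php
// Without a group, the alternatives are "z" and "food" as a whole,
// so the single letter z inside "pizza" is enough for a match.
$loose = preg_match('/z|food/', 'pizza');

// With a group, only the first letter varies.
$grouped = preg_grep('/^(z|f|g)ood$/', ['zood', 'food', 'good', 'wood']);
```

After this runs, $loose is 1 and $grouped keeps "zood", "food", and "good" while dropping "wood".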
Character | Description |
---|---|
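x|y | Matches x or y. For example, 'z|food' can match "z" or "food". '(z|f)ood' matches "zood" or "food". Note that inside square brackets the vertical bar is a literal character: [x|y] matches "x", "|", or "y". |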
[xyz] | character set. Matches any one of the characters contained. For example, [abc] would match 'a' in "plain". |
[^xyz] | Negative value character set. Matches any character not included. For example, [^abc] can match 'p', 'l', 'i', 'n' in "plain". |
[a-z] | Character range. Matches any character within the specified range. For example, [a-z] matches any lowercase alphabetic character in the range 'a' to 'z'. |
[^a-z] | Negative character range. Matches any character not within the specified range. For example, [^a-z] matches any character that is not in the range 'a' to 'z'. |
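The character set examples from the table can be reproduced with preg_match_all(), which collects every match. The last line is an extra check, added for illustration, that a vertical bar inside brackets is literal.

```php
<?php
// [abc]: every character of "plain" that is a, b, or c.
preg_match_all('/[abc]/', 'plain', $inSet);

// [^abc]: every character of "plain" that is not a, b, or c.
preg_match_all('/[^abc]/', 'plain', $notInSet);

// [^a-z]: remove everything that is not a lowercase letter.
$lowerOnly = preg_replace('/[^a-z]/', '', 'PHP 7 rocks');

// Inside brackets, | is an ordinary character.
$pipe = preg_match('/[x|y]/', '|');
```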
Character | Description |
---|---|
? | When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match becomes non-greedy. Non-greedy mode matches as little of the searched string as possible, while the default greedy mode matches as much as possible. For example, for the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's. |
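The sketch below repeats the "oooo" example and adds a second, made-up case with HTML-like tags, where the difference between greedy and non-greedy matching is often what matters in practice.

```php
<?php
// Greedy: o+ takes as many o's as it can.
preg_match('/o+/', 'oooo', $greedy);

// Non-greedy: o+? stops after the first o.
preg_match('/o+?/', 'oooo', $lazy);

$html = '<b>one</b> and <b>two</b>';

// Greedy .* runs to the last closing tag.
preg_match('/<b>.*<\/b>/', $html, $wide);

// Non-greedy .*? stops at the first closing tag.
preg_match('/<b>.*?<\/b>/', $html, $narrow);
```

Here $wide[0] is the whole string, while $narrow[0] is only "<b>one</b>".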
Group ( )
Character | Description |
---|---|
(pattern) | Matches pattern and captures the match for later use. To match literal parentheses, use \( or \). |
(?:pattern) | Matches pattern without capturing the result: this is a non-capturing group, and nothing is stored for later use. This is useful when combining parts of a regular expression with the "or" character (|). For example, 'industr(?:y|ies)' is more compact than 'industry|industries'. |
(?=pattern) | Positive lookahead. Matches at a position where the text that follows matches pattern. It is non-capturing, so the lookahead text is not stored for later use. For example, 'Windows(?=95|98|NT|2000)' can match "Windows" in "Windows2000", but cannot match "Windows" in "Windows3.1". A lookahead does not consume characters: after a match, the search for the next match starts immediately after the matched text, not after the characters examined by the lookahead. |
(?!pattern) | Negative lookahead. Matches at a position where the text that follows does not match pattern. It is non-capturing, so the lookahead text is not stored for later use. For example, 'Windows(?!95|98|NT|2000)' can match "Windows" in "Windows3.1", but cannot match "Windows" in "Windows2000". A lookahead does not consume characters: after a match, the search for the next match starts immediately after the matched text, not after the characters examined by the lookahead. |
(?<=pattern) | Positive lookbehind. Similar to positive lookahead, but in the opposite direction: it checks the text before the current position. For example, '(?<=95|98|NT|2000)Windows' can match "Windows" in "2000Windows", but cannot match "Windows" in "3.1Windows". |
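The grouping constructs described in this table can be tried out as follows. The "Windows" and "industries" subjects come from the table; the date string is made up for illustration.

```php
<?php
// Capturing groups: each parenthesized part is stored in the match array.
preg_match('/(\d{4})-(\d{2})/', 'Released 2019-06', $date);

// Non-capturing group: alternatives are grouped but nothing extra is stored.
preg_match('/industr(?:y|ies)/', 'two industries', $word);

// Positive lookahead: "Windows" only if a listed version follows.
preg_match('/Windows(?=95|98|NT|2000)/', 'Windows2000', $ahead);
$aheadMiss = preg_match('/Windows(?=95|98|NT|2000)/', 'Windows3.1');

// Negative lookahead: "Windows" only if no listed version follows.
$negative = preg_match('/Windows(?!95|98|NT|2000)/', 'Windows3.1');

// Positive lookbehind: "Windows" only if a listed version precedes it.
$behind     = preg_match('/(?<=95|98|NT|2000)Windows/', '2000Windows');
$behindMiss = preg_match('/(?<=95|98|NT|2000)Windows/', '3.1Windows');
```

Note that $ahead[0] is just "Windows": the version number is checked by the lookahead but is not part of the match.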