1. What is a regular expression?
A regular expression describes a string-matching pattern, which can be used to:
(1) Check whether a string contains a substring that matches a certain rule, and obtain that substring;
(2) Flexibly replace parts of a string based on matching rules.
Regular expressions are actually quite simple to learn, and the few more abstract concepts are also easy to understand. Many people find regular expressions complicated for two reasons. First, most documentation does not proceed from the simple to the complex and pays little attention to the order in which concepts are introduced, which makes it hard to follow. Second, the documentation that ships with each engine usually focuses on that engine's unique features, which are not the first thing we need to understand.
2. How to use regular expressions
2.1 Ordinary characters
Letters, digits, Chinese characters, underscores, and any punctuation marks not given a special meaning in later sections are all ordinary characters. An ordinary character in an expression matches the same character in the string.
Example 1: When the expression c matches the string abcdef, the matching result is: success; the matched content is: c; the matched position is: starting at 2 and ending at 3. (Note: whether subscripts start from 0 or 1 depends on the programming language in use.)
Example 2: When the expression bcd matches the string abcde, the matching result is: success; the matched content is: bcd; the matched position is: starting at 1 and ending at 4.
2.2 Simple escape characters
Some characters are inconvenient to write directly, so they are written with a \ in front, such as \n for a newline and \t for a tab. We are all familiar with these characters. Other punctuation marks that have special uses in later sections are also written with a \ in front to represent the symbol itself. For example, ^ and $ have special meanings; to match a literal ^ or $ in the string, the expression must be written as \^ or \$. These escaped characters match in the same way as ordinary characters: each matches the same character.
Example: When the expression \$d matches the string abc$de, the matching result is: success; the matched content is: $d; the matched position is: starting at 3 and ending at 5.
2.3 Expressions that can match 'multiple characters'
Some notations in regular expressions can match any one of a set of characters. For example, the expression \d can match any digit, and . can match any single character (by default, any character except a newline). Although such an expression can match any character in its set, it matches only one character, not several. This is like the joker in a card game: it can stand in for any card, but only for one card.
Example 1: When the expression \d\d matches abc123, the matching result is: success; the matched content is: 12; the matched position is: starting at 3 and ending at 5.
Example 2: When the expression a.\d matches aaa100, the matching result is: success; the matched content is: aa1; the matched position is: starting at 1 and ending at 4.
2.4 Custom expressions that can match 'multiple characters'
An expression that encloses a series of characters in square brackets [ ] matches any one of those characters. An expression that encloses them in [^ ] matches any one character other than those listed. As before, although any one of them can be matched, only one character is matched, not several.
Example 1: When the expression [bcd][bcd] matches abc123, the matching result is: success; the matched content is: bc; the matched position is: starting at 1 and ending at 3.
Example 2: When the expression [^abc] matches abc123, the matching result is: success; the matched content is: 1; the matched position is: starting at 3 and ending at 4.
2.5 Special symbols that modify the number of matches
The expressions described so far, whether they match one specific character or any one of several characters, match only once. By following an expression with a special symbol that modifies the number of matches, you can match it repeatedly without writing the expression again.
To use one, place the number-of-matches modifier after the expression it modifies. For example, [bcd][bcd] can be written as [bcd]{2}.
Example 1: When the expression \d+\.?\d* matches it costs $12.5, the matching result is: success; the matched content is: 12.5; the matched position is: starting at 10 and ending at 14.
Example 2: When the expression go{2,8}gle matches Ads by goooooogle, the matching result is: success; the matched content is: goooooogle; the matched position is: starting at 7 and ending at 17.
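The examples in sections 2.1 to 2.5 can be checked mechanically. The following is a minimal sketch using Python's re module, which is an assumption of this sketch (the text does not name an engine); first_match is a hypothetical helper, and positions are 0-based with an exclusive end, which agrees with the numbers quoted above.

```python
import re

def first_match(pattern, text):
    """Hypothetical helper: (matched text, start, end) of the first match, or None."""
    m = re.search(pattern, text)
    return (m.group(0), m.start(), m.end()) if m else None

# 2.1 Ordinary characters match themselves.
literal = first_match(r"bcd", "abcde")            # ('bcd', 1, 4)

# 2.2 A backslash makes a special character literal.
escaped = first_match(r"\$d", "abc$de")           # ('$d', 3, 5)

# 2.3 \d is any one digit; . is any one character.
digits = first_match(r"\d\d", "abc123")           # ('12', 3, 5)
wildcard = first_match(r"a.\d", "aaa100")         # ('aa1', 1, 4)

# 2.4 [...] is any one listed character; [^...] is any one unlisted character.
custom = first_match(r"[bcd][bcd]", "abc123")     # ('bc', 1, 3)
negated = first_match(r"[^abc]", "abc123")        # ('1', 3, 4)

# 2.5 Quantifiers repeat the expression they follow.
doubled = first_match(r"[bcd]{2}", "abc123")      # same result as [bcd][bcd]
price = first_match(r"\d+\.?\d*", "it costs $12.5")
google = first_match(r"go{2,8}gle", "Ads by g" + "o" * 6 + "gle")

for name, result in [("literal", literal), ("escaped", escaped),
                     ("digits", digits), ("wildcard", wildcard),
                     ("custom", custom), ("negated", negated),
                     ("doubled", doubled), ("price", price),
                     ("google", google)]:
    print(name, result)
```

Note that the quantifier is written {2,8} with no space inside the braces; in Python, a space there makes the braces literal text rather than a repeat count.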
2.6 Some other symbols representing abstract meanings
Some symbols carry abstract special meanings in expressions, such as ^, $, and \b.
A purely textual explanation remains rather abstract, so the following examples are given to aid understanding.
Example 1: When the expression ^aaa matches xxx aaa xxx, the matching result is: failure. Because ^ is required to match the beginning of the string, ^aaa can only match when aaa is at the beginning of the string, such as: aaa xxx xxx.
Example 2: When the expression aaa$ matches xxx aaa xxx, the matching result is: failure. Because $ is required to match the end of the string, aaa$ can only match when aaa is at the end of the string, such as: xxx xxx aaa.
Example 3: When the expression .\b. matches @@@abc, the matching result is: success; the matched content is: @a; the matched position is: starting at 2 and ending at 4.
Further explanation: \b is similar to ^ and $. It matches no character itself, but it requires that, at the position where it sits in the match, one side is in the \w range and the other side is not.
Example 4: When the expression \bend\b matches weekend,endfor,end, the matching result is: success; the matched content is: end; the matched position is: starting at 15 and ending at 18.
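The position-only symbols ^, $, and \b can be tried out in the same way. This is a small sketch, again assuming Python's re module; span_or_none is a hypothetical helper, and the last test string is written without spaces after the commas so that the reported positions are 15 and 18.

```python
import re

def span_or_none(pattern, text):
    """Hypothetical helper: (start, end) of the first match, or None on failure."""
    m = re.search(pattern, text)
    return m.span() if m else None

# ^ requires the match to begin at the start of the string.
start_fail = span_or_none(r"^aaa", "xxx aaa xxx")    # None
start_ok = span_or_none(r"^aaa", "aaa xxx xxx")      # (0, 3)

# $ requires the match to finish at the end of the string.
end_fail = span_or_none(r"aaa$", "xxx aaa xxx")      # None
end_ok = span_or_none(r"aaa$", "xxx xxx aaa")        # (8, 11)

# \b consumes nothing: it only checks that one neighbour is a \w
# character and the other is not.
boundary = span_or_none(r".\b.", "@@@abc")           # (2, 4)
whole_word = span_or_none(r"\bend\b", "weekend,endfor,end")  # (15, 18)

print(start_fail, start_ok, end_fail, end_ok, boundary, whole_word)
```

The first two occurrences of end are rejected because a \w character sits directly before or after them, so only the final standalone word satisfies both \b conditions.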
Some symbols affect the relationship between subexpressions within an expression, such as | (alternation) and parentheses ( ) (grouping):
Example 5: When the expression Tom|Jack matches the string I'm Tom, he is Jack, the matching result is: success; the matched content is: Tom; the matched position is: starting at 4 and ending at 7. When matching the next one, the matching result is: success; the matched content is: Jack; the matched position is: starting at 15 and ending at 19.
Example 6: When the expression (go\s*)+ matches Let's go go go!, the matching result is: success; the matched content is: go go go; the matched position is: starting at 6 and ending at 14.
Example 7: When the expression ¥(\d+\.?\d) matches $10.9,¥20.5, the matching result is: success; the matched content is: ¥20.5; the matched position is: starting at 6 and ending at 11. The content matched by the parenthesized part alone is: 20.5.
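Alternation, grouping, and retrieving the parenthesized part can be sketched as follows. The sketch assumes Python's re module and takes the sample sentence to be I'm Tom, he is Jack; the variable names are illustrative only.

```python
import re

# | tries the left alternative, then the right one, at each position.
names = [(m.group(0), m.start(), m.end())
         for m in re.finditer(r"Tom|Jack", "I'm Tom, he is Jack")]
# [('Tom', 4, 7), ('Jack', 15, 19)]

# ( ) lets a quantifier apply to a whole subexpression.
repeated = re.search(r"(go\s*)+", "Let's go go go!")
repeated_text = repeated.group(0)     # 'go go go'
repeated_span = repeated.span()       # (6, 14), end index is exclusive

# ( ) also records what the enclosed subexpression matched.
price = re.search(r"¥(\d+\.?\d)", "$10.9,¥20.5")
whole = price.group(0)                # '¥20.5', the entire match
amount = price.group(1)               # '20.5', the parenthesized part only
price_span = price.span()             # (6, 11)

print(names, repeated_text, repeated_span, whole, amount, price_span)
```

group(0) is always the whole match, while group(1) is the text recorded by the first pair of parentheses, which is how the amount is obtained without the ¥ sign.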
3. Some advanced usage of regular expressions
3.1 Greedy and non-greedy in the number of matches
Greedy mode:
Among the special symbols that modify the number of matches, several allow the same expression to match a varying number of times, such as {m,n}, {m,}, ?, *, and +. The actual number of matches depends on the string being matched. Such an expression, which repeats an indefinite number of times, always matches as many times as possible during matching. For example, for the text dxxxdxxxd, the examples are as follows:
It can be seen that \w+ always matches as many characters that satisfy its rule as possible. In the second example it does not match the last d, but only so that the whole expression can match successfully. In the same way, expressions with * and {m,n} match as much as possible, and expressions with ? also choose to "match" whenever they have the choice between matching and not matching. This matching principle is called greedy mode.
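The greedy behaviour described above can be sketched as follows. The two expressions (d)(\w+) and (d)(\w+)(d) are assumptions chosen to fit the description (one where \w+ runs to the end, one where it must leave the last d), and Python's re module is assumed as the engine.

```python
import re

text = "dxxxdxxxd"

# With nothing after it, greedy \w+ takes every remaining character.
first = re.search(r"(d)(\w+)", text)
greedy_all = first.group(2)           # 'xxxdxxxd'

# With a trailing (d), \w+ still takes as much as it can, then gives
# back exactly one character so that the final d can match.
second = re.search(r"(d)(\w+)(d)", text)
greedy_minus_one = second.group(2)    # 'xxxdxxx'
whole_match = second.group(0)         # 'dxxxdxxxd'

print(greedy_all, greedy_minus_one, whole_match)
```

In both cases the whole string is matched; what changes is how the characters are divided among the groups.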
Non-greedy mode:
Adding a ? after a special symbol that modifies the number of matches makes an expression with an indefinite number of matches match as few times as possible, and makes an expression that may or may not match choose "not to match" whenever it can. This matching principle is called non-greedy mode, also called reluctant mode. If matching fewer times would cause the whole expression to fail, then, as in greedy mode, non-greedy mode matches the minimum additional amount needed for the whole expression to succeed. For example, for the text "dxxxdxxxd":
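A minimal sketch of non-greedy matching on the same text, assuming Python's re module. The expressions (d)(\w+?) and (d)(\w+?)(d) are assumptions chosen to mirror the description, and the HTML-like string in the second half is a hypothetical sample, not taken from the text.

```python
import re

text = "dxxxdxxxd"

# With nothing after it, lazy \w+? stops after the single required character.
first = re.search(r"(d)(\w+?)", text)
lazy_min = first.group(2)             # 'x'

# With a trailing (d), \w+? extends only until the nearest d is reached.
second = re.search(r"(d)(\w+?)(d)", text)
lazy_to_next_d = second.group(2)      # 'xxx'
lazy_whole = second.group(0)          # 'dxxxd'

# Hypothetical sample: greedy .* runs to the last closing tag,
# lazy .*? stops at the first one.
cells = "<td>aa</td> <td>bb</td>"
greedy_cell = re.search(r"<td>(.*)</td>", cells).group(1)    # 'aa</td> <td>bb'
lazy_cell = re.search(r"<td>(.*?)</td>", cells).group(1)     # 'aa'

print(lazy_min, lazy_to_next_d, lazy_whole, greedy_cell, lazy_cell)
```

The lazy form is usually the one wanted when a delimiter can occur more than once in the text.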
3.2 Backreferences \1, \2, ...
When an expression is matched, the engine records the string matched by each subexpression enclosed in parentheses ( ). When the matching result is retrieved, the string matched by each parenthesized subexpression can be obtained separately. This has been demonstrated several times in the previous examples. In practical applications, when a boundary is used for searching but the content to be obtained does not include that boundary, parentheses must be used to specify the desired range. For example, in Example 7 of section 2.6 the parentheses capture 20.5 without the leading ¥. The recorded strings can also be referenced inside the expression itself: \1 refers to the string matched by the first pair of parentheses, \2 to the second, and so on.
3.3 Pre-search (lookahead) and reverse pre-search (lookbehind): conditions that match no characters
An earlier section described several special symbols that represent abstract meanings: ^, $, and \b. What they have in common is that they match no characters themselves and only attach a condition to the "two ends of the string" or to a "gap between characters". With this concept understood, this section introduces another, more flexible way of attaching conditions to the "two ends" or a "gap".
Forward pre-search: (?=xxxxx), (?!xxxxx)
Format: (?=xxxxx). In the matched string, it attaches a condition to the "gap" or "end" where it sits: the right side of the gap must be able to match the expression xxxxx. Because it acts only as a condition on this gap, it does not prevent the expressions that follow from actually matching the characters after the gap. This is similar to \b, which matches no characters itself; \b only examines the characters before and after the gap and does not affect what the following expressions actually match.
Example 1: When the expression Windows (?=NT|XP) (note the space after Windows) matches Windows 98, Windows NT, Windows 2000, it matches only the Windows (with its trailing space) in Windows NT; the other occurrences of Windows are not matched.
Example 2: When the expression (\w)((?=\1\1\1)(\1))+ matches the string aaa ffffff 999999999, it matches the first 4 of the 6 f characters and the first 7 of the 9 9 characters. The expression can be read as: when a letter or digit is repeated 4 or more times, match everything before the last 2 of them. The expression need not be written this way; it is used here only for demonstration.
Format: (?!xxxxx). The right side of the gap must not be able to match the expression xxxxx.
Example 3: When the expression ((?!\bstop\b).)+ matches fdjka ljfdl stop fjdsla fdj, it matches from the beginning up to the position just before stop. If the string contains no stop, it matches the entire string.
Example 4: When the expression do(?!\w) matches the string done, do, dog, it matches only do. In this example, using (?!\w) after do has the same effect as using \b.
Reverse pre-search: (?<=xxxxx), (?<!xxxxx)
The concepts behind these two formats are similar to forward pre-search, except that the condition imposed by reverse pre-search applies to the "left side" of the gap.
The two formats respectively require that the left side must be able to match, and must not be able to match, the specified expression, rather than examining the right side. As with forward pre-search, they are conditions attached to the gap and match no characters themselves.
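The pre-search conditions in this section can be sketched as follows, assuming Python's re module. The forward examples follow the ones in the text, with a space written after Windows in the first expression and nine 9 characters in the second test string. The two reverse pre-search examples and their sample string are hypothetical, added only for illustration.

```python
import re

# (?=...): "Windows " counts only when NT or XP follows it.
versions = "Windows 98, Windows NT, Windows 2000"
ahead = [m.start() for m in re.finditer(r"Windows (?=NT|XP)", versions)]
# [12]

# A backreference inside the condition: keep consuming a repeated
# character while three more copies of it are still ahead.
runs = [m.group(0) for m in
        re.finditer(r"(\w)((?=\1\1\1)(\1))+", "aaa ffffff 999999999")]
# ['ffff', '9999999']

# (?!...): consume characters until the word stop would begin.
before_stop = re.match(r"((?!\bstop\b).)+",
                       "fdjka ljfdl stop fjdsla fdj").group(0)
# 'fdjka ljfdl '

# do must not be followed by a word character.
only_do = [m.span() for m in re.finditer(r"do(?!\w)", "done, do, dog")]
# [(6, 8)]

# Hypothetical reverse pre-search examples: the condition looks left.
sample = "cost: $42, code 17"
after_dollar = re.findall(r"(?<=\$)\d+", sample)          # ['42']
not_after_dollar = re.findall(r"(?<![\$\d])\d+", sample)  # ['17']

print(ahead, runs, before_stop, only_do, after_dollar, not_after_dollar)
```

None of the conditions appear in the matched text: after_dollar contains 42 without the $ sign, because the condition only inspects the left side of the gap.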
4. Other general rules
4.1 Rule 1
In expressions, you can use \xXX and \uXXXX to represent a character (X represents a hexadecimal digit).
4.2 Rule 2
The expressions \s, \d, \w, and \b have special meanings, and the corresponding capital letters (\S, \D, \W, \B) have the opposite meanings.
4.3 Rule 3
Characters that have a special meaning in expressions need a \ in front to match the character itself.
4.4 Rule 4
If you do not want the content matched inside parentheses ( ) to be recorded for later use, you can use the (?:xxxxx) format.
Example 1: When the expression (?:(\w)\1)+ matches "a bbccdd efg", the result is "bbccdd". The match of the (?:) group is not recorded, so (\w) is the first recorded group and is referenced with \1.
4.5 Rule 5
Commonly used expression attribute settings: Ignorecase, Singleline, Multiline, Global.
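Several of the general rules can be tried out with a short sketch, assuming Python's re module. The mapping of the attribute names to Python is an assumption of this sketch: Ignorecase corresponds to re.IGNORECASE, Singleline to re.DOTALL, Multiline to re.MULTILINE, and Global to using findall or finditer instead of a single search.

```python
import re

# Rule 1: \xXX and \uXXXX name a character by its hexadecimal code.
hex_hit = re.search(r"\x41\u4e2d", "A中") is not None        # True

# Rule 2: the capital letter negates the class, so \D is any non-digit.
non_digits = re.findall(r"\D+", "ab12cd")                    # ['ab', 'cd']

# Rule 4: (?:...) groups without recording, so (\w) is group 1.
pairs = re.search(r"(?:(\w)\1)+", "a bbccdd efg")
pairs_text = pairs.group(0)                                  # 'bbccdd'
pairs_groups = pairs.groups()                                # ('d',)

# Rule 5: attribute settings expressed as Python flags.
ignore_case = re.findall(r"tom", "Tom TOM tom", re.IGNORECASE)
line_starts = re.findall(r"^\w+", "one\ntwo", re.MULTILINE)
dot_plain = re.search(r"a.b", "a\nb") is not None            # False
dot_all = re.search(r"a.b", "a\nb", re.DOTALL) is not None   # True

print(hex_hit, non_digits, pairs_text, pairs_groups,
      ignore_case, line_starts, dot_plain, dot_all)
```

Only one group is reported for the Rule 4 expression, and it holds the character from the last repetition, because the outer (?:) group is never recorded.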