Link to this article: http://www.hcoding.com/?p=130
When I first learned regular expressions, one question puzzled me. Suppose I need to match the characters between the first pair of "_" in the string "_abc_123_". As a beginner I would write "/_\w*_/", but the characters matched between the underscores are "abc_123" instead of "abc", because "\w" also matches "_" and "*" takes as much as it can. An expert told me to add a question mark, "/_\w*?_/", and then the result is "abc".
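The difference is easy to reproduce. The snippet below is a minimal sketch using Python's `re` module rather than the "/.../" literal syntax used in this article; the capture group is added here only to pull out the characters between the underscores.

```python
import re

text = "_abc_123_"

# Greedy: \w* grabs as much as it can (note that \w also matches '_'),
# then gives characters back until the closing '_' can match.
greedy = re.search(r"_(\w*)_", text)

# Lazy: \w*? grabs as little as it can and extends only when the
# closing '_' fails to match.
lazy = re.search(r"_(\w*?)_", text)

print(greedy.group(1))  # abc_123
print(lazy.group(1))    # abc
```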
We know that "?" used on its own means "repeat zero or one time". When "?" appears after another repetition quantifier, its function is lazy matching, that is, matching as few characters as possible. The usual description of the lazy quantifiers is that they repeat as few times as possible.
Yes, "as few repetitions as possible": this is a blunt, straightforward explanation of lazy matching.
So how should we understand "as few repetitions as possible"? We can explain it in terms of the ignore-first (lazy) quantifiers of regular expressions.
The quantifiers "*?", "+?", "??", "{n,m}?", and "{n,}?" are all ignore-first (lazy) quantifiers. They are formed by appending "?" to "?", "+", "*", or "{}". When matching, an ignore-first quantifier first tries to skip its item; only if the rest of the match then fails does the engine backtrack and try matching the item. For example, when `ab??` is applied to "abb", the result is "a" rather than "ab". After the engine has matched a, the ignore-first quantifier makes it choose not to match b and move on to the rest of the expression. Since the expression has ended, the engine immediately reports a successful match. The following example walks step by step through how ignore-first quantifiers work.
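The `ab??` case, and the backtracking that happens when skipping does not work out, can be checked with a short sketch in Python's `re` module. The variable names and the `ab??$` pattern are illustrative additions, not part of the original example.

```python
import re

# Lazy '??' skips the optional 'b' first; the pattern then ends,
# so the overall match is just "a".
lazy_optional = re.search(r"ab??", "abb").group(0)

# Greedy '?' takes the optional 'b' first.
greedy_optional = re.search(r"ab?", "abb").group(0)

# When skipping fails (here '$' cannot match right after "a"),
# the engine backtracks and tries matching 'b' after all.
forced = re.search(r"ab??$", "ab").group(0)

# The other lazy quantifiers also settle for their minimum count.
lazy_star = re.search(r"a*?", "aaa").group(0)
lazy_plus = re.search(r"a+?", "aaa").group(0)
lazy_range = re.search(r"a{2,}?", "aaaa").group(0)

print(lazy_optional, greedy_optional, forced)  # a ab ab
```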
Take the example above again: use "/_\w*?_/" to match the characters between the first pair of "_" in "_abc_123_".
After the first '_' has been matched, '\w*?' first decides not to match any characters, because it is an ignore-first quantifier. The second '_' in the expression '/_\w*?_/' (the '_' after '\w*?') is then tried against the 'a' in the target string '_abc_123_', and that attempt fails. Only then does the engine backtrack and let '\w*?' try the branch it skipped: \w is tried against a, and that succeeds.
Next, should the engine try another match or skip? Because '\w*?' is an ignore-first quantifier, it skips again, and the previous step repeats: '_' fails to match b, so the engine backtracks and \w matches b. After these steps have run three times in total, once each for a, b, and c, the '_' after '\w*?' matches the second '_' of the target string, and 'abc' is the final match.
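The skip-then-backtrack loop described above can be imitated by hand. The function below is only a sketch: `trace_lazy` is a hypothetical helper written for this one pattern, it assumes the text starts with '_', and it is not how a real regex engine is implemented. It records each attempt so the three repetitions become visible.

```python
import re

def trace_lazy(text):
    r"""Imitate how _\w*?_ is tried against text, starting at index 0."""
    if not text.startswith("_"):
        raise ValueError("this sketch assumes the text starts with '_'")
    steps = []
    pos = 1  # position just after the opening '_'
    while pos < len(text):
        ch = text[pos]
        # Lazy choice: skip \w and try the closing '_' first.
        if ch == "_":
            steps.append(f"skip \\w, '_' vs {ch!r}: match")
            return text[1:pos], steps
        steps.append(f"skip \\w, '_' vs {ch!r}: fail")
        # Backtrack to the saved alternative: let \w consume one character.
        if not re.fullmatch(r"\w", ch):
            raise ValueError("no match from this start position")
        steps.append(f"backtrack, \\w consumes {ch!r}")
        pos += 1
    raise ValueError("no closing '_' found")

captured, steps = trace_lazy("_abc_123_")
for step in steps:
    print(step)
print(captured)  # abc
```

The trace shows three failed attempts on a, b, and c, each followed by a backtrack, before the closing '_' finally matches.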
Process (after the first '_' has been matched):