.NET C# regular expressions: balanced groups/recursive matching
Balanced group/recursive matching
Note: the balanced group syntax described here is supported by the .NET Framework; other languages/libraries may not support this feature, or may support it with a different syntax.
Sometimes we need to match a nestable, hierarchical structure such as ( 100 * ( 50 + 15 ) ). In that case, simply using \(.+\) only matches the content between the leftmost left bracket and the rightmost right bracket (here we are discussing greedy mode; lazy mode has the problem described below as well). If the numbers of left and right brackets in the original string are not equal, as in ( 5 / ( 3 + 2 ) ) ), then the numbers of the two in our match will not be equal either. Is there a way to match the longest content enclosed by properly paired brackets in such a string?
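The problem is easy to reproduce. The following C# sketch (a top-level program; the helper `CountBrackets` is a hypothetical name introduced here) runs both the greedy `\(.+\)` and the lazy `\(.+?\)` pattern against the unbalanced example and counts the brackets in each result:

```csharp
using System;
using System.Text.RegularExpressions;

// The unbalanced example from the text: one more ')' than '('.
string input = "(5 / (3 + 2)))";

// Greedy: .+ runs to the end of the string, then backs off only as far as the last ')'.
string greedy = Regex.Match(input, @"\(.+\)").Value;

// Lazy: .+? stops at the first ')' that lets the match succeed.
string lazy = Regex.Match(input, @"\(.+?\)").Value;

// Hypothetical helper: counts '(' and ')' in a string.
static (int open, int close) CountBrackets(string s)
{
    int open = 0, close = 0;
    foreach (char c in s)
    {
        if (c == '(') open++;
        else if (c == ')') close++;
    }
    return (open, close);
}

Console.WriteLine($"greedy: {greedy} {CountBrackets(greedy)}"); // greedy: (5 / (3 + 2))) (2, 3)
Console.WriteLine($"lazy:   {lazy} {CountBrackets(lazy)}");     // lazy:   (5 / (3 + 2) (2, 1)
```

Neither result contains equal numbers of left and right brackets, which is exactly what the balanced group constructs are meant to fix.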
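To keep ( and \( from completely confusing your brain, let's use angle brackets instead of round brackets. Now the question becomes: in a string that begins with xx and contains nested angle brackets, how do we capture the longest content enclosed by a properly matched pair of angle brackets?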
The following syntax constructs are needed here:
(?'group') Names the captured content group and pushes it onto the stack
(?'-group') Pops the capture named group that was most recently pushed onto the stack; if the stack is already empty, this group fails to match
(?(group)yes|no) If the stack holds a capture named group, continue matching the yes part of the expression; otherwise, match the no part
(?!) Zero-width negative lookahead assertion; because its body is empty (and an empty expression always matches), the assertion always fails
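A small sketch of these constructs in isolation, assuming a C# top-level program on .NET; the patterns and inputs are illustrative and not from the original text. After a match, the `Captures` collection of a group that was pushed and popped holds only the captures that were never popped, so its count shows what is left on the stack:

```csharp
using System;
using System.Text.RegularExpressions;

// (?'Open'<) pushes a capture named Open; (?'-Open'>) pops the latest one
// and makes that alternative fail if nothing named Open is on the stack.
var pushPop = new Regex(@"^(?:(?'Open'<)|(?'-Open'>))*$");

// push, push, pop: one Open capture is still on the stack afterwards.
Match m = pushPop.Match("<<>");
Console.WriteLine(m.Success);                        // True
Console.WriteLine(m.Groups["Open"].Captures.Count);  // 1

// push, pop, pop: the second pop finds an empty stack, so the anchored match fails.
Console.WriteLine(pushPop.IsMatch("<>>"));           // False

// (?!) has an empty body, which always matches, so the negative assertion always fails.
Console.WriteLine(Regex.IsMatch("anything", "(?!)")); // False
```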
If you are not a programmer (or you call yourself a programmer but don't know what a stack is), you can think of the first three constructs like this: the first writes a "group" on the blackboard, the second erases one "group" from the blackboard, and the third checks whether any "group" is still written on the blackboard; if there is, continue matching the yes part, otherwise match the no part.
What we need to do is push an "Open" every time we encounter a left bracket and pop one every time we encounter a right bracket, then check at the end whether the stack is empty. If it is not empty, there are more left brackets than right brackets, and the match should fail. The regular expression engine will then backtrack (giving up some characters at the start or end) and try to make the whole expression match.
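The push/pop/check-at-the-end plan can be sketched for a whole string first. The helper name `IsBalanced` is hypothetical; it lets any non-bracket character through, pushes an Open on each `<`, pops one on each `>` (failing if none is left), and uses `(?(Open)(?!))` to reject the string if the stack is not empty at the end:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true when every '<' in s is closed by a later '>'.
// [^<>] lets ordinary characters through, (?'Open'<) pushes, (?'-Open'>) pops
// (failing on an empty stack), and (?(Open)(?!)) fails if anything is left.
static bool IsBalanced(string s) =>
    Regex.IsMatch(s, @"^(?:[^<>]|(?'Open'<)|(?'-Open'>))*(?(Open)(?!))$");

Console.WriteLine(IsBalanced("a<b<c>d>e")); // True
Console.WriteLine(IsBalanced("<a<b>"));     // False: one Open left over
Console.WriteLine(IsBalanced("a>b<"));      // False: pop on an empty stack
```

Because this pattern is anchored with `^` and `$`, it tests the entire string rather than searching for a balanced substring, so the engine has no leading or trailing characters to give up.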
    <                         #outermost left bracket
        [^<>]*                #non-bracket content after the outermost left bracket
        (
            (
                (?'Open'<)    #met a left bracket: write an "Open" on the blackboard
                [^<>]*        #match the non-bracket content after the left bracket
            )+
            (
                (?'-Open'>)   #met a right bracket: erase one "Open"
                [^<>]*        #match the non-bracket content after the right bracket
            )+
        )*
        (?(Open)(?!))         #before the outermost right bracket, check whether any "Open" is still on the blackboard; if so, the match fails
    >                         #outermost right bracket
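To run the commented pattern in C#, it has to be compiled with `RegexOptions.IgnorePatternWhitespace`, which makes the engine ignore the layout whitespace and treat text after `#` as a comment. A minimal sketch, with the pattern copied into a verbatim string (comments shortened) and three illustrative inputs of my own choosing:

```csharp
using System;
using System.Text.RegularExpressions;

// IgnorePatternWhitespace: layout whitespace is skipped and "#..." is a comment.
const string Pattern = @"
<                         # outermost left bracket
    [^<>]*                # non-bracket content after it
    (
        (
            (?'Open'<)    # left bracket: push an Open
            [^<>]*        # non-bracket content after the left bracket
        )+
        (
            (?'-Open'>)   # right bracket: pop an Open
            [^<>]*        # non-bracket content after the right bracket
        )+
    )*
    (?(Open)(?!))         # fail if any Open is left on the stack
>                         # outermost right bracket
";

var regex = new Regex(Pattern, RegexOptions.IgnorePatternWhitespace);

Console.WriteLine(regex.Match("xx <aa <bbb> <bbb> aa> yy").Value); // <aa <bbb> <bbb> aa>
Console.WriteLine(regex.Match("<5 / <3 + 2>>>").Value);             // <5 / <3 + 2>>
Console.WriteLine(regex.Match("<<>").Value);                         // <>
```

The last input shows the backtracking described earlier: no match can start at the first `<` because it is never closed, so the engine gives up that character and returns the balanced `<>` that follows.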
One of the most common applications of balanced groups is matching HTML. The following example matches nested <div> tags: