Question
You are trying to use a regular expression to match a large block of text, and you need the match to span multiple lines.
Solution
This problem typically arises when you use the dot (.) to match any character but forget that the dot does not match newlines. For example, suppose you are trying to match C-style comments:
>>> import re
>>> comment = re.compile(r'/\*(.*?)\*/')
>>> text1 = '/* this is a comment */'
>>> text2 = '''/* this is a
...  multiline comment */
... '''
>>>
>>> comment.findall(text1)
[' this is a comment ']
>>> comment.findall(text2)
[]
>>>
To fix this problem, you can modify the pattern string to add support for newlines. For example:
>>> comment = re.compile(r'/\*((?:.|\n)*?)\*/')
>>> comment.findall(text2)
[' this is a\n multiline comment ']
>>>
In this pattern, (?:.|\n) specifies a non-capturing group (that is, it defines a group used only for matching; it is not captured separately or given a group number).
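To see why the non-capturing form matters, the following minimal sketch compares it with an otherwise identical pattern whose inner group is capturing. The variable names are illustrative only. With a second capturing group, findall() returns a tuple per match instead of a plain string, and the inner group holds only the last character it matched.

```python
import re

text = '''/* this is a
 multiline comment */
'''

# Inner group is non-capturing: only the outer group is reported.
noncapturing = re.compile(r'/\*((?:.|\n)*?)\*/')
result_noncapturing = noncapturing.findall(text)
print(result_noncapturing)
# [' this is a\n multiline comment ']

# Inner group is capturing: findall() now returns (group 1, group 2) tuples,
# where group 2 is the last single character matched inside the repetition.
capturing = re.compile(r'/\*((.|\n)*?)\*/')
result_capturing = capturing.findall(text)
print(result_capturing)
# [(' this is a\n multiline comment ', ' ')]
```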
Discussion
The re.compile() function accepts a flag, re.DOTALL, which is very useful here. It makes the . in a regular expression match any character, including newlines. For example:
>>> comment = re.compile(r'/\*(.*?)\*/', re.DOTALL)
>>> comment.findall(text2)
[' this is a\n multiline comment ']
Using the re.DOTALL flag works well for simple cases, but it can cause problems if the pattern is very complicated or if several separate patterns are combined for tokenizing (described in detail in Section 2.18). If you have a choice, it is usually better to define your regular expression pattern so that it works correctly without any extra flags.
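As a rough illustration of how the flag can cause trouble in a combined pattern, the sketch below joins a line-comment pattern and a block-comment pattern into one expression. The pattern names and sample source string are hypothetical, chosen only for this example. Because re.DOTALL applies to the whole compiled expression, the `.*` in the line-comment pattern also starts matching newlines and swallows the rest of the input.

```python
import re

# Hypothetical sample input: one line comment and one block comment.
source = 'x = 1; // set x\ny = 2; /* set\n y */\n'

# Hypothetical token patterns, for illustration only.
LINE_COMMENT = r'//.*'                          # relies on . stopping at a newline
BLOCK_COMMENT_DOT = r'/\*.*?\*/'                # needs re.DOTALL to span lines
BLOCK_COMMENT_EXPLICIT = r'/\*(?:.|\n)*?\*/'    # spans lines without any flag

# The flag is global, so it also changes the meaning of LINE_COMMENT.
with_flag = re.compile('|'.join([LINE_COMMENT, BLOCK_COMMENT_DOT]), re.DOTALL)
found_with_flag = with_flag.findall(source)
print(found_with_flag)
# ['// set x\ny = 2; /* set\n y */\n']

# The explicit newline handling leaves LINE_COMMENT unaffected.
without_flag = re.compile('|'.join([LINE_COMMENT, BLOCK_COMMENT_EXPLICIT]))
found_without_flag = without_flag.findall(source)
print(found_without_flag)
# ['// set x', '/* set\n y */']
```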