Understanding Python multi-line matching patterns

Guanhui
Release: 2020-07-24 17:22:33

Question

You are trying to use a regular expression to match a large block of text, and you need the match to span multiple lines.

Solution

This problem typically arises when you use the dot (.) to match any character but forget that the dot does not match newlines. For example, suppose you want to match C-style comments:

>>> import re
>>> comment = re.compile(r'/\*(.*?)\*/')
>>> text1 = '/* this is a comment */'
>>> text2 = '''/* this is a
...  multiline comment */
... '''
>>>
>>> comment.findall(text1)
[' this is a comment ']
>>> comment.findall(text2)
[]
>>>

To fix this problem, you can modify the pattern string to add support for newlines. For example:

>>> comment = re.compile(r'/\*((?:.|\n)*?)\*/')
>>> comment.findall(text2)
[' this is a\n multiline comment ']
>>>

In this pattern, (?:.|\n) specifies a non-capturing group, that is, a group that is used only for matching and is not captured or numbered separately.
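
To see why the inner group is written as non-capturing, the following sketch compares it with a capturing variant of the same pattern. The variable names are illustrative and not part of the original recipe; the behavior shown follows from how findall() treats patterns that contain more than one group.

```python
import re

text = '/* this is a\n multiline comment */\n'

# Non-capturing inner group: only the outer parentheses capture,
# so findall() returns each comment body as a plain string.
non_capturing = re.compile(r'/\*((?:.|\n)*?)\*/')
bodies = non_capturing.findall(text)
print(bodies)   # [' this is a\n multiline comment ']

# Capturing inner group: the pattern now has two groups, so findall()
# returns one tuple per match. The second item is the last single
# character matched by the repeated inner group.
capturing = re.compile(r'/\*((.|\n)*?)\*/')
pairs = capturing.findall(text)
print(pairs)    # [(' this is a\n multiline comment ', ' ')]
```

With the capturing variant, callers would have to pick the first item out of every tuple, which is the extra bookkeeping that (?:...) avoids.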

Discussion

The re.compile() function accepts a flag argument, re.DOTALL, which is useful here. It makes the . in a regular expression match any character, including newlines. For example:

>>> comment = re.compile(r'/\*(.*?)\*/', re.DOTALL)
>>> comment.findall(text2)
[' this is a\n multiline comment ']
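
The same flag can be spelled in two other ways that the re module supports: the short alias re.S, and the inline form (?s) placed at the start of the pattern. A minimal sketch, with illustrative variable names that are not from the original recipe:

```python
import re

text = '/* this is a\n multiline comment */\n'

# re.S is the short alias for re.DOTALL.
with_alias = re.compile(r'/\*(.*?)\*/', re.S)

# (?s) at the start of the pattern turns on DOTALL from inside
# the pattern string itself, so no flag argument is needed.
with_inline = re.compile(r'(?s)/\*(.*?)\*/')

alias_result = with_alias.findall(text)
inline_result = with_inline.findall(text)
print(alias_result)    # [' this is a\n multiline comment ']
print(inline_result)   # [' this is a\n multiline comment ']
```

The inline form can be convenient when the pattern string is passed to code that does not expose a flags argument.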

Using the re.DOTALL flag works well for simple cases, but it can cause problems if the pattern is very complex or if several separate patterns are combined to tokenize a string (described in detail in Section 2.18). Given a choice, it is usually better to write the regular expression so that it works correctly without extra flags.
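
As a rough illustration of the kind of problem meant here, the sketch below combines two sub-patterns, one for block comments and one for `//` line comments. The combined pattern and the sample input are hypothetical and are not taken from Section 2.18. The line-comment branch depends on the dot stopping at the end of the line, so switching on re.DOTALL for the whole pattern changes its meaning.

```python
import re

source = (
    'x = 1; // set x\n'
    '/* block\n'
    '   comment */\n'
    'y = 2; // set y\n'
)

# Hypothetical combined pattern: block comment OR line comment.
# The block-comment branch handles newlines itself via (?:.|\n);
# the line-comment branch expects '.' to stop at the newline.
pattern = r'/\*(?:.|\n)*?\*/|//.*'

without_flag = re.findall(pattern, source)
with_dotall = re.findall(pattern, source, re.DOTALL)

print(without_flag)
# ['// set x', '/* block\n   comment */', '// set y']

print(len(with_dotall))
# 1  (the first '//.*' now runs to the end of the text)
```

Because the block-comment branch already accounts for newlines on its own, the combined pattern behaves as intended without any flag, while adding re.DOTALL silently breaks the other branch.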

source:jb51.net