This article shows how to implement overlapping matching with Python regular expressions, then reviews the basics of regular expression syntax and the functions of the re module.
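    import regex  # third-party package, installed with: pip install regex

    string = '100101010001'
    str_re = '101'
    # overlapped=True also reports matches that share characters
    print(regex.findall(str_re, string, overlapped=True))  # ['101', '101']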
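The standard re library returns only non-overlapping matches, so re.findall finds just one '101' in this string, while regex.findall with overlapped=True finds both.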
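If installing the third-party regex package is not an option, a similar result can be obtained with the standard re module. The sketch below is not part of the original example. It assumes the pattern can be wrapped in a zero-width lookahead with a capturing group, so that each match consumes no characters and every start position gets tested.

```python
import re

string = '100101010001'

# Plain findall consumes each match, so an occurrence that shares
# characters with the previous match is skipped.
non_overlapping = re.findall('101', string)

# The lookahead (?=...) matches without consuming characters; the inner
# group captures the text that starts at each position.
overlapping = re.findall('(?=(101))', string)

print(non_overlapping)  # ['101']
print(overlapping)      # ['101', '101']
```

The lookahead form returns the captured group rather than the whole match, which is why the pattern needs the extra pair of parentheses.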
A regular expression is a pattern used to filter text. It is built from a finite sequence of atoms and metacharacters.
Atoms: the basic units of a pattern. Every expression contains at least one atom.
Atom type | Examples and meaning |
---|---|
Ordinary characters | Literal characters, such as abcd |
Non-printing characters (characters that produce no visible output) | \n: line break; \t: tab |
General-purpose character classes | \w: any letter, digit, or underscore; \W: the opposite of \w; \d: any decimal digit; \D: the opposite of \d; \s: any whitespace character, such as a space, newline, or tab; \S: the opposite of \s |
Character set (atom table) | A group of atoms declared with []. All atoms in the set have equal priority, and the set matches exactly one character from it. If the set starts with ^, it is negated |
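    # Atom made of ordinary characters
    pat1 = "abcd"
    # Atom made of a non-printing character
    pat2 = "\n"
    # Atom made of a general-purpose character class (raw string keeps the backslash)
    pat3 = r"\w"
    # Atom made of a character set: matches pya, pyb, or pyc. The set consumes
    # exactly one character, so pyab is not matched in full
    pat4 = "py[abc]"
    # A set starting with ^ is negated: the third position matches any
    # character except a, b, or c
    pat5 = "py[^abc]"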
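To see how these atoms behave, the patterns can be tried against sample input. The following is a minimal sketch; the sample strings and variable names are made up for illustration and do not come from the original article.

```python
import re

# Ordinary characters match themselves.
literal = re.search("abcd", "xxabcdxx").group()

# \w matches the first letter, digit, or underscore it finds.
word_char = re.search(r"\w", "  _a1").group()

# A character set stands for exactly one character.
one_from_set = re.fullmatch("py[abc]", "pyb") is not None
two_from_set = re.fullmatch("py[abc]", "pyab") is not None

# A negated set rejects a, b, and c in the third position.
negated = re.search("py[^abc]", "pya pyz").group()

print(literal, word_char, one_from_set, two_from_set, negated)
# abcd _ True False pyz
```

re.fullmatch is used for the character set so that the whole string has to fit the pattern; re.search would still find pya inside pyab.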
Metacharacters: Characters with special meaning in regular expressions
Metacharacter | Meaning |
---|---|
. | Matches any single character except a newline |
^ | Matches the start of the string |
$ | Matches the end of the string |
* | Matches the preceding atom 0 or more times [greedy mode: matches as much as possible] |
? | Matches the preceding atom 0 or 1 times. Placed after another quantifier, it switches that quantifier to lazy mode [matches as little as possible] |
+ | Matches the preceding atom 1 or more times |
{ j } | The preceding atom appears exactly j times |
{ j , } | The preceding atom appears at least j times |
{ j , k } | The preceding atom appears at least j times and at most k times |
i \| j | Matches i or j. If both could match at the same position, i is tried first |
( ) | Group: treats the enclosed pattern as a single unit and captures it, so functions such as findall return only the content matched inside the parentheses |
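A few of the metacharacters in the table are easier to follow with concrete input. This is an illustrative sketch only; the sample strings and variable names are assumptions chosen for the demonstration.

```python
import re

text = "<a><b>"

# Greedy: .* grabs as much as it can, so the match runs to the last '>'.
greedy = re.findall("<.*>", text)

# Lazy: adding ? after * makes it stop at the first '>'.
lazy = re.findall("<.*?>", text)

# {j,k}: between 2 and 3 digits in a row.
counted = re.findall(r"\d{2,3}", "1 22 333 4444")

# Alternation: either branch may match.
either = re.findall("cat|dog", "cat dog bird")

# Group: findall returns only what the parentheses captured.
grouped = re.findall(r"(\d+)-\d+", "10-20 30-40")

print(greedy)   # ['<a><b>']
print(lazy)     # ['<a>', '<b>']
print(counted)  # ['22', '333', '444']
print(either)   # ['cat', 'dog']
print(grouped)  # ['10', '30']
```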
Pattern modifiers
A pattern modifier is the value passed in the flags position of a matching function. It adjusts how matching is performed without changing the regular expression itself.
Flag | Meaning |
---|---|
re.I | Ignore case when matching |
re.M | Multi-line matching: ^ and $ also match at the start and end of each line |
re.L | Locale-dependent matching |
re.U | Unicode-aware matching, affecting \w and \W |
re.S | Makes . match every character, including newlines |
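The effect of the modifiers shows up when the same pattern is run with different flags. The sketch below uses a made-up two-line string; re.I, re.M, and re.S are the standard flag constants of the re module, and re.S is the flag that lets . match newline characters.

```python
import re

text = "Python\npython"

# Without flags, ^ matches only at the very start and case matters.
default = re.findall("^python", text)

# re.I ignores case, so the first line matches.
ignore_case = re.findall("^python", text, re.I)

# re.M lets ^ match at the start of every line.
multi_line = re.findall("^python", text, re.M)

# Flags can be combined with |.
both = re.findall("^python", text, re.I | re.M)

# By default . does not match a newline; re.S changes that.
dot_default = re.search("n.p", text)
dot_all = re.search("n.p", text, re.S).group()

print(default, ignore_case, multi_line, both)
# [] ['Python'] ['python'] ['Python', 'python']
print(dot_default, repr(dot_all))
# None 'n\np'
```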
Regular matching
Commonly used matching functions (all provided by the regular expression module re):
re.search(pat, string, flags=0): Scans the string and returns a match object for the first location where pat matches, or None if there is no match. flags controls how the regular expression is matched.
    import re

    text = 'python'
    pat = 'pytho[a-n]'
    print(re.search(pat, text))  # match object with span=(0, 6)
re.match(pat, string, flags=0): Tries to match pat only at the start of the string and returns a match object on success. flags controls how the regular expression is matched. If the pattern does not match at the very start, matching stops and None is returned.
    import re

    str_1 = 'hello world'
    str_2 = 'world hello'
    pat = 'world'
    print(re.match(pat, str_1))  # None: 'world' is not at the start
    print(re.match(pat, str_2))  # match object with span=(0, 5)
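Both functions return a match object rather than a bare position, so the position and the matched text have to be read from that object. The following sketch shows one way to do this with the standard group(), start(), and span() methods; the variable names are illustrative.

```python
import re

text = 'hello world'

# search scans the whole string for the first match.
found = re.search('world', text)
matched_text = found.group()   # the matched substring
start_index = found.start()    # index where the match begins
position = found.span()        # (start, end) pair

# match only looks at the beginning, so it returns None here.
at_start = re.match('world', text)

print(matched_text, start_index, position)  # world 6 (6, 11)
print(at_start)                             # None
```

Because a failed match returns None, production code usually checks the result before calling group() or span() on it.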
re.compile(pat, flags=0): Compiles the regular expression pat and returns a regular expression object.
findall(string[, pos[, endpos]]): Called on a compiled regular expression object. Returns a list of all matched substrings in string [not just the first one]. pos and endpos optionally limit the search to the range between those positions in string.
re.findall(pat, string, flags=0): Global matching function. Finds all substrings of the string that match pat and returns them in a list.
    import re

    text = "hello world hello world hello world"
    pat = "hello"
    print(re.compile(pat).findall(text))         # ['hello', 'hello', 'hello']
    print(re.compile(pat).findall(text, 5, 15))  # []: text[5:15] holds no complete 'hello'
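The pos argument of a compiled pattern also offers another route to overlapping matches with the standard library. The sketch below is an assumption-laden illustration, not the article's method: find_overlapping is a hypothetical helper name, and it restarts the search one character after the start of each match instead of after its end.

```python
import re

def find_overlapping(pat, string):
    """Return the start index of every match, overlapping ones included.

    Hypothetical helper written for illustration only.
    """
    compiled = re.compile(pat)
    starts = []
    pos = 0
    while pos <= len(string):
        m = compiled.search(string, pos)
        if m is None:
            break
        starts.append(m.start())
        # Restart one character after this match began, not after it ended.
        pos = m.start() + 1
    return starts

result = find_overlapping('101', '100101010001')
print(result)  # [3, 5]
```

This version reports positions rather than substrings, which can be convenient when the locations of the overlapping matches matter more than their text.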
re.sub(pat, repl, string, count=0, flags=0): Replaces the matches of pat in the string with repl [useful for cleaning data]. count sets the maximum number of replacements.
    import re

    text = "400-823-823"
    pat = "-"
    # Replace hyphens with spaces, at most 2 replacements
    str_new = re.sub(pat, " ", text, count=2)
    print(str_new)  # 400 823 823