Quantifier
Let's continue deepening our understanding of regular expressions. This part covers quantifiers and why we need them. Just think about it: if you want to match dozens or hundreds of characters, do you have to write them out one by one? Quantifiers exist to solve exactly this problem.
The syntax of a quantifier is: {min,max}. min and max are both nonnegative integers. If the comma is present and max is omitted, there is no upper limit. If both the comma and max are omitted, the preceding element is repeated exactly min times. For example, \b[1-9][0-9]{3}\b matches numbers between 1000 and 9999 ("\b" represents a word boundary), while \b[1-9][0-9]{2,4}\b matches numbers between 100 and 99999.
Let's look at an example that matches runs of 4 to 7 lowercase English letters in a string:
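As a quick check of the three forms of {min,max}, here is a minimal sketch that runs the two patterns above, plus an open-ended {2,} variant, against a sample string. The sample string and the variable names are made up for illustration and are not part of the original example.

```python
import re

# Sample text invented for illustration.
text = '7 42 100 999 1000 9999 12345 99999 100000 0123'

# {3}: exactly three more digits after the leading 1-9, i.e. 1000..9999
four_digit = re.findall(r'\b[1-9][0-9]{3}\b', text)

# {2,4}: two to four more digits, i.e. 100..99999
three_to_five = re.findall(r'\b[1-9][0-9]{2,4}\b', text)

# {2,}: two or more digits after the first one, with no upper limit
at_least_three = re.findall(r'\b[1-9][0-9]{2,}\b', text)

print(four_digit)      # ['1000', '9999']
print(three_to_five)   # ['100', '999', '1000', '9999', '12345', '99999']
print(at_least_three)  # ['100', '999', '1000', '9999', '12345', '99999', '100000']
```

Note that 0123 is never matched, because the first digit must be in the range 1-9 and \b prevents a match from starting in the middle of a number.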
    import re

    a = 'java*&39android##@@python'

    # Quantifier: match runs of 4 to 7 lowercase letters
    findall = re.findall('[a-z]{4,7}', a)
    print(findall)
Output result:
['java', 'android', 'python']
Note that quantifiers come in two modes, greedy and non-greedy (lazy). Let's take a look at these concepts first:
Greedy mode: A greedy quantifier first consumes as many characters as it is allowed to. If the rest of the pattern then fails to match, it gives back the rightmost character and tries again, repeating until the pattern matches or there is nothing left to give back. Its goal is to match as many characters as possible, and it returns as soon as a match is found.
Lazy mode: A lazy quantifier starts from the left side of the string and first tries to match with as few characters as it is allowed to. If the rest of the pattern fails, it reads one more character and tries again. This cycle continues until a match is found. The matched string is returned, and matching then resumes from that point until the end of the string is reached.
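To see the two modes side by side, here is a small sketch that applies a greedy and a lazy version of the same pattern to one string. The sample string 'aXbXb' and the patterns are hypothetical and chosen only to make the backtracking visible.

```python
import re

text = 'aXbXb'  # hypothetical sample string

# Greedy: .* first grabs 'XbXb', then gives back one character
# so that the final 'b' in the pattern can match.
greedy = re.search(r'a.*b', text).group()

# Lazy: .*? first grabs nothing, then takes one more character
# at a time until the 'b' in the pattern can match.
lazy = re.search(r'a.*?b', text).group()

print(greedy)  # aXbXb
print(lazy)    # aXb
```

Both patterns start at the same position. The only difference is whether the quantifier gives characters back (greedy) or takes more on demand (lazy).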
The [a-z]{4,7} example earlier is greedy. What if you want non-greedy, that is, lazy mode?
To make a quantifier non-greedy, add a ? after it. The example is modified as follows:
    import re

    a = 'java*&39android##@@python'

    # Greedy vs. non-greedy: the trailing ? makes {4,7} lazy
    re_findall = re.findall('[a-z]{4,7}?', a)
    print(re_findall)
The output results are as follows:
['java', 'andr', 'pyth']
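As can be seen from the output, only andr is returned for android and only pyth for python. Lazy mode is used here, so each match stops as soon as the minimum of four letters is reached, and the leftover letters (oid and on) are too short to form another match.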
Of course, there are some special characters that can also express quantities, such as:
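?: tells the engine to match the preceding character 0 or 1 times
+: tells the engine to match the preceding character 1 or more times
*: tells the engine to match the preceding character 0 or more times
The table below summarizes the knowledge points in this part: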
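The three shorthand quantifiers ?, + and * can be tried out with a short sketch like the one below. The sample words and variable names are invented for illustration.

```python
import re

# Hypothetical sample words chosen for illustration.
words = 'color colour colouur ct cat caat'

# u? : the 'u' may appear zero or one time
optional_u = re.findall(r'\bcolou?r\b', words)

# a+ : the 'a' must appear one or more times
one_or_more = re.findall(r'\bca+t\b', words)

# a* : the 'a' may appear zero or more times
zero_or_more = re.findall(r'\bca*t\b', words)

print(optional_u)    # ['color', 'colour']
print(one_or_more)   # ['cat', 'caat']
print(zero_or_more)  # ['ct', 'cat', 'caat']
```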
Greedy | Lazy | Description |
? | ?? | Zero or one occurrence, equivalent to {0,1} |
+ | +? | One or more occurrences, equivalent to {1,} |
* | *? | Zero or more occurrences, equivalent to {0,} |
{n} | {n}? | Exactly n occurrences |
{n,m} | {n,m}? | At least n and at most m occurrences |
{n,} | {n,}? | At least n occurrences |
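A few rows of the table can be checked with a sketch like the following. The digit string is an assumption chosen for illustration. Patterns that can match the empty string (such as \d*? or \d??) are left out, because findall would then also return empty matches.

```python
import re

digits = '123456'  # illustrative sample string

# The shorthand + and the brace form {1,} are equivalent.
plus_form = re.findall(r'\d+', digits)
brace_form = re.findall(r'\d{1,}', digits)

# Greedy {2,4} takes up to four digits; lazy {2,4}? stops at two.
greedy_range = re.findall(r'\d{2,4}', digits)
lazy_range = re.findall(r'\d{2,4}?', digits)

# Lazy +? stops after a single digit each time.
lazy_plus = re.findall(r'\d+?', digits)

# {n} has a fixed count, so its lazy form {n}? matches the same text.
exact = re.findall(r'\d{3}', digits)
exact_lazy = re.findall(r'\d{3}?', digits)

print(plus_form, brace_form)    # ['123456'] ['123456']
print(greedy_range)             # ['1234', '56']
print(lazy_range)               # ['12', '34', '56']
print(lazy_plus)                # ['1', '2', '3', '4', '5', '6']
print(exact, exact_lazy)        # ['123', '456'] ['123', '456']
```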