A question about *? in Python regular expressions
PHP中文网
PHP中文网 2017-04-17 17:27:36

1. The textbook says *? is the non-greedy mode, so why does the following code return an empty result?

>>> import re
>>> line = 'cats are smart than dogs.'
>>> m=re.match(r'(.*?)',line)
>>> m.group()

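The result is: ''
Why is the result empty? Shouldn't it be cats? Is there an empty character before the string by default?
I'm a beginner and would appreciate an explanation. Thanks.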


reply all(4)
大家讲道理

. = any character (except a newline, by default)
* = the preceding item appears 0 or more times, equivalent to {0,}
? = placed after a quantifier, makes it non-greedy: it takes as few repetitions as possible (on its own, ? means {0,1})

Taken together, this means "any character, repeated 0 times", so nothing is matched.

To match cats, use the space that follows it: r'(.*?) '. Group 1 then contains cats.
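A minimal sketch of the point above, using the line from the question. The variable names (empty, first) are only illustrative. It shows that a lazy quantifier with nothing after it is satisfied by zero characters, while a trailing space forces it to grow.

```python
import re

line = 'cats are smart than dogs.'

# Nothing follows the lazy group, so zero repetitions already satisfy the pattern.
empty = re.match(r'(.*?)', line)
print(repr(empty.group()))  # ''
print(empty.span())         # (0, 0)

# The literal space must match, so the lazy group expands until a space follows.
first = re.match(r'(.*?) ', line)
print(repr(first.group()))  # 'cats ' (whole match, including the space)
print(first.group(1))       # cats
```

Note that group() returns the whole match including the space, so group(1) is the one to read here.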
小葫芦

My personal understanding is that .*? matches at ^, that is, at the starting position, without consuming any characters. In regular expressions a position can also be matched, for example:

In [1]: import re
In [2]: s = 'a'
In [3]: re.sub(r'^', 'b', s)
Out[3]: 'ba'

This example replaces the position ^, and the same goes for $. So your .*? matched right at the starting position, with zero characters.
PS: When using regular expressions, especially on large amounts of text, it is better to use [\s\S]*, [\d\D]*, and so on instead of .*, because . does not match a newline by default.
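The two claims in this reply can be checked with a short sketch. The sample text and variable names are made up for illustration. The re.DOTALL flag is not mentioned in the thread; it is shown as a standard-library alternative to [\s\S]*.

```python
import re

s = 'a'
at_start = re.sub(r'^', 'b', s)  # inserts at the start position
at_end = re.sub(r'$', 'b', s)    # inserts at the end position
print(at_start, at_end)          # ba ab

text = 'first line\nsecond line'

# By default . stops at a newline, so .* only covers the first line.
dot_only = re.match(r'.*', text).group()
print(repr(dot_only))            # 'first line'

# [\s\S] matches any character, newline included.
any_char = re.match(r'[\s\S]*', text).group()

# re.DOTALL makes . match newlines as well.
dotall = re.match(r'.*', text, re.DOTALL).group()
print(any_char == text, dotall == text)  # True True
```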
小葫芦

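This is the difference between greedy and non-greedy matching in regular expressions:

  • Greedy mode: when a match is possible, match the longest. The quantifier does not end with ?.

  • Non-greedy mode: when a match is possible, match the shortest. The quantifier ends with ?.

For example, take the string abcabcabc. If I want a match that starts with a and ends with c, there are three candidates: abc, abcabc, and abcabcabc. The longest, abcabcabc, is matched by a.*c, and the shortest, abc, is matched by a.*?c.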

>>> import re
>>> line = "abcabcabc"
>>> m = re.match(r'a.*c', line)
>>> m.group()
'abcabcabc'
>>> m = re.match(r'a.*?c', line)
>>> m.group()
'abc'
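As a small extension of the session above (a sketch; the variable names are illustrative), re.findall shows every non-overlapping match. re.fullmatch shows that a lazy quantifier still grows when the rest of the pattern requires it.

```python
import re

line = "abcabcabc"

# Greedy: one match that runs to the last c.
greedy_all = re.findall(r'a.*c', line)
print(greedy_all)  # ['abcabcabc']

# Lazy: each match stops at the nearest c, so three short matches are found.
lazy_all = re.findall(r'a.*?c', line)
print(lazy_all)    # ['abc', 'abc', 'abc']

# fullmatch requires the match to reach the end of the string,
# so even the lazy pattern has to expand over the whole input.
forced = re.fullmatch(r'a.*?c', line).group()
print(forced)      # abcabcabc
```

Lazy means "try the fewest repetitions first", not "always return the shortest possible substring".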

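Why is the result empty? Shouldn't it be cats?

Because .* is greedy, it matches the longest string in which every character is any character (.), that is, the string made up of all the input characters. .*? is non-greedy, so it matches the shortest such string, which is the empty string.

To match the word cats, use cats. To match the first word of the input string, use \w+ or \S+: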

>>> m=re.match(r'\w+',line)
>>> m.group()
'cats'
>>> m=re.match(r'\S+',line)
>>> m.group()
'cats'
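The two patterns agree on this input but not in general. The sketch below uses made-up sample strings (phrase, listing) to show where they differ: \w+ stops at the first non-word character, while \S+ stops only at whitespace.

```python
import re

phrase = "don't panic"
word_chars = re.match(r'\w+', phrase).group()  # stops at the apostrophe
non_space = re.match(r'\S+', phrase).group()   # stops at the space
print(word_chars)  # don
print(non_space)   # don't

listing = 'cats, dogs'
print(re.match(r'\w+', listing).group())  # cats
print(re.match(r'\S+', listing).group())  # cats,
```

Which one fits depends on whether punctuation attached to the word should be part of the match.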

Is there an empty character before the string by default?

No. In a regular expression, however, ^ represents the start position of the string and $ represents the end position. Note that these two characters (^ and $) are symbols defined by the matching rules and belong in the pattern you write; they do not mean that the string being matched contains them.
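A short sketch of this last point, reusing the line from the question (the variable names are illustrative). The anchors match positions of width zero, and a literal ^ is only matched when it is escaped in the pattern.

```python
import re

line = 'cats are smart than dogs.'

# The string itself contains no ^ or $ characters.
print('^' in line, '$' in line)  # False False

# ^ and $ match positions, so their spans have zero width.
start_span = re.search(r'^', line).span()
end_span = re.search(r'$', line).span()
print(start_span, end_span)      # (0, 0) (25, 25)

# Anchors combined with ordinary characters.
head = re.match(r'^cats', line).span()
tail = re.search(r'dogs\.$', line).span()
print(head, tail)                # (0, 4) (20, 25)

# To match a literal caret, escape it.
literal = re.search(r'\^', 'a^b').span()
print(literal)                   # (1, 2)
```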
黄舟

Boundary matches do not consume any characters of the string being matched, and the pattern is non-greedy, so no characters of the string are matched.
