. = any character. * = the preceding item occurs 0 or more times, equivalent to {0,}. ? = when it follows a quantifier such as *, take as few repetitions as possible (on its own, ? is the quantifier {0,1}).
Put together, .*? means "any character, as few times as possible", which here is 0 times, so it matches nothing.
To match cats, use the space that follows cats: r'(.*?) '
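A minimal sketch of the point above. The input string is an assumption made for illustration, since the question's original code is not preserved here; only the two patterns come from the answer.

```python
import re

# Hypothetical input: the original question's string is not shown in this thread.
text = "cats are smarter than dogs"

# Lazy quantifier with nothing after it: zero characters already satisfy the pattern.
lazy_alone = re.match(r'(.*?)', text)

# The trailing space forces the lazy group to grow until a space can follow it.
lazy_with_space = re.match(r'(.*?) ', text)

print(repr(lazy_alone.group(1)))       # ''
print(repr(lazy_with_space.group(1)))  # 'cats'
```

The lazy group only expands as far as the rest of the pattern requires, which is why giving it something to stop at changes the result.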
My personal understanding is that .*? matches at ^, that is, at the starting position, and consumes no characters. In regular expressions a position can be matched too, for example:
In [1]: import re
In [2]: s = 'a'
In [3]: re.sub(r'^', 'b', s)
Out[3]: 'ba'
This example replaces ^, and the same goes for $, so your .*? matched right at ^ and stopped there. PS: when using regular expressions, especially on large amounts of text, it is better not to use .* but [\s\S]* or [\d\D]* and the like (by default . does not match a newline, while these classes match any character).
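The two claims above can be checked with a short sketch. The multi-line sample string is an assumption added for illustration; the patterns are the ones named in the answer.

```python
import re

s = 'a'
at_start = re.sub(r'^', 'b', s)  # inserts at the start position
at_end = re.sub(r'$', 'b', s)    # inserts at the end position
print(at_start, at_end)          # ba ab

# Hypothetical multi-line text to compare .* with [\s\S]*
multiline = 'first line\nsecond line'
dot_star = re.match(r'.*', multiline).group()
any_char = re.match(r'[\s\S]*', multiline).group()

print(repr(dot_star))  # 'first line'
print(repr(any_char))  # 'first line\nsecond line'
```

Without the `re.DOTALL` flag, `.*` stops at the first newline, whereas `[\s\S]*` runs through the whole text.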
This is the difference between greedy and non-greedy matching in regular expressions.
Greedy mode: when a match is possible, match the longest. The quantifier is not followed by ?.
Non-greedy mode: when a match is possible, match the shortest. The quantifier is followed by ?.
For example, take the string abcabcabc. If I want a substring that starts with a and ends with c, there are three candidates starting at the first a: abc, abcabc and abcabcabc. The longest, abcabcabc, is what a.*c matches, and the shortest, abc, is what a.*?c matches.
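The abcabcabc example can be run directly; a small sketch using Python's `re` module:

```python
import re

s = 'abcabcabc'

greedy = re.search(r'a.*c', s).group()  # runs to the last c
lazy = re.search(r'a.*?c', s).group()   # stops at the first c

print(greedy)  # abcabcabc
print(lazy)    # abc

# With findall, the lazy pattern yields each shortest match in turn.
print(re.findall(r'a.*?c', s))  # ['abc', 'abc', 'abc']
```

Neither pattern returns the middle candidate abcabc: greedy takes the longest match available from the starting position and lazy takes the shortest.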
Why is the result empty? Shouldn't it be cats?
Because .* is greedy: it matches the longest string in which every character is any character (.), that is, the string made up of all the input characters. And .*? is non-greedy: it matches the shortest string in which every character is any character (.), that is, the empty string.
If you want to match the word cats, use cats. If you want to match the first word in the input string, use \w+ or \S+.
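A short sketch of these suggestions. Both sample strings are assumptions made for illustration, not the question's original input.

```python
import re

# Hypothetical inputs for illustration.
text = "cats are smarter than dogs"
with_comma = "cats, dogs and birds"

literal = re.search(r'cats', text).group()
first_word = re.match(r'\w+', text).group()
first_token = re.match(r'\S+', text).group()
print(literal, first_word, first_token)  # cats cats cats

# The two classes differ when punctuation touches the word:
# \w+ stops at the comma, \S+ only stops at whitespace.
print(re.match(r'\w+', with_comma).group())  # cats
print(re.match(r'\S+', with_comma).group())  # cats,
```

Which of the two fits depends on whether punctuation attached to the word should count as part of it.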
Is there a null character before the string by default?
No, but in regular expressions you can use ^ to represent the starting position of a string and $ to represent the ending position. Note that these two symbols (^ and $) are defined by the matching rules and belong to the pattern you write; it does not mean that the string being matched contains these two symbols.
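To make the distinction concrete, here is a small sketch showing that ^ and $ match positions of zero width. The sample string is chosen only for illustration.

```python
import re

text = 'cats'

start_span = re.search(r'^', text).span()
end_span = re.search(r'$', text).span()

print(start_span)  # (0, 0): a position before the first character, zero width
print(end_span)    # (4, 4): a position after the last character, zero width

# The string itself contains neither symbol.
print('^' in text, '$' in text)  # False False

# Anchors constrain where a match may sit; they add no characters to it.
print(re.match(r'^cats$', text).group())  # cats
```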
A boundary match does not consume any characters of the string being matched, and the pattern is non-greedy, so no characters of the string are matched.