Regular expressions (regular). To use regular expressions in Python, you need to import the re module (re is short for "regular expression"). Regular expressions are used to process strings. Strings often contain information we want to extract, and mastering these string-processing methods makes many tasks much easier.
A regular expression is a way of processing strings. Reference: http://www.cnblogs.com/alex3714/articles/5169958.html
Regular expressions are widely used because file processing is very common in Python, and files contain strings. Much of that string processing is done with regular expressions, so they are well worth mastering. Let's look at the functions the re module provides:
(1) match(pattern, string, flags=0)
def match(pattern, string, flags=0):
    """Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).match(string)
Key points: (1) matching starts at the beginning of the string; (2) None is returned if no match is found.
Let's take a look at a few examples:
import re
string = "abcdef"
m = re.match("abc", string)   # (1) match "abc" and see what is returned
print(m)
print(m.group())              # (2) get the matched text
n = re.match("abcf", string)  # (3) "abcf" is not in the string
print(n)
l = re.match("bcd", string)   # (4) "bcd" is in the middle of the string
print(l)
The running results are as follows:
<_sre.SRE_Match object; span=(0, 3), match='abc'>   (1)
abc                                                  (2)
None                                                 (3)
None                                                 (4)
(Python 3.7 and later print the match object as <re.Match object; span=(0, 3), match='abc'>.)
As output (1) shows, match() returns a match object. To turn it into readable text, call group() on it, as in (2). If the pattern is not found, None is returned (3). match(pattern, string, flags) only matches at the beginning of the string, so "bcd", which occurs in the middle, is not found either (4).
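match() returns None when the pattern is not at the start of the string. Calling group() directly on that result raises an AttributeError. Below is a minimal sketch of the usual None check. The helper name first_word is hypothetical and not part of the re module:

```python
import re

def first_word(text):
    """Hypothetical helper: return the leading run of lowercase letters, or None."""
    m = re.match(r"[a-z]+", text)
    # match() returns None if text does not start with a letter,
    # and None has no group() method, so check before using it.
    return m.group() if m else None

print(first_word("abcdef123"))  # abcdef
print(first_word("123abcdef"))  # None
```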
(2) fullmatch(pattern, string, flags=0)
def fullmatch(pattern, string, flags=0):
    """Try to apply the pattern to all of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).fullmatch(string)
As the docstring says, fullmatch() tries to apply the pattern to the whole string. It returns a match object only if the entire string matches the pattern, and None otherwise.
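The original gives no example for fullmatch(). Here is a minimal sketch comparing it with match() on the same input (the sample strings are illustrative):

```python
import re

# fullmatch() succeeds only if the pattern covers the entire string.
whole = re.fullmatch(r"\d+", "15111252598")
partial = re.fullmatch(r"\d+", "15111252598abc")
# match() is satisfied by a prefix, so the same input succeeds there.
prefix = re.match(r"\d+", "15111252598abc")

print(whole.group())   # 15111252598
print(partial)         # None
print(prefix.group())  # 15111252598
```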
(3) search(pattern, string, flags=0)
def search(pattern, string, flags=0):
    """Scan through string looking for a match to the pattern, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).search(string)
The docstring of search(pattern, string, flags=0) is "Scan through string looking for a match to the pattern, returning a match object, or None if no match was found." In other words, search() looks for the regular expression at any position in the string. It returns a match object if it finds one, and None otherwise.
import re
string = "ddafsadadfadfafdafdadfasfdafafda"
m = re.search("a", string)   # (1) match in the middle of the string
print(m)
print(m.group())             # (2) get the matched text
n = re.search("N", string)   # (3) a pattern that cannot be matched
print(n)
The running results are as follows:
<_sre.SRE_Match object; span=(2, 3), match='a'>   (1)
a                                                  (2)
None                                               (3)
As result (1) shows, search(pattern, string, flags=0) can match at any position in the string, not only at the beginning like match(), which widens its range of use. When a match is found, a match object is returned; (2) to display it, call its group() method; (3) if nothing is found, None is returned.
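This small sketch shows the difference from match() directly on the same string. It also uses the match object's start(), end() and span() methods to read the position of the match:

```python
import re

string = "ddafsadadfadfafdafdadfasfdafafda"

# match() only tries position 0; the first "a" is at index 2, so this fails.
anchored = re.match("a", string)
# search() scans forward and stops at the first occurrence.
found = re.search("a", string)

print(anchored)                      # None
print(found.group(), found.span())   # a (2, 3)
print(found.start(), found.end())    # 2 3
```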
(4) sub(pattern, repl, string, count=0, flags=0)
def sub(pattern, repl, string, count=0, flags=0):
    """Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the match object and must return
    a replacement string to be used."""
    return _compile(pattern, flags).sub(repl, string, count)
sub(pattern, repl, string, count=0, flags=0) is find-and-replace. It first looks for pattern in string, then replaces every match with repl. count sets the maximum number of replacements; the default 0 means replace all matches. The example is as follows:
import re
string = "ddafsadadfadfafdafdadfasfdafafda"
m = re.sub("a", "A", string)           # (1) number of replacements not specified
print(m)
n = re.sub("a", "A", string, count=2)  # (2) at most 2 replacements
print(n)
l = re.sub("F", "B", string)           # (3) pattern not in the string
print(l)
The running results are as follows:
ddAfsAdAdfAdfAfdAfdAdfAsfdAfAfdA   (1)
ddAfsAdadfadfafdafdadfasfdafafda   (2)
ddafsadadfadfafdafdadfasfdafafda   (3)
In (1) no count is given, so every match is replaced. In (2) a count is given, so only that many matches are replaced. In (3) the pattern does not occur in the string (matching is case-sensitive, so "F" does not match the lowercase "f"), and the original string is returned unchanged.
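Key points: (1) you can limit the number of replacements with count; if you don't, all matches are replaced; (2) if nothing matches, the original string is returned.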
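The sub() docstring above says two more things: repl can be a callable, and backslash escapes in a string repl are processed. Here is a minimal sketch of both, using a shortened sample string. The function name double is a hypothetical example:

```python
import re

string = "dd12a32d46465fad"

# A string repl may refer back to groups: \1 is whatever ([0-9]+) matched.
bracketed = re.sub(r"([0-9]+)", r"[\1]", string)

# A callable repl receives each match object and returns the replacement text.
def double(match):
    """Hypothetical callback: replace each run of digits with twice its value."""
    return str(int(match.group()) * 2)

doubled = re.sub(r"[0-9]+", double, string)

print(bracketed)  # dd[12]a[32]d[46465]fad
print(doubled)    # dd24a64d92930fad
```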
(5) subn(pattern, repl, string, count=0, flags=0)
def subn(pattern, repl, string, count=0, flags=0):
    """Return a 2-tuple containing (new_string, number).
    new_string is the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in the source
    string by the replacement repl.  number is the number of
    substitutions that were made. repl can be either a string or a
    callable; if a string, backslash escapes in it are processed.
    If it is a callable, it's passed the match object and must
    return a replacement string to be used."""
    return _compile(pattern, flags).subn(repl, string, count)
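The docstring begins "Return a 2-tuple containing (new_string, number)". In other words, subn() returns a tuple holding the new string produced by the substitution and the number of substitutions made.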
import re
string = "ddafsadadfadfafdafdadfasfdafafda"
m = re.subn("a", "A", string)           # replace all (1)
print(m)
n = re.subn("a", "A", string, count=3)  # replace only some (2)
print(n)
l = re.subn("F", "A", string)           # the pattern does not exist in the string (3)
print(l)
The running results are as follows:
('ddAfsAdAdfAdfAfdAfdAdfAsfdAfAfdA', 11)   (1)
('ddAfsAdAdfadfafdafdadfasfdafafda', 3)    (2)
('ddafsadadfadfafdafdadfasfdafafda', 0)    (3)
The output shows that sub() and subn(pattern, repl, string, count=0, flags=0) perform exactly the same substitution; only their return values differ. sub() returns a string, while subn() returns a tuple holding the new string and the number of replacements made.
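Because subn() returns a tuple, its result is usually unpacked directly. Here is a short sketch reusing the string from the example above:

```python
import re

string = "ddafsadadfadfafdafdadfasfdafafda"

# Unpack the (new_string, number) tuple returned by subn().
new_string, number = re.subn("a", "A", string)
# When nothing matches, the count tells you so without comparing strings.
_, missing = re.subn("F", "A", string)

print(number)                                  # 11
print(missing)                                 # 0
print(new_string == re.sub("a", "A", string))  # True
```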
(6) split(pattern, string, maxsplit=0, flags=0)
def split(pattern, string, maxsplit=0, flags=0):
    """Split the source string by the occurrences of the pattern,
    returning a list containing the resulting substrings.  If
    capturing parentheses are used in pattern, then the text of all
    groups in the pattern are also returned as part of the resulting
    list."""
    return _compile(pattern, flags).split(string, maxsplit)
split(pattern, string, maxsplit=0, flags=0) splits a string wherever pattern matches. As the docstring says, it returns a list containing the resulting substrings. The example is as follows:
import re
string = "ddafsadadfadfafdafdadfasfdafafda"
m = re.split("a", string)              # split the string (1)
print(m)
n = re.split("a", string, maxsplit=3)  # limit the number of splits (2)
print(n)
l = re.split("F", string)              # the separator does not occur in the string (3)
print(l)
The running results are as follows:
['dd', 'fs', 'd', 'df', 'df', 'fd', 'fd', 'df', 'sfd', 'f', 'fd', '']   (1)
['dd', 'fs', 'd', 'dfadfafdafdadfasfdafafda']                          (2)
['ddafsadadfadfafdafdadfasfdafafda']                                   (3)
(1) shows that if the string starts or ends with the separator, the list gets an empty string "" at that end. Here the string ends with "a", so the last element is "". (2) shows that maxsplit limits the number of splits, and the rest of the string stays in the last element. (3) shows that if the separator does not occur in the string, the list contains the original string as its only element.
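The split() docstring mentions capturing parentheses. This minimal sketch shows that case. It also shows splitting on several separators at once with a character class (the sample strings are illustrative):

```python
import re

# A character class plus + splits on any run of commas, semicolons or whitespace.
parts = re.split(r"[,;\s]+", "a, b;c  d")

# With a capturing group, the separators themselves are kept in the result.
with_seps = re.split(r"(\d)", "ab1cd2ef")

print(parts)      # ['a', 'b', 'c', 'd']
print(with_seps)  # ['ab', '1', 'cd', '2', 'ef']
```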
(7) findall(pattern, string, flags=0)
def findall(pattern, string, flags=0):
    """Return a list of all non-overlapping matches in the string.
    If one or more capturing groups are present in the pattern, return
    a list of groups; this will be a list of tuples if the pattern
    has more than one group.
    Empty matches are included in the result."""
    return _compile(pattern, flags).findall(string)
findall(pattern, string, flags=0) returns a list containing every matching substring. The example is as follows:
import re
string = "dd12a32d46465fad1648fa1564fda127fd11ad30fa02sfd58afafda"
m = re.findall("[a-z]", string)  # match every lowercase letter and return a list (1)
print(m)
n = re.findall("[0-9]", string)  # match every digit and return a list (2)
print(n)
l = re.findall("[ABC]", string)  # there is no uppercase A, B or C in the string (3)
print(l)
The running results are as follows:
['d', 'd', 'a', 'd', 'f', 'a', 'd', 'f', 'a', 'f', 'd', 'a', 'f', 'd', 'a', 'd', 'f', 'a', 's', 'f', 'd', 'a', 'f', 'a', 'f', 'd', 'a']   (1)
['1', '2', '3', '2', '4', '6', '4', '6', '5', '1', '6', '4', '8', '1', '5', '6', '4', '1', '2', '7', '1', '1', '3', '0', '0', '2', '5', '8']   (2)
[]   (3)
Key points: (1) if nothing matches, an empty list is returned; (2) a character class such as [a-z] or [0-9] matches exactly one character, so without a quantifier such as + each element of the list is a single character.
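The findall() docstring also describes what happens with capturing groups. This minimal sketch uses a shortened version of the example string and shows a + quantifier, one group, and two groups:

```python
import re

string = "dd12a32d46465fad1648fa"

# + makes each element a whole run of digits instead of a single digit.
runs = re.findall(r"[0-9]+", string)
# One capturing group: the list holds only the group's text (a letter before a digit).
single_group = re.findall(r"([a-z])[0-9]", string)
# Two or more groups: the list holds tuples, one per match.
pairs = re.findall(r"([a-z]+)([0-9]+)", string)

print(runs)          # ['12', '32', '46465', '1648']
print(single_group)  # ['d', 'a', 'd', 'd']
print(pairs)         # [('dd', '12'), ('a', '32'), ('d', '46465'), ('fad', '1648')]
```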
(8) finditer(pattern, string, flags=0)
def finditer(pattern, string, flags=0):
    """Return an iterator over all non-overlapping matches in the
    string.  For each match, the iterator returns a match object.
    Empty matches are included in the result."""
    return _compile(pattern, flags).finditer(string)
finditer(pattern, string) searches for the pattern and returns an iterator over all non-overlapping matches in the string; for each match, the iterator yields a match object. The code is as follows:
import re
string = "dd12a32d46465fad1648fa1564fda127fd11ad30fa02sfd58afafda"
m = re.finditer("[a-z]", string)
print(m)
n = re.finditer("AB", string)
print(n)
Both print statements show an iterator object rather than the matches themselves, even for n, where nothing matches. finditer(pattern, string, flags=0) returns an iterator, and the match objects are produced only when you loop over it.
(9) compile(pattern, flags=0)
def compile(pattern, flags=0):
    "Compile a regular expression pattern, returning a pattern object."
    return _compile(pattern, flags)
(10) purge()
def purge():
    "Clear the regular expression caches"
    _cache.clear()
    _cache_repl.clear()
(11) template(pattern, flags=0)
def template(pattern, flags=0):
    "Compile a template pattern, returning a pattern object"
    return _compile(pattern, flags|T)
(re.template() was deprecated in Python 3.11 and removed in Python 3.13.)
Syntax:
import re
string = "dd12a32d46465fad1648fa1564fda127fd11ad30fa02sfd58afafda"
p = re.compile("[a-z]+")  # first compile the pattern with compile(pattern)
m = p.match(string)       # then match
print(m.group())
The compile and match lines above can also be combined into one line: m = re.match("[a-z]+", string). Likewise, p = re.compile("^[0-9]") followed by m = p.match('14534Abc') can be written as:
m = re.match("^[0-9]", '14534Abc')
The effect is the same. The difference is that the first way compiles the pattern in advance (parses it), so later matches do not need to compile it again. The one-line form hands the raw pattern to the re module on every call. Therefore, if you need to find all lines starting with a number in a file with 50,000 lines, it is recommended to compile the pattern before matching, which is faster. (The re module does cache recently used patterns, which is what purge() clears, so the one-line form does not reparse the pattern every time; compiling once still avoids the repeated cache lookup.)
Matching format:
(1) ^ matches the beginning of the string
import re
string = "dd12a32d41648f27fd11a0sfdda"
# ^ matches the beginning of the string; here we use search() to look for a string that starts with a digit
m = re.search("^[0-9]", string)   # match a string that starts with a digit (1)
print(m)
n = re.search("^[a-z]+", string)  # match a string that starts with letters; anchored at the start, search() behaves much like match() (2)
print(n.group())
The running results are as follows:
None   (1)
dd     (2)
(2) $ matches the end of the string
import re
string = "15111252598"
m = re.match("^[0-9]{11}$", string)  # ^...$ anchors both ends, so the whole string must be exactly 11 digits
print(m.group())
The running results are as follows:
15111252598
(3) The dot (.) matches any character except a newline. When the re.DOTALL flag is specified, it matches any character, including a newline.
import re
string = "1511\n1252598"
m = re.match(".", string)    # . matches any single character (1)
print(m.group())
n = re.match(".+", string)   # .+ matches one or more characters, but not a newline (2)
print(n.group())
The running results are as follows:
1      (1)
1511   (2)
The results show that (1) the dot matches any single character; (2) .+ matches any number of characters, but because the string contains a newline, only the text before the newline is matched and the text after it is not.
(4) [...] matches any one of the characters in the brackets. For example, [abc] matches "a", "b" or "c", and [A-Za-z0-9] matches any character in A-Z, a-z or 0-9.
import re
string = "1511\n125dadfadf2598"
m = re.findall("[5fd]", string)  # match 5, f and d in the string
print(m)
The running results are as follows:
['5', '5', 'd', 'd', 'f', 'd', 'f', '5']
In the above code, we match 5, f and d in the string and get back a list.
(5) [^...] matches any character not in the brackets. For example, [^abc] matches any character except a, b and c.
import re
string = "1511\n125dadfadf2598"
m = re.findall("[^5fd]", string)  # match every character except 5, f and d
print(m)
The running results are as follows:
['1', '1', '1', '\n', '1', '2', 'a', 'a', '2', '9', '8']
In the above code, we match every character except 5, f and d: [^...] matches characters other than the ones inside the square brackets.
(6) * matches 0 or more occurrences of the preceding expression
import re
string = "1511\n125dadfadf2598"
m = re.findall(r"\d*", string)  # match 0 or more digits
print(m)
Here * is applied to \d, so we match 0 or more digits. Runs of digits are returned whole. At positions where no digit occurs, * still succeeds with zero characters, so the list also contains empty strings (""), and it ends with an empty string for the end of the string.
(7) + matches 1 or more occurrences of the preceding expression
import re
string = "1511\n125dadfadf2598"
m = re.findall(r"\d+", string)  # match 1 or more digits
print(m)
The running results are as follows:
['1511', '125', '2598']
\d+ above matches one or more digits, so every match contains at least one digit.
(8) ? matches 0 or 1 occurrence of the preceding expression. Placed after another quantifier (for example *? or +?), it makes that quantifier non-greedy.
import re
string = "1511\n125dadfadf2598"
m = re.findall(r"\d?", string)  # match 0 or 1 digit
print(m)
The running results are as follows:
['1', '5', '1', '1', '', '1', '2', '5', '', '', '', '', '', '', '', '2', '5', '9', '8', '']
\d? matches 0 or 1 digit, so every digit is returned on its own, and wherever no digit is found, an empty string ("") is returned.
(9) {n} matches exactly n occurrences of the preceding expression
(10) {n,m} matches n to m occurrences of the preceding expression
(11) \w matches word characters: letters, digits and the underscore
import re
string = "1511\n125dadfadf2598"
m = re.findall(r"\w", string)  # match letters and digits in the string
print(m)
The running results are as follows:
['1', '5', '1', '1', '1', '2', '5', 'd', 'a', 'd', 'f', 'a', 'd', 'f', '2', '5', '9', '8']
As the output shows, \w matches the letters and digits in the string.
(12) \W matches non-word characters (anything other than letters, digits and the underscore), the exact opposite of lowercase \w. Examples are as follows:
import re
string = "1511\n125dadfadf2598"
m = re.findall(r"\W", string)  # match characters that are not letters or digits
print(m)
The running results are as follows:
['\n']
In the above code, \W matches the characters that are not letters or digits, so the newline is matched.
(13) \s matches any whitespace character, equivalent to [ \t\n\r\f\v] for ASCII text. Examples are as follows:
import re
string = "1511\n125d\ta\rdf\fadf2598"
m = re.findall(r"\s", string)  # match any whitespace character in the string
print(m)
The running results are as follows:
['\n', '\t', '\r', '\x0c']
The results show that \s matches every whitespace character in the string ('\x0c' is how Python displays \f).
(14) \S matches any non-whitespace character. Examples are as follows:
import re
string = "1511\n125d\ta\rdf\fadf2598"
m = re.findall(r"\S", string)  # match any non-whitespace character
print(m)
The running results are as follows:
['1', '5', '1', '1', '1', '2', '5', 'd', 'a', 'd', 'f', 'a', 'd', 'f', '2', '5', '9', '8']
As the output shows, \S matches every non-whitespace character.
(15) \d matches any digit, equivalent to [0-9] for ASCII text
(16) \D matches any non-digit character
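Several features above are described without a combined example: the quantifiers {n} and {n,m}, non-greedy ?, looping over finditer(), and the compile-once advice. The sketch below puts them together. The lines list is hypothetical sample data standing in for the lines of a file:

```python
import re

# Hypothetical sample lines standing in for the lines of a large file.
lines = ["2023 report", "summary", "15111252598 phone", "42 items"]

# Compile once, then reuse the pattern object for every line.
starts_with_number = re.compile(r"^\d+")
numbered = [line for line in lines if starts_with_number.match(line)]

# finditer() yields match objects lazily; each one carries its position.
text = "dd12a32d46465fad"
runs = [(m.group(), m.span()) for m in re.finditer(r"\d+", text)]

# {n}: exactly 11 digits; {n,m}: between 2 and 3 digits (greedy).
is_phone = re.fullmatch(r"\d{11}", "15111252598") is not None
short_runs = re.findall(r"\d{2,3}", text)

# Greedy .+ runs to the last ">", non-greedy .+? stops at the first one.
html = "<b>bold</b>"
greedy = re.match(r"<.+>", html).group()
lazy = re.match(r"<.+?>", html).group()

print(numbered)    # ['2023 report', '15111252598 phone', '42 items']
print(runs)        # [('12', (2, 4)), ('32', (5, 7)), ('46465', (8, 13))]
print(is_phone)    # True
print(short_runs)  # ['12', '32', '464', '65']
print(greedy)      # <b>bold</b>
print(lazy)        # <b>
```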