A regular expression (often abbreviated as regex, regexp, or RE in code) is a concept in computer science. Regular expressions are often used to find and replace text that matches a certain pattern. Many programming languages support string manipulation using regular expressions; Perl, for example, has a powerful regular expression engine built in. The concept was originally popularized by Unix tools. A regular expression is a logical formula that operates on strings: it combines ordinary characters (for example, the letters a to z) with predefined special characters (called "metacharacters") into a "rule string" that expresses a filtering logic for strings. In other words, a regular expression is a text pattern that describes one or more strings to match when searching text.
If the definitions above still leave you confused, an example should help. You can use a regular expression testing tool or Python; either is fine. First, we enter a piece of text.
hello, my name is Tina, my phone number is 123456 and my web is http://tina.com.
Then we match it against the following regular expression.
[a-zA-Z]+://[^\s]*
This gives us the web link, that is, the URL in the text. Isn't it amazing?
This works because regular expressions have their own matching rules, some of which are as follows.
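The step above can be sketched in Python with the standard `re` module. The variable names `text`, `pattern`, and `urls` are chosen here for illustration, and the character class is written as `[a-zA-Z]`. Note that `[^\s]*` keeps consuming characters until it reaches whitespace, so the period that ends the sentence becomes part of the match.

```python
import re

# The sample sentence from the text above.
text = "hello, my name is Tina, my phone number is 123456 and my web is http://tina.com."

# One or more letters (the scheme), then "://", then everything up to the next whitespace.
pattern = r"[a-zA-Z]+://[^\s]*"

urls = re.findall(pattern, text)
print(urls)  # ['http://tina.com.']
```

If the trailing period is unwanted, one option is to strip it afterwards, for example with `urls[0].rstrip(".")`.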
Pattern | Description |
. | Any single character except a newline |
* | Zero or more of the preceding expression |
+ | One or more of the preceding expression |
You can look up more matching rules on your own.
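To make these three rules concrete, here is a small Python sketch. The helper `matches` is a hypothetical name introduced only for this example; it relies on the standard `re.fullmatch`, which succeeds only when the whole string fits the pattern.

```python
import re

def matches(pattern, s):
    """Return True when the whole string s fits the pattern."""
    return re.fullmatch(pattern, s) is not None

# "." stands for exactly one character.
print(matches(r"h.t", "hat"))     # True
print(matches(r"h.t", "ht"))      # False

# "*" allows zero or more repetitions of the preceding item.
print(matches(r"ab*c", "ac"))     # True
print(matches(r"ab*c", "abbbc"))  # True

# "+" requires at least one repetition of the preceding item.
print(matches(r"ab+c", "ac"))     # False
print(matches(r"ab+c", "abc"))    # True
```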
?, *, +, \d, and \w are all shorthand for longer equivalent forms:
? is equivalent to a match length of {0,1}
* is equivalent to a match length of {0,}
+ is equivalent to a match length of {1,}
\d is equivalent to [0-9]
\D is equivalent to [^0-9]
\w is equivalent to [A-Za-z_0-9]
\W is equivalent to [^A-Za-z_0-9]
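The equivalences listed above can be checked with a short Python sketch. The string `sample` and the names `pairs` and `results` are made up for this illustration. One caveat: in Python 3, `\d` and `\w` also match non-ASCII digits and letters by default, so the sketch passes `re.ASCII` to make the shorthand forms behave exactly like the bracketed ranges.

```python
import re

sample = "Room 42, floor 7, code A_9z!"  # illustrative input only

# Each pair holds a shorthand and the longer form it is equivalent to.
pairs = [
    (r"\d", r"[0-9]"),
    (r"\D", r"[^0-9]"),
    (r"\w", r"[A-Za-z_0-9]"),
    (r"\W", r"[^A-Za-z_0-9]"),
    (r"\d?", r"\d{0,1}"),
    (r"\d*", r"\d{0,}"),
    (r"\d+", r"\d{1,}"),
]

# Compare the list of matches produced by both forms on the same input.
results = []
for short, long_form in pairs:
    same = re.findall(short, sample, re.ASCII) == re.findall(long_form, sample, re.ASCII)
    results.append(same)
    print(short, long_form, same)
```

Every line should print `True`, since both forms of each pair find the same matches.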
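import re
text = 'hello 123sword'  # sample input assumed for illustration; the original omits the string argument
res = re.match(r'hello\s(\d+)sword', text)  # \s matches the space, so the group captures all the digits
res = re.match(r'hello.*(\d+)sword', text)  # .* is greedy and consumes most of the digits before the group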
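The difference between the two patterns comes down to greedy matching. The sketch below uses a hypothetical input string, `"hello 123sword"`, which is not from the original text, and compares the greedy `.*` with its non-greedy form `.*?`.

```python
import re

text = "hello 123sword"  # hypothetical sample input

# Greedy: .* grabs as much as it can and leaves only one digit for the group.
greedy = re.match(r"hello.*(\d+)sword", text)

# Non-greedy: .*? grabs as little as it can, so the group gets every digit.
lazy = re.match(r"hello.*?(\d+)sword", text)

print(greedy.group(1))  # 3
print(lazy.group(1))    # 123
```

Adding `?` after a quantifier is usually the simplest fix when a capture group comes back shorter than expected.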
import re
useData = str(input('Enter a string: '))
# Match the digits in the string; + matches the preceding subexpression one or more times
digital = re.findall(r'\d+', useData)
print(digital)
The "." character matches any single character. The "\" character is the escape character. "[…]" defines a character set. "(.*?)" is the pattern most commonly used in Python crawlers. It is a non-greedy (lazy) match: it captures any characters, but as few as possible. Let's look at a sample code below.
import re
a = 'xxixxjshdxxlovexxsfhxxpythonxx'
# Capture the shortest possible text between each pair of "xx" markers
data = re.findall('xx(.*?)xx', a)
print(data)
Run results: ['i', 'love', 'python']
Special characters
So-called special characters are characters with special meanings, such as the * in runoo*b. Simply put, * means that the preceding character may repeat any number of times, including zero. If you want to find the * symbol itself in a string, you need to escape the *, that is, add a \ before it: runo\*ob matches the string runo*ob. Many metacharacters require this special treatment when you try to match them literally: you must first "escape" them by preceding them with the backslash character \. The following table lists the special characters in regular expressions:
Special character | Description |
$ | Matches the end of the input string. If the RegExp object's Multiline property is set, $ also matches at the position before '\n' or '\r'. To match the $ character itself, use \$. |
( ) | Marks the start and end of a subexpression. Subexpressions can be captured for later use. To match these characters, use \( and \). |
* | Matches the preceding subexpression zero or more times. To match the * character, use \*. |
+ | Matches the preceding subexpression one or more times. To match the + character, use \+. |
. | Matches any single character except the newline character \n. To match the . character, use \. |
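Escaping can be tried out with a short Python sketch. The input string `text` is invented for this example; `re.escape` is the standard-library helper that adds the backslashes for you.

```python
import re

text = "runo*ob and runooob"  # illustrative input only

# Unescaped: * means "zero or more of the preceding o", so the literal * is not matched.
unescaped = re.findall(r"runo*ob", text)
print(unescaped)  # ['runooob']

# Escaped: \* matches a literal asterisk.
escaped = re.findall(r"runo\*ob", text)
print(escaped)  # ['runo*ob']

# re.escape builds the escaped pattern from a plain string.
auto = re.escape("runo*ob")
print(auto)  # runo\*ob
```

Using `re.escape` is a reasonable default when the text to search for comes from user input and should be treated literally.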