Have you ever wondered what lies behind finding a piece of text in a document, checking that text conforms to a certain format (an email address, for example), and other similar operations?
The key to this type of operation is regular expressions (regex). Let's look at some definitions of regular expressions. In Wikipedia, regular expressions are defined as follows:
A sequence of characters that defines a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. "find and replace"-like operations. The concept arose in the 1950s, when the American mathematician Stephen Kleene formalized the description of a regular language, and came into common use with the Unix text-processing utilities ed (an editor) and grep (a filter).
Another good definition, from regular-expressions.info, is:
Regular expressions (regex or regexp for short) are special text strings used to describe search patterns. You can think of regular expressions as wildcards on steroids. You may be familiar with wildcard notation, such as *.txt, for finding all text files in your file manager. The regex equivalent is .*\.txt$
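To make the wildcard comparison concrete, here is a minimal sketch using Python's re module (covered later in this article). The file names are made up for illustration, and Python 3 is assumed.

```python
import re

# Hypothetical file names, chosen only for illustration.
filenames = ["notes.txt", "report.pdf", "todo.txt", "txt_backup.zip"]

# The pattern quoted above: any characters, a literal dot, then "txt" at the end.
pattern = re.compile(r".*\.txt$")

text_files = [name for name in filenames if pattern.search(name)]
print(text_files)  # ['notes.txt', 'todo.txt']
```

Note the backslash before the dot: an unescaped dot means "any character", so it has to be escaped to mean a literal dot.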
I know the concept of regular expressions may still sound a bit vague, so in this section I will show you some examples to help you understand it better.
Suppose you have this regular expression:
/abder/
This simply tells us to match the word abder.
How about this regular expression?
/a[nr]t/
You can read this regular expression as follows: find a text pattern where the first letter is a, the last letter is t, and the letter between them is either n or r. So the matching words are ant and art.
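If you want to see this pattern at work, here is a small sketch in Python 3 (the re module is introduced later in this article). The sample sentence is my own invention.

```python
import re

# Hypothetical sentence, used only to exercise the pattern.
sentence = "The ant made art from an act"

# findall() returns every non-overlapping piece of text that matches.
found = re.findall(r"a[nr]t", sentence)
print(found)  # ['ant', 'art']
```

The word act is skipped because c is not one of the characters listed inside the square brackets.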
Now let me give you a little quiz. How would you write a regular expression that matches text starting with ca and ending with one of the characters t, b, or r? Yes, this regular expression can be written as follows:
/ca[tbr]/
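A quick way to check the quiz answer is the sketch below. It assumes Python 3 and uses re.fullmatch(), which requires the whole word to match; the word list is hypothetical.

```python
import re

# Hypothetical candidate words.
words = ["cat", "cab", "car", "cap", "cart"]

# fullmatch() succeeds only if the entire word fits the pattern.
matching = [word for word in words if re.fullmatch(r"ca[tbr]", word)]
print(matching)  # ['cat', 'cab', 'car']
```

A character class such as [tbr] stands for exactly one character, which is why cart is not a full match.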
If you see a regular expression starting with the circumflex symbol ^, it means the match must begin at the start of the string, with whatever pattern follows the ^. So the following regular expression matches strings starting with This.
/^This/
Thus, given the following three lines:
My name is Abder
This is Abder
This is Tom
the regular expression /^This/, applied to each line in turn, matches the following lines:
This is Abder
This is Tom
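One detail worth knowing if you try this in Python: by default ^ anchors only at the very start of the whole string, not at the start of each line. The sketch below assumes Python 3 and that the three lines are stored in a single string; it uses the re.MULTILINE flag to get per-line behaviour.

```python
import re

text = "My name is Abder\nThis is Abder\nThis is Tom"

# Without re.MULTILINE, ^ matches only at the start of the whole string,
# and the whole string starts with "My", so nothing is found.
print(re.findall(r"^This", text))  # []

# With re.MULTILINE, ^ also matches at the start of every line.
# ".*" extends each match to the end of its line.
lines_found = re.findall(r"^This.*", text, flags=re.MULTILINE)
print(lines_found)  # ['This is Abder', 'This is Tom']
```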
What if we want to match strings that end with a certain string? For this we use the dollar sign $. Here is an example:
Abder$
So, in the three lines above, this regular expression matches the following lines:
My name is Abder
This is Abder
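Here is one way this could be checked in Python 3, as a sketch that tests each line separately rather than relying on a flag. Splitting the text with splitlines() is my choice for the example, not something the pattern requires.

```python
import re

text = "My name is Abder\nThis is Abder\nThis is Tom"

# Test every line on its own; $ anchors the match at the end of that line.
ending_with_abder = [
    line for line in text.splitlines() if re.search(r"Abder$", line)
]
print(ending_with_abder)  # ['My name is Abder', 'This is Abder']
```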
So, what do you think of this regular expression?
^[A-Z][a-z]
I know it may look complicated at first glance, but let's look at it bit by bit.
We have already seen what the circumflex ^ does: it anchors the match at the start of the string. [A-Z] stands for any single uppercase letter. So the first part of the regex, ^[A-Z], tells us to match strings that start with an uppercase letter. The last part, [a-z], means that this uppercase letter must be immediately followed by a lowercase letter.
So, which of the following strings will be matched using this regular expression? If you're not sure, you can use Python (as we'll see in the next section) to test your answer.
abder
Abder
ABDER
ABder
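If you would like to test your answer, a minimal Python 3 sketch could look like this. I have left the output out on purpose so that the quiz stays a quiz; run it to see the result.

```python
import re

candidates = ["abder", "Abder", "ABDER", "ABder"]

# An uppercase letter at the start, immediately followed by a lowercase letter.
pattern = re.compile(r"^[A-Z][a-z]")

# Map each candidate to True or False depending on whether the pattern matches.
results = {word: bool(pattern.search(word)) for word in candidates}

for word, matched in results.items():
    print(word, matched)
```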
Regular expressions are a very broad topic and these examples are just to give you an idea of what they are and why we use them.
RexEgg is a good reference to learn more about regular expressions and see more examples.
Now let's get to the fun part. We would like to see how to use some of the above regular expressions in Python. The module we will use to handle regular expressions in Python is the re module.
The first example is about finding the word Abder. In Python we would do this as follows:
import re
text = 'My name is Abder'
# Try to match the pattern against the text
match_pattern = re.match(r'Abder', text)
print(match_pattern)
If you run the above Python script you will get the output: None!
The script itself is fine; the surprise comes from the way the function match() works. If we go back to the re module documentation, this is what the function match() does:
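If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object. Return None if the string does not match the pattern; note that this is different from a zero-length match.
Aha, from this we can see that match() returns a result only when the match is found at the beginning of the string.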
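Instead, we can use the function search(), which, according to the documentation, does the following:
Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding match object. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.
So, if we write the above script using search() instead of match(), we get output like the following:
<_sre.SRE_Match object at 0x101cfc988>
That is, a match object has been returned. (On recent Python 3 versions the same object prints as <re.Match object; span=(11, 16), match='Abder'>.)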
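The difference between the two functions can be seen side by side in this short sketch, assuming Python 3. The variable names at_start and anywhere are mine.

```python
import re

text = "My name is Abder"

# match() only looks at the beginning of the string.
at_start = re.match(r"Abder", text)
# search() scans the whole string for the first match.
anywhere = re.search(r"Abder", text)

print(at_start)              # None
print(anywhere is not None)  # True

# match() does succeed when the pattern sits at the start of the string.
print(re.match(r"My", text) is not None)  # True
```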
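If we want to return the result itself (the matched string), we use the group() function. If we want to see the entire match, we use group(0). Thus:
print(match_pattern.group(0))
will produce the output: Abder.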
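Since both match() and search() return None when nothing is found, calling group() directly on the result can raise an AttributeError. A slightly more defensive sketch, assuming Python 3, checks for None first and also shows span(), which reports where the match was found.

```python
import re

text = "My name is Abder"
match_pattern = re.search(r"Abder", text)

if match_pattern is not None:
    print(match_pattern.group(0))  # Abder
    print(match_pattern.span())    # (11, 16)
else:
    print("no match")
```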
If we take the second regular expression from the previous section, namely /a[nr]t/, we can write it in Python as follows:
import re
text = 'This is a black ant'
match_pattern = re.search(r'a[nr]t', text)
print(match_pattern.group(0))
The output of this script is: ant.
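Keep in mind that search() stops at the first match. If the text contains several matches and you want all of them, one option is re.finditer(), sketched here under the assumption of Python 3 and with a made-up sentence.

```python
import re

# Hypothetical sentence containing both "ant" and "art".
text = "A black ant admired the art"

# search() returns only the first match.
first = re.search(r"a[nr]t", text)
print(first.group(0))  # ant

# finditer() yields one match object per match, in order.
all_matches = [m.group(0) for m in re.finditer(r"a[nr]t", text)]
print(all_matches)  # ['ant', 'art']
```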
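This article is getting long, and the topic of regular expressions in Python would certainly need more than one article, if not a whole book.
Still, this article is meant to give you a quick start and the confidence to enter the world of regular expressions in Python. You can refer to the re documentation to learn more about this module and how to go deeper into the topic.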