Regular expressions in Python

WBOY
Release: 2023-08-27 10:05:21

Have you ever wondered what makes it possible to find certain text in a document, or to check that text conforms to a certain format, such as an email address?

The key to this type of operation is regular expressions (regex). Let's look at some definitions of regular expressions. In Wikipedia, regular expressions are defined as follows:

A sequence of characters that defines a search pattern, mainly used for pattern matching against strings, that is, for operations such as "find and replace". The concept arose in the 1950s, when the American mathematician Stephen Kleene formalized the description of a regular language, and it came into common use with the Unix text-processing utilities ed (an editor) and grep (a filter).

Another good definition, from regular-expressions.info, is:

Regular expressions (regex or regexp for short) are special text strings used to describe search patterns. You can think of regular expressions as wildcards on steroids. You may be familiar with wildcard notation, such as *.txt, for finding all text files in your file manager. The regex equivalent is .*\.txt$
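
To make the wildcard comparison concrete, here is a minimal Python sketch that checks a few file names against both the wildcard *.txt and the regex .*\.txt$. The file names are made up for illustration, and fnmatch is just one standard-library way to evaluate wildcards; it is not something the definition above prescribes.

```python
import fnmatch
import re

# Hypothetical file names, chosen only for illustration.
filenames = ['notes.txt', 'photo.png', 'txt_backup.zip']

# Wildcard notation: * stands for any run of characters.
wildcard_hits = [name for name in filenames
                 if fnmatch.fnmatchcase(name, '*.txt')]

# Regex equivalent: .* is any run of characters, \. is a literal dot,
# and $ anchors the pattern to the end of the string.
regex_hits = [name for name in filenames
              if re.search(r'.*\.txt$', name)]

print(wildcard_hits)  # ['notes.txt']
print(regex_hits)     # ['notes.txt']
```

Both approaches select the same file here; the regex form simply spells out what the wildcard leaves implicit.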

I know the concept of regular expressions may still sound a bit vague. So, let’s look at some examples of regular expressions to understand this concept better.

Regular Expression Example

In this section, I will show you some examples of regular expressions to help you further understand this concept.

Suppose you have this regular expression:

/abder/

This just tells us to match only the word abder.

How about this regular expression?

/a[nr]t/

You can read this regular expression as follows: find a text pattern where the first letter is a, the last letter is t, and between these letters is n or r. So the matching words are ant and art.

Now let me give you a little quiz. How would you write a regular expression that matches text starting with ca and ending with exactly one of the characters t, b, or r? Yes, this regular expression can be written as follows:

/ca[tbr]/
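
As a rough check of the two bracket patterns so far, the sketch below tries them against a handful of words in Python. The word list is invented for illustration, and the surrounding slashes are only delimiters in the notation used here, so they are dropped in the Python strings.

```python
import re

pattern_a = re.compile(r'a[nr]t')    # a, then n or r, then t
pattern_ca = re.compile(r'ca[tbr]')  # ca, then exactly one of t, b, r

# Hypothetical test words; fullmatch() requires the whole word to fit.
words = ['ant', 'art', 'act', 'cat', 'cab', 'car', 'can']
matches_a = [w for w in words if pattern_a.fullmatch(w)]
matches_ca = [w for w in words if pattern_ca.fullmatch(w)]

print(matches_a)   # ['ant', 'art']
print(matches_ca)  # ['cat', 'cab', 'car']
```

A bracket expression always stands for a single character, which is why act and can are rejected.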

If you see a regular expression starting with the circumflex symbol ^, it means matching a string that starts with the string mentioned after ^. So if you had the following regular expression, it would match strings starting with This.

/^This/

Thus, in the following string:

My name is Abder
This is Abder
This is Tom

Based on the regular expression /^This/, the following lines will be matched:

This is Abder
This is Tom

What if we want to match strings that end with a certain string? In that case, we use the dollar sign $. Here is an example:

/Abder$/

So, in the three-line text above, this regular expression matches the following lines:

My name is Abder
This is Abder
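
The two anchors can be tried out with a short Python sketch. It assumes each line is tested as its own string, which is one simple way to reproduce the results above; the variable names are illustrative.

```python
import re

text = 'My name is Abder\nThis is Abder\nThis is Tom'
lines = text.splitlines()

# ^ anchors the pattern to the start of each tested string.
starts_with_this = [line for line in lines if re.search(r'^This', line)]

# $ anchors the pattern to the end of each tested string.
ends_with_abder = [line for line in lines if re.search(r'Abder$', line)]

print(starts_with_this)  # ['This is Abder', 'This is Tom']
print(ends_with_abder)   # ['My name is Abder', 'This is Abder']
```

When the whole text is searched as a single string instead, ^ and $ only apply at line boundaries if the re.MULTILINE flag is passed.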

So, what do you think of this regular expression?

/^[A-Z][a-z]/

I know it may look complicated at first glance, but let's look at it bit by bit.

We have already seen what the circumflex ^ does: it matches only at the start of a string. [A-Z] stands for any single uppercase letter. So the first part of the regex, ^[A-Z], tells us to match strings that start with an uppercase letter. The last part, [a-z], means that this uppercase letter must be followed immediately by a lowercase letter.

So, which of the following strings will be matched using this regular expression? If you're not sure, you can use Python (as we'll see in the next section) to test your answer.

abder
Abder
ABDER
ABder
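
If you would like to check your answer in code, a small sketch along these lines should work. It prints a result for each candidate without listing the answers here, so you can still guess first; the results dictionary is just an illustrative helper.

```python
import re

candidates = ['abder', 'Abder', 'ABDER', 'ABder']

# re.search() returns a match object on success and None otherwise,
# so comparing against None gives a plain True/False per word.
results = {word: re.search(r'^[A-Z][a-z]', word) is not None
           for word in candidates}

for word, matched in results.items():
    print(word, matched)
```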

Regular expressions are a very broad topic, and these examples are just meant to give you an idea of what they are and why we use them.

RexEgg is a good reference to learn more about regular expressions and see more examples.

Regular Expressions in Python

Now let’s get to the fun part. We would like to see how to use some of the above regular expressions in Python. The module we will use to handle regular expressions in Python is the re module.

The first example is about finding the word Abder. In Python we would do this as follows:

import re

text = 'My name is Abder'
# match() only looks for the pattern at the start of the string
match_pattern = re.match(r'Abder', text)
print(match_pattern)

If you run the above Python script you will get the output: None!

The script works fine; the issue is the way the function match() works. If we go back to the re module documentation, this is what match() does:

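If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object. Return None if the string does not match the pattern; note that this is different from a zero-length match.

Aha, from this we can see that match() returns a result only when a match is found at the beginning of the string.

We can use the function search() instead, which, according to the documentation, does the following:

Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding match object. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.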

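So if we write the script above using search() instead of match(), we get the following output:

<_sre.SRE_Match object at 0x101cfc988>

That is, a match object is returned. (Python 3 prints it in a different form: <re.Match object; span=(11, 16), match='Abder'>.)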
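
The difference between the two functions is easy to see side by side. The following is a minimal sketch using the same text as the script above; the variable names at_start and anywhere are only illustrative.

```python
import re

text = 'My name is Abder'

at_start = re.match(r'Abder', text)   # only tries the start of the string
anywhere = re.search(r'Abder', text)  # scans for the first match anywhere

print(at_start)         # None
print(anywhere.span())  # (11, 16)

# match() does succeed when the text begins with the pattern.
print(re.match(r'My', text) is not None)  # True
```

The span() call reports where in the string the match was found, which avoids relying on the printed form of the match object.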

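If we want to return the result (the matched string), we use the group() function. If we want to see the whole match, we use group(0). Thus:

print(match_pattern.group(0))

will return the output: Abder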
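
To show why group() takes a number, here is a small sketch that goes one step beyond the examples in this article: it wraps part of a pattern in parentheses, which creates a numbered group. The parentheses and the sample sentence are additions for illustration only.

```python
import re

# The parentheses capture whichever letter matched [nr] as group 1.
match_pattern = re.search(r'a([nr])t', 'This is a black ant')

# search() may return None, so check before calling group().
if match_pattern is not None:
    print(match_pattern.group(0))  # ant  (the whole match)
    print(match_pattern.group(1))  # n    (the first parenthesized part)
```

group(0) is always the whole match, while higher numbers refer to parenthesized parts of the pattern, counted from left to right.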

If we take the second regular expression from the previous section, that is /a[nr]t/, it can be written in Python as follows:

import re

text = 'This is a black ant'
# search() scans the whole string for the first match
match_pattern = re.search(r'a[nr]t', text)
print(match_pattern.group(0))

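The output of this script is: ant

Conclusion

This article is getting long, and the topic of regular expressions in Python deserves more than one article, if not a book of its own.

However, this article aims to give you a quick start and the confidence to enter the world of regular expressions in Python. You can refer to the re documentation to learn more about this module and how to go deeper into the topic.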
