Knowledge summary and sharing of regular expressions in Python-Python Tutorial-php.cn

Knowledge summary and sharing of regular expressions in Python

黄舟

Sep 23, 2017 am 11:34 AM


This article introduces the basic knowledge of Python regular expressions. The content of this article does not include how to write efficient regular expressions and how to optimize regular expressions. Please check other tutorials for these topics.

1. Regular expression syntax

1.1 Characters and character classes
   1 Special characters: \ . ^ $ ? + * { } [ ] ( ) |
     To match any of these characters literally, escape it with \
   2 Character classes
     1. One or more characters enclosed in [] form a character class. Without a quantifier, a character class matches exactly one of the characters it contains.
     2. A range can be specified inside a character class; for example, [a-zA-Z0-9] matches any one character from a to z, A to Z, or 0 to 9.
     3. A ^ immediately after the left square bracket negates the character class; for example, [^0-9] matches any non-digit character.
     4. Inside a character class, most special characters lose their special meaning and stand for themselves; \ still escapes. ^ negates only in the first position and is a literal ^ anywhere else. - between two characters denotes a range and is a literal - when it is the first character of the class.
     5. Shorthands such as \d, \s and \w can be used inside a character class.
   3 Shorthands
     . matches any character except newline; with the re.DOTALL flag it matches any character, including newline
     \d matches a Unicode digit; with re.ASCII it matches only 0-9
     \D matches a Unicode non-digit
     \s matches a Unicode whitespace character; with re.ASCII it matches one of space, \t, \n, \r, \f, \v
     \S matches a Unicode non-whitespace character
     \w matches a Unicode word character; with re.ASCII it matches one of [a-zA-Z0-9_]
     \W matches a Unicode non-word character
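
The rules for character classes and shorthands above can be tried out directly. The snippet below is a minimal sketch; the sample strings and variable names are made up for illustration.

```python
import re

# A character class without a quantifier matches exactly one character.
vowels = re.findall(r"[aeiou]", "regular")

# ^ in the first position negates the class; elsewhere it is a literal caret.
non_digits = re.findall(r"[^0-9]", "a1b2")
digits_or_caret = re.findall(r"[0-9^]", "2^8")

# \d is Unicode-aware by default; re.ASCII restricts it to 0-9.
arabic_indic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE
matches_unicode = re.fullmatch(r"\d", arabic_indic_three) is not None
matches_ascii = re.fullmatch(r"\d", arabic_indic_three, re.ASCII) is not None

# . does not match a newline unless re.DOTALL is set.
without_dotall = re.findall(r"a.b", "a\nb")
with_dotall = re.findall(r"a.b", "a\nb", re.DOTALL)

print(vowels, non_digits, digits_or_caret)
print(matches_unicode, matches_ascii, without_dotall, with_dotall)
```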

　1.2 Quantifier
   1. ? matches the preceding expression 0 or 1 times
   2. * matches the preceding expression 0 or more times
   3. + matches the preceding expression 1 or more times
   4. {m} matches the preceding expression exactly m times
   5. {m,} matches the preceding expression at least m times
   6. {,n} matches the preceding expression at most n times
   7. {m,n} matches the preceding expression at least m times and at most n times
   Notes:
   All of the quantifiers above are greedy and match as much text as possible. To make a quantifier non-greedy, follow it with a ?, as in *?, +?, ?? or {m,n}?
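
The difference between greedy and non-greedy quantifiers is easiest to see on a concrete string. This is a small sketch with an invented sample string, not a recommended way to parse HTML.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the last > in the string, so there is one long match.
greedy = re.findall(r"<.+>", html)

# Non-greedy: .+? stops at the first > it can, so each tag is matched separately.
lazy = re.findall(r"<.+?>", html)

# {m,n} is greedy too: it takes three digits from "4444" and leaves a lone 4.
counted = re.findall(r"\d{2,3}", "1 22 333 4444")

# {,n} has an implicit lower bound of 0, so it also matches the empty string.
empty_ok = re.fullmatch(r"a{,2}", "") is not None

print(greedy, lazy, counted, empty_ok)
```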

1.3 Grouping and capturing
   1 The role of ():
     1. Capture the text matched by the expression inside () for further processing. Capturing can be turned off for a pair of parentheses by writing ?: right after the left parenthesis, as in (?:...)
     2. Group part of a regular expression so that a quantifier or | can be applied to it
   2 Backreferences to the content captured by an earlier ():
     1. Backreference by group number
        Every pair of parentheses that does not use ?: is assigned a group number, starting from 1 and increasing from left to right. Use \i to refer to the text captured by group i.
     2. Backreference by group name
        Writing ?P<name> right after the left parenthesis gives the group a name, and (?P=name) then refers to the text it captured. For example, (?P<word>\w+)\s+(?P=word) matches a repeated word.
   3 Notes:
     Backreferences cannot be used inside a character class [].
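
As a rough illustration of capturing, non-capturing groups and both kinds of backreference, the sketch below looks for repeated words. The sample sentence is invented, and the \b anchors are an addition to the pattern from the text so that only whole words are compared.

```python
import re

text = "this is is a test test sentence"

# Backreference by number: \1 must repeat whatever group 1 captured.
by_number = re.findall(r"\b(\w+)\s+\1\b", text)

# Backreference by name: (?P<word>...) names the group, (?P=word) refers to it.
named_rx = re.compile(r"\b(?P<word>\w+)\s+(?P=word)\b")
by_name = [m.group("word") for m in named_rx.finditer(text)]

# (?:...) groups without capturing, so only (c) receives a group number.
mixed_rx = re.compile(r"(?:ab)+(c)")
group_count = mixed_rx.groups
captured = mixed_rx.findall("ababc")

print(by_number, by_name, group_count, captured)
```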

1.4 Assertions and Markers
   An assertion does not consume any text; it only constrains the position at which it appears
   1 Commonly used assertions:
     1. \b matches at a word boundary; inside a character class [] it means backspace
     2. \B matches at a non-word boundary; like \b, it is affected by the ASCII flag
     3. \A matches at the start of the string
     4. ^ matches at the start of the string; with the MULTILINE flag it also matches after each newline
     5. \Z matches at the end of the string
     6. $ matches at the end of the string (or just before a newline that ends the string); with the MULTILINE flag it also matches before each newline
     7. (?=e) positive lookahead
     8. (?!e) negative lookahead
     9. (?<=e) positive lookbehind
     10. (?<!e) negative lookbehind
   2 Explanation of lookahead and lookbehind
     Positive lookahead: exp1(?=exp2) the text after exp1 must match exp2
     Negative lookahead: exp1(?!exp2) the text after exp1 must not match exp2
     Positive lookbehind: (?<=exp2)exp1 the text before exp1 must match exp2
     Negative lookbehind: (?<!exp2)exp1 the text before exp1 must not match exp2
     For example, to find hello only when it is followed by world, the regular expression can be written as "(hello)\s+(?=world)". Applied to "hello wangxing" and "hello world", it matches only the hello in the latter.
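
The hello/world example and the anchors can be checked with a few lines of Python. This is only a sketch; apart from the "(hello)\s+(?=world)" pattern and its two test strings, the sample data is made up.

```python
import re

# Positive lookahead: world must follow, but it is not part of the match.
hello_rx = re.compile(r"(hello)\s+(?=world)")
no_match = hello_rx.search("hello wangxing")
m = hello_rx.search("hello world")
matched_text = m.group(0)  # the whitespace is consumed, "world" is not

# Negative lookahead: keep only the hello that is NOT followed by world.
starts = [x.start() for x in re.finditer(r"hello(?!\s+world)", "hello world, hello there")]

# Lookbehind: digits preceded (or not preceded) by a dollar sign.
prices = "cost $30 or 40 EUR"
after_dollar = re.findall(r"(?<=\$)\d+", prices)
not_after_dollar = re.findall(r"(?<!\$)\b\d+", prices)

# ^ matches after each newline only with re.MULTILINE.
first_words = re.findall(r"^\w+", "one\ntwo")
first_words_multiline = re.findall(r"^\w+", "one\ntwo", re.MULTILINE)

# $ tolerates one trailing newline, \Z does not.
dollar_end = re.search(r"end$", "the end\n") is not None
z_end = re.search(r"end\Z", "the end\n") is not None

print(matched_text, starts, after_dollar, not_after_dollar)
```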

　1.5 Conditional matching
   (?(id)yes_exp|no_exp): if the group identified by id (a group number or name) captured something, yes_exp must match at this point; otherwise no_exp must match. The |no_exp part is optional.
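
A small sketch of conditional matching: the pattern below accepts an address either with both angle brackets or with neither. The pattern and the sample addresses are illustrative assumptions rather than part of the original article, and the address part is deliberately simplistic.

```python
import re

# Group 1 is the optional "<". If it matched, ">" is required; otherwise the
# address must run to the end of the string.
addr_rx = re.compile(r"(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)")

with_brackets = addr_rx.match("<user@host.com>")
without_brackets = addr_rx.match("user@host.com")
missing_close = addr_rx.match("<user@host.com")
stray_close = addr_rx.match("user@host.com>")

print(with_brackets.group(2), without_brackets.group(2))
print(missing_close, stray_close)
```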

　1.6 Flags of regular expressions
   1. There are two ways to set flags on a regular expression
      1. Pass the flags argument to the compile method; multiple flags are combined with |, such as re.compile(r"#[\da-f]{6}\b", re.IGNORECASE|re.MULTILINE)
      2. Put (?flags) at the start of the regular expression, such as (?ms)#[\da-z]{6}\b
   2. Commonly used flags
      re.A or re.ASCII makes \b \B \s \S \w \W \d \D match ASCII characters only
      re.I or re.IGNORECASE makes the regular expression ignore case
      re.M or re.MULTILINE multi-line matching: ^ also matches after each newline and $ also matches before each newline
      re.S or re.DOTALL makes . match any character, including newline
      re.X or re.VERBOSE allows whitespace and comments inside the regular expression to make it more readable. To match a literal space in this mode, use \s or [ ], since plain whitespace is ignored. Such as:
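        re.compile(r"""
            <img\s+                 # start of the tag
            [^>]*?                  # any attributes that are not src
            src=                    # start of the src attribute
            (?P<quote>["'])?        # optional opening quote
            (?P<image>[^"'>]+)      # image file name
            (?(quote)(?P=quote))    # closing quote, matching the opening one
            """, re.VERBOSE|re.IGNORECASE)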

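The two ways of setting flags, and the effect of re.VERBOSE on whitespace, can be compared in a short sketch. The colour-code pattern comes from the text above; the sample text and variable names are assumptions made for the example.

```python
import re

text = "fg: #FFaa00\nbg: #00ff7f"

# Way 1: flags passed to compile, combined with |.
by_argument = re.compile(r"#[\da-f]{6}\b", re.IGNORECASE | re.MULTILINE)

# Way 2: the same flags written inline at the start of the pattern.
by_inline = re.compile(r"(?im)#[\da-f]{6}\b")

# re.VERBOSE: whitespace and comments are ignored inside the pattern.
verbose_rx = re.compile(r"""
    \#           # a literal hash sign, escaped because # starts a comment here
    [\da-f]{6}   # six hexadecimal digits
    \b           # end of the colour code
    """, re.VERBOSE | re.IGNORECASE)

found_argument = by_argument.findall(text)
found_inline = by_inline.findall(text)
found_verbose = verbose_rx.findall(text)

# In verbose mode a plain space is dropped; [ ] keeps a literal space.
space_ignored = re.fullmatch(r"a b", "ab", re.VERBOSE) is not None
space_kept = re.fullmatch(r"a[ ]b", "a b", re.VERBOSE) is not None

print(found_argument, found_inline, found_verbose, space_ignored, space_kept)
```
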
2. Python regular expression module

2.1 Regular expressions have four main functions for processing strings

1. Match: check whether a string conforms to the regular expression, usually returning true or false
2. Extract: pull the text that satisfies the regular expression out of a string
3. Replace: find the text in a string that matches the regular expression and replace it with another string
4. Split: split a string using a regular expression as the separator

2.2 Two ways to use regular expressions with the re module in Python

1. Use the re.compile(r, f) method to create a regular expression object, then call the corresponding methods of that object. The advantage of this approach is that the regular expression object can be reused many times.
2. For each method of the regular expression object, the re module has a corresponding module-level function. The difference is that its first argument is the regular expression string. This approach suits regular expressions that are used only once.

2.3 Common methods of regular expression objects
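
Before the individual methods, the two usage styles described in 2.2 can be compared side by side. The sketch below is illustrative only; the sample strings and the name word_rx are made up.

```python
import re

lines = ["alpha beta", "gamma"]

# Style 1: compile once, then reuse the regular expression object.
word_rx = re.compile(r"\w+", re.ASCII)
compiled_results = [word_rx.findall(line) for line in lines]

# Style 2: module-level function; the pattern string is the first argument
# and the flags are passed to the call itself.
one_off = re.findall(r"\w+", "one-off use", re.ASCII)

print(compiled_results, one_off)
```

One practical difference: the start and end positions described for the object methods are only accepted by the methods of a compiled object, not by the module-level functions.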

   1. rx.findall(s, start, end):
      Returns a list. If the regular expression has no groups, the list contains the full text of every match.
      If the regular expression has exactly one group, the list contains the text matched by that group. If it has several groups, each element of the list is a tuple holding the text matched by each group; the text matched by the whole regular expression is not returned.
   2. rx.finditer(s, start, end):
      Returns an iterator.
      Each iteration yields a match object. Call its group() method to get the text matched by a given group; 0 stands for the text matched by the whole regular expression.
   3. rx.search(s, start, end):
      Returns a match object, or None if there is no match.
      search stops at the first match and does not continue.
   4. rx.match(s, start, end):
      Returns a match object if the regular expression matches at the beginning of the string, otherwise None.
   5. rx.sub(x, s, m):
      Returns a string in which every match is replaced with x. If m is given, at most m replacements are made. Inside x, \i or \g<id> refers to captured text, where id is a group name or number.
      x can also be a function, both here and in the module function re.sub(r, x, s, m). It is called with each match object and its return value replaces the matched text, so the captured content can be processed before it is substituted.
   6. rx.subn(x, s, m):
      Same as sub(), except that it returns a tuple of the result string and the number of replacements made.
   7. rx.split(s, m): splits the string
      Returns a list.
      The string is split at the text matched by the regular expression; if m is given, at most m splits are made.
      If the regular expression contains groups, the text matched by each group is inserted into the list between the pieces, such as:
        rx = re.compile(r"(\d)[a-z]+(\d)")
        s = "ab12dk3klj8jk9jks5"
        result = rx.split(s)
        Returns ['ab1', '2', '3', 'klj', '8', '9', 'jks5']
   8. rx.flags: the flags set when the regular expression was compiled (an attribute, not a method)
   9. rx.pattern: the string the regular expression was compiled from (an attribute, not a method)
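
The methods listed above can be exercised together on one small string. This is a sketch under made-up sample data (the key=value string and the variable names are assumptions); the split example reuses the pattern from item 7.

```python
import re

rx = re.compile(r"(\w+)=(\d+)")
s = "a=1, b=22, c=333"

pairs = rx.findall(s)                          # several groups: list of tuples
keys = re.compile(r"(\w+)=\d+").findall(s)     # one group: list of strings
whole = [m.group(0) for m in rx.finditer(s)]   # full text of every match

first = rx.search(s)           # first match anywhere in the string
at_five = rx.match(s, 5)       # must match exactly at position 5 ("b=22")
at_three = rx.match(s, 3)      # position 3 is a comma, so this is None

swapped = rx.sub(r"\2=\1", s)                     # numbered references
limited = rx.sub(r"\g<1>:\g<2>", s, count=2)      # at most two replacements
doubled = rx.sub(lambda m: f"{m.group(1)}={int(m.group(2)) * 2}", s)
replaced, how_many = rx.subn("X", s)

plain_split = re.compile(r",\s*").split(s)
group_split = re.compile(r"(\d)[a-z]+(\d)").split("ab12dk3klj8jk9jks5")

print(pairs, keys, whole)
print(swapped, limited, doubled, replaced, how_many)
print(plain_split, group_split, rx.pattern)
```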

2.4 Attributes and methods of matching objects

   01. m.group(g, ...)
       Returns the text matched by the group with the given number or name. The default, 0, is the text matched by the whole expression. If several groups are given, a tuple is returned.
   02. m.groupdict(default)
       Returns a dictionary whose keys are the names of all named groups and whose values are the text captured by them. If default is given, it is used as the value for groups that did not participate in the match.
   03. m.groups(default)
       Returns a tuple containing the text captured by every group, from group 1 onward. If default is given, it is used as the value for groups that captured nothing.
   04. m.lastgroup
       The name of the last matched capturing group. None if that group has no name or no group matched (rarely used).
   05. m.lastindex
       The number of the last matched capturing group, or None if no group matched.
   06. m.start(g)
       The position in the string where the match of group g starts. Returns -1 if the group did not participate in the match.
   07. m.end(g)
       The position in the string just after the match of group g ends. Returns -1 if the group did not participate in the match.
   08. m.span(g)
       Returns a 2-tuple holding the return values of m.start(g) and m.end(g).
   09. m.re
       The regular expression object that produced this match object.
   10. m.string
       The string passed to match or search.
   11. m.pos
       The position where the search started: the beginning of the string, or the position given by start (rarely used).
   12. m.endpos
       The position where the search ended: the end of the string, or the position given by end (rarely used).
   Note that lastgroup, lastindex, re, string, pos and endpos are attributes, not methods, so they are read without parentheses.
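
A short sketch of the match object in use. The pattern, the sample string and the variable names are invented for illustration; the second search starts at position 8 so that the optional value group does not take part in the match.

```python
import re

rx = re.compile(r"(?P<key>\w+)=(?P<value>\d+)?")
s = "size=42; mode="

m = rx.search(s)                      # matches "size=42"
whole = m.group()                     # same as m.group(0)
both = m.group("key", "value")        # several groups: a tuple
value_span = (m.start("value"), m.end("value"))

m2 = rx.search(s, 8)                  # matches "mode=", value is missing
missing = m2.group("value")           # None for a group that did not take part
filled = m2.groups(default="0")
as_dict = m2.groupdict(default="0")

# lastgroup, lastindex, re, string, pos and endpos are read as attributes.
print(whole, both, value_span, m.span(), m.lastindex, m.lastgroup)
print(missing, filled, as_dict, m2.start("value"), m2.pos, m2.endpos)
```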

　2.5 Summary

1. For matching, Python has no method that returns true or false directly, but the result can be judged by whether the return value of match or search is None.
2. For searching, if only one match is needed, use the match object returned by search or match. For multiple matches, iterate over the iterator returned by finditer.
3. For replacing, use the sub or subn method of the regular expression object, or the re module functions sub or subn. In both cases the replacement text can be produced by a function.
4. For splitting, use the split method of the regular expression object. Note that if the regular expression contains groups, the text captured by the groups is also placed in the returned list.
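
Since there is no method that returns true or false directly, a thin wrapper around the None check is a common pattern. The helper names is_match and starts_with below are hypothetical and not part of the re module; this is only a sketch.

```python
import re

def is_match(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if pattern matches anywhere in text."""
    return re.search(pattern, text) is not None

def starts_with(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if pattern matches at the start of text."""
    return re.match(pattern, text) is not None

anywhere = is_match(r"\d+", "abc123")
at_start = starts_with(r"\d+", "abc123")
at_start_digits = starts_with(r"\d+", "123abc")

print(anywhere, at_start, at_start_digits)
```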
