
Detailed explanation of what regular expressions are and their usage

Mar 28, 2017, 02:54 PM

1. What is a regular expression?

A regular expression describes a string-matching pattern. It can be used to:

(1) Check whether a string contains a substring that matches a certain rule, and extract that substring;

(2) Flexibly replace text based on the matching rules.
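A minimal sketch of these two uses with Python's standard re module. Python is an illustrative choice (the article itself shows no code), and the sample text is invented:

```python
import re

text = "Order 42 shipped on 2017-03-28"

# (1) Check whether the string contains a match, and obtain the matched text
m = re.search(r"\d+", text)
found = m.group() if m else None  # leftmost run of digits

# (2) Replace every match of the pattern with new text
masked = re.sub(r"\d", "#", text)

print(found)   # 42
print(masked)  # Order ## shipped on ####-##-##
```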

Regular expressions are actually not hard to learn, and even the few more abstract concepts are easy to understand. Many people find them complicated for two reasons. First, most documents do not explain them step by step from simple to advanced and pay no attention to the order in which concepts are introduced, which makes them hard to follow. Second, the documentation bundled with each engine usually focuses on that engine's own special features, and those features are not what a learner needs to understand first.



2. How to use regular expressions

2.1 Ordinary characters

Letters, digits, Chinese characters, underscores, and punctuation marks that are not given a special meaning in the following sections are all ordinary characters. When an expression is matched against a string, each ordinary character in it matches that same character.

Example 1: When the expression c matches the string abcdef, the matching result is: success; the matched content is: c; the matched position is: starting at 2 and ending at 3. (Note: positions here count from 0, and the end position is the index just after the last matched character; whether indices start from 0 or 1 depends on the programming language.)

Example 2: Expression bcd, when matching the string abcde, the matching result is: success; the matched content is: bcd; the matched position is: starting at 1 and ending at 4.
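The position convention above can be checked with a small sketch using Python's re module, where indices start at 0 and the end position is exclusive. The helper name locate is hypothetical:

```python
import re

def locate(pattern, text):
    """Hypothetical helper: return (matched text, start, end) of the leftmost
    match, or None. Start is 0-based and end is exclusive."""
    m = re.search(pattern, text)
    return (m.group(), m.start(), m.end()) if m else None

print(locate(r"c", "abcdef"))   # ('c', 2, 3)
print(locate(r"bcd", "abcde"))  # ('bcd', 1, 4)
```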

2.2 Simple escape characters

Characters that are inconvenient to type directly are written by putting \ in front of them. Most of these will already be familiar:

\r, \n: carriage return and newline
\t: tab
\\: the backslash "\" itself

Some other punctuation marks have special uses that are described in later sections; putting \ in front of them makes them stand for the symbol itself. For example, ^ and $ have special meanings, so to match the ^ and $ characters in a string, the expression must use \^ and \$.

\^: matches the ^ symbol itself
\$: matches the $ symbol itself
\.: matches the decimal point (.) itself

These escaped characters match just like ordinary characters: each one matches exactly the character it stands for.

Example: Expression \$d, when matching the string abc$de, the matching result is: success; the matched content is: $d; the matched position is: starting at 3 and ending at 5.
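A quick check of the \$d example in Python's re module (an illustrative choice of language), including what happens when the backslash is left out:

```python
import re

# \$ matches a literal dollar sign
m = re.search(r"\$d", "abc$de")
print(m.group(), m.span())  # $d (3, 5)

# Unescaped, $ means "end of string", so "$d" asks for a d after the end: no match
print(re.search(r"$d", "abc$de"))  # None

# \t matches a tab character
tab = re.search(r"\t", "a\tb")
print(tab.span())  # (1, 2)
```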

2.3 Expressions that can match 'multiple characters'

Some notations in regular expressions can match any one of several characters. For example, the expression \d can match any digit. Although it can match any one of these characters, it matches only one character, not several. It is like a joker in a card game: it can stand in for any card, but only for one card.

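\d: any one digit, 0 to 9
\w: any one letter, digit or underscore, that is, any one of A-Z, a-z, 0-9 and _
\s: any one whitespace character, such as a space, tab or form feed
.: the dot matches any one character except the newline (\n)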

Example 1: When the expression \d\d matches abc123, the matching result is: success; the matched content is: 12; the matched position is: starting at 3 and ending at 5.

Example 2: When the expression a.\d matches aaa100, the matching result is: success; the matched content is: aa1; the matched position is: starting at 1 and ending at 4.
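The two examples above, plus the \w, \s and dot rules, as a sketch in Python's re module (assumed here for illustration; the extra sample strings are invented):

```python
import re

m1 = re.search(r"\d\d", "abc123")  # \d is exactly one digit, so two in a row
m2 = re.search(r"a.\d", "aaa100")  # . is any one character except newline
print(m1.group(), m1.span())  # 12 (3, 5)
print(m2.group(), m2.span())  # aa1 (1, 4)

print(re.findall(r"\w", "a_1 !"))   # ['a', '_', '1']
print(re.findall(r"\s", "a b\tc"))  # [' ', '\t']
print(re.search(r".", "\n"))        # None: the dot skips the newline
```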

2.4 Custom expressions that can match 'multiple characters'

Use square brackets [ ] to enclose a set of characters; the expression then matches any one of them. Use [^ ] to enclose a set of characters; the expression then matches any one character other than those. As before, although any one of them can be matched, only one character is matched, not several.

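[ab5@]: matches "a", "b", "5" or "@"
[^abc]: matches any one character other than "a", "b" and "c"
[f-k]: matches any one letter from "f" to "k"
[^A-F0-3]: matches any one character other than "A" to "F" and "0" to "3"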

Example 1: When the expression [bcd][bcd] matches abc123, the matching result is: success; the matched content is: bc; the matched position is: starting at 1 and ending at 3.

Example 2: When the expression [^abc] matches abc123, the matching result is: success; the matched content is: 1; the matched position is: starting at 3 and ending at 4.
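A sketch of character sets in Python's re module; the [f-k] sample string is invented for illustration:

```python
import re

print(re.search(r"[bcd][bcd]", "abc123").span())  # (1, 3)
print(re.search(r"[^abc]", "abc123").group())     # 1
# A range inside [ ]: any one letter from f to k
print(re.findall(r"[f-k]", "deft hike"))          # ['f', 'h', 'i', 'k']
```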

2.5 Special symbols that modify the number of matches

The expressions in the previous sections, whether they match one specific character or any one of several characters, match only once. By adding a special symbol that modifies the number of matches, an expression can match repeatedly without being written out again.

To use one, put the repeat modifier after the expression it modifies. For example, [bcd][bcd] can be written as [bcd]{2}.

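{n}: the expression repeats exactly n times; for example, \w{2} is equivalent to \w\w
{m,n}: the expression repeats at least m times and at most n times
{m,}: the expression repeats at least m times
?: the expression matches 0 or 1 time, equivalent to {0,1}
+: the expression matches at least 1 time, equivalent to {1,}
*: the expression matches 0 or more times, equivalent to {0,}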

Example 1: When the expression \d+\.?\d* matches it costs $12.5, the matching result is: success; the matched content is: 12.5; the matched position is: starting at 10 and ending at 14.

Example 2: When the expression go{2,8}gle matches Ads by goooooogle, the matching result is: success; the matched content is: goooooogle; the matched position is: starting at 7 and ending at 17.
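A sketch of the two repeat examples in Python's re module, with {2,8} written without a space; the gogle counterexample is invented:

```python
import re

m = re.search(r"\d+\.?\d*", "it costs $12.5")
print(m.group(), m.span())  # 12.5 (10, 14)

g = re.search(r"go{2,8}gle", "Ads by goooooogle")
print(g.group(), g.span())  # goooooogle (7, 17)

# {2,8} needs at least two o's, so a single o fails
print(re.search(r"go{2,8}gle", "gogle"))  # None
```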

2.6 Some other symbols representing abstract meanings

Some symbols represent abstract special meanings in expressions:

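^: matches the position at the beginning of the string; it matches no character
$: matches the position at the end of the string; it matches no character
\b: matches a word boundary, that is, the position between a word character and a non-word character; it matches no character

A text explanation alone is still fairly abstract, so here are some examples.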

Example 1: When the expression ^aaa matches xxx aaa xxx, the matching result is: failure. Because ^ is required to match the beginning of the string, ^aaa can only match when aaa is at the beginning of the string, such as: aaa xxx xxx.

Example 2: When the expression aaa$ matches xxx aaa xxx, the matching result is: failure. Because $ is required to match the end of the string, aaa$ can only match when aaa is at the end of the string, such as: xxx xxx aaa.

Example 3: When the expression .\b. matches @@@abc, the matching result is: success; the matched content is: @a; the matched position is: starting at 2 and ending at 4.

Further explanation: \b is similar to ^ and $: it does not match any character itself, but it requires that, at its position in the match, one side is in the \w range and the other side is not.

Example 4: When the expression \bend\b matches weekend,endfor,end, the matching result is: success; the matched content is: end; the matched position is: starting at 15 and ending at 18.
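The anchor and word-boundary examples, checked in Python's re module as an illustrative sketch:

```python
import re

print(re.search(r"^aaa", "xxx aaa xxx"))          # None: aaa is not at the start
print(re.search(r"^aaa", "aaa xxx xxx").span())   # (0, 3)
print(re.search(r"aaa$", "xxx aaa xxx"))          # None: aaa is not at the end
print(re.search(r"aaa$", "xxx xxx aaa").span())   # (8, 11)
print(re.search(r".\b.", "@@@abc").group())       # @a
print(re.search(r"\bend\b", "weekend,endfor,end").span())  # (15, 18)
```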

Some symbols can affect the relationship between subexpressions within an expression:

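|: an "or" between the expressions on its left and right; matches either side
( ): (1) when a repeat modifier is applied, the expression inside the parentheses is repeated as a whole; (2) when the match result is retrieved, the text matched by the expression inside the parentheses can be obtained separately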

Example 5: When the expression Tom|Jack matches the string I'm Tom, he is Jack, the matching result is: success; the matched content is: Tom; the matched position is: starting at 4 and ending at 7. Matching the next one gives: success; the matched content is: Jack; the matched position is: starting at 15 and ending at 19.

Example 6: When the expression (go\s*)+ matches Let's go go go!, the matching result is: success; the matched content is: go go go; the matched position is: starting at 6 and ending at 14.

Example 7: When the expression ¥(\d+\.?\d*) matches $10.9,¥20.5, the matching result is: success; the matched content is: ¥20.5; the matched position is: starting at 6 and ending at 11. The content matched by the parenthesized part alone is: 20.5.
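A sketch of alternation, grouping and capture in Python's re module; finditer steps through successive matches, and group(1) returns only the parenthesized part:

```python
import re

# | tries the alternatives; each successive match is reported in turn
spans = [m.span() for m in re.finditer(r"Tom|Jack", "I'm Tom, he is Jack")]
print(spans)  # [(4, 7), (15, 19)]

# ( ) lets + repeat "go" plus any trailing whitespace as one unit
print(re.search(r"(go\s*)+", "Let's go go go!").group())  # go go go

# group() is the whole match, group(1) the captured part
price = re.search(r"¥(\d+\.?\d*)", "$10.9,¥20.5")
print(price.group(), price.group(1))  # ¥20.5 20.5
```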


3. Some advanced usage of regular expressions

3.1 Greedy and non-greedy in the number of matches

Greedy mode:

Several of the special symbols that modify the number of matches let the same expression match a variable number of times: {m,n}, {m,}, ?, * and +. The actual number of matches depends on the string being matched. Such an indefinitely repeated expression always matches as many times as possible during matching. For example, for the text dxxxdxxxd:

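(d)(\w+): \w+ matches all the characters after the first d: xxxdxxxd
(d)(\w+)(d): \w+ matches all the characters between the first d and the last d: xxxdxxx. Although \w+ could also match the last d, it gives that d up so that the whole expression can match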

As can be seen, \w+ always matches as many characters as its rule allows. In the second example it does not match the last d, but only so that the whole expression can succeed. In the same way, expressions with * and {m,n} match as much as possible, and an expression with ? also matches rather than skips whenever it can. This matching principle is called greedy mode.

Non-greedy mode:

Adding a ? after the special symbol that modifies the number of matches makes an indefinitely repeated expression match as few times as possible, and makes an optional expression skip rather than match whenever it can. This matching principle is called non-greedy mode, also known as reluctant mode. If matching too little would make the whole expression fail, the non-greedy expression, just like the greedy one, adjusts and matches a little more so that the whole expression can succeed. For example, for the text dxxxdxxxd:

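(d)(\w+?): \w+? matches as few characters as possible after the first d, that is, a single x
(d)(\w+?)(d): for the whole expression to succeed, \w+? must match xxx so that the following d can match; the whole expression matches dxxxd

More examples: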
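A sketch comparing greedy and non-greedy repetition on the text dxxxdxxxd in Python's re module; group(2) is the part matched by \w+ or \w+?:

```python
import re

text = "dxxxdxxxd"
# Greedy: \w+ grabs as much as it can
print(re.match(r"(d)(\w+)", text).group(2))      # xxxdxxxd
# ...but gives back the final d so that the trailing (d) can still match
print(re.match(r"(d)(\w+)(d)", text).group(2))   # xxxdxxx
# Non-greedy: \w+? takes as little as it can
print(re.match(r"(d)(\w+?)", text).group(2))     # x
# ...but extends just far enough for the trailing (d) to match
print(re.match(r"(d)(\w+?)(d)", text).group(2))  # xxx
```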

Example 1: When the expression <td>(.*)</td> matches the string <td><p>aa</p></td> <td><p>bb</p></td>, the matching result is: success; the matched content is the entire string, because the </td> in the expression matches the last </td> in the string.

Example 2: In contrast, when the expression <td>(.*?)</td> matches the same string as in Example 1, it matches only <td><p>aa</p></td>; matching the next one gives the second, <td><p>bb</p></td>.
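A sketch of greedy versus non-greedy matching on HTML in Python's re module, assuming a short fragment like the one below (the sample string is illustrative). findall returns the text captured by the parentheses:

```python
import re

html = "<td><p>aa</p></td> <td><p>bb</p></td>"

# Greedy: (.*) runs on to the LAST </td> in the string
print(re.findall(r"<td>(.*)</td>", html))
# ['<p>aa</p></td> <td><p>bb</p>']

# Non-greedy: (.*?) stops at the FIRST </td> each time
print(re.findall(r"<td>(.*?)</td>", html))
# ['<p>aa</p>', '<p>bb</p>']
```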

3.2 Backreferences \1, \2...

During matching, the expression engine records the string matched by each expression enclosed in parentheses ( ). When the match result is retrieved, the string matched by each parenthesized expression can be obtained separately, as the earlier examples showed several times. In practice, when you search using certain boundaries but the content you want does not include those boundaries, you must use parentheses to mark the range you want, for example the (.*?) in the previous example.

In fact, the string matched by a parenthesized expression can be used not only after matching completes but also during matching: a later part of the expression can refer to the string already matched by an earlier parenthesized submatch. The reference is written as \ followed by a number: \1 refers to the string matched by the first pair of parentheses, \2 to the second pair, and so on. If one pair of parentheses contains another, the outer pair is numbered first. In other words, pairs are numbered in the order in which their opening parenthesis ( appears.

Example 1: When the expression ('|")(.*?)(\1) matches 'Hello', "World", the matching result is: success; the matched content is: 'Hello'. Matching the next one gives "World".

Example 2: When the expression (\w)\1{4,} matches aa bbbb abcdefg ccccc 111121111 999999999, the matching result is: success; the matched content is: ccccc. Matching the next one gives 999999999. This expression requires a character in the \w range to be repeated at least 5 times in a row. Note the difference from \w{5,}, which accepts any 5 or more \w characters, whether or not they are the same.

Example 3: When the expression <(\w+)\s*(\w+(=('|").*?\4)?\s*)*>.*?</\1> matches <td id='td1' style="bgcolor:white"></td>, the matching result is: success. If <td> and </td> do not form a pair, the match fails; any other properly paired tags also match successfully.
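A sketch of backreferences in Python's re module. The tag pattern is the one from Example 3; the mismatched-tag string in the last check is an invented counterexample:

```python
import re

# \1 must repeat exactly the quote that group 1 matched
QUOTED = r"""('|")(.*?)(\1)"""
print([m.group() for m in re.finditer(QUOTED, "'Hello', \"World\"")])
# ["'Hello'", '"World"']

# One word character followed by at least 4 more copies of the same character
text = "aa bbbb abcdefg ccccc 111121111 999999999"
print([m.group() for m in re.finditer(r"(\w)\1{4,}", text)])
# ['ccccc', '999999999']

# Contrast: \w{5,} accepts any 5+ word characters, equal or not
print(re.findall(r"\w{5,}", text))
# ['abcdefg', 'ccccc', '111121111', '999999999']

# Closing tag must repeat the opening tag name captured in group 1
TAG = r"""<(\w+)\s*(\w+(=('|").*?\4)?\s*)*>.*?</\1>"""
print(bool(re.search(TAG, """<td id='td1' style="bgcolor:white"></td>""")))  # True
print(bool(re.search(TAG, "<td id='td1'></tr>")))  # False
```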

3.3 Forward pre-search and negative forward pre-search; reverse pre-search and negative reverse pre-search

The previous sections introduced several special symbols with abstract meanings: ^, $ and \b. What they have in common is that they match no characters themselves; they only add a condition to the "two ends of the string" or to the "gaps between characters". With that concept understood, this section introduces a more flexible way to add conditions to the "two ends" or "gaps".

Forward pre-search: (?=xxxxx), (?!xxxxx)

Format (?=xxxxx): at the "gap" or "end" of the string where it sits, it adds the condition that the text to the right of the gap must match the expression xxxxx. Because it only adds a condition at this gap, it does not prevent the following parts of the expression from actually matching the characters after the gap. This is similar to \b, which matches no characters by itself: \b only examines the characters on either side of the gap and does not affect what the following expression actually matches.

Example 1: When the expression Windows (?=NT|XP) matches Windows 98, Windows NT, Windows 2000, it matches only the "Windows " in Windows NT; the other occurrences of "Windows " are not matched.

Example 2: When the expression (\w)((?=\1\1\1)(\1))+ matches the string aaa ffffff 999999999, it matches the first 4 of the 6 f characters and the first 7 of the 9 nines. This expression can be read as: if a letter or digit is repeated at least 4 times in a row, match everything except the last 2 of them. Of course, this expression does not need to be written this way; it is only for demonstration.
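A sketch of forward pre-search (lookahead) in Python's re module, reproducing Examples 1 and 2:

```python
import re

text = "Windows 98, Windows NT, Windows 2000"
w = re.search(r"Windows (?=NT|XP)", text)
print(repr(w.group()), w.span())  # 'Windows ' (12, 20)
print(len(re.findall(r"Windows (?=NT|XP)", text)))  # 1

# The lookahead consumes nothing; each repetition just checks that
# three more copies of the captured character still follow
runs = [m.group() for m in re.finditer(r"(\w)((?=\1\1\1)(\1))+", "aaa ffffff 999999999")]
print(runs)  # ['ffff', '9999999']
```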

Format (?!xxxxx): the text to the right of the gap must not match the expression xxxxx.

Example 3: When the expression ((?!\bstop\b).)+ matches fdjka ljfdl stop fjdsla fdj, it matches from the beginning up to the position just before stop. If the string contains no stop, it matches the entire string.

Example 4: When the expression do(?!\w) matches the string done, do, dog, it matches only the standalone do. In this example, (?!\w) after do has the same effect as \b.
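A sketch of negative lookahead in Python's re module, reproducing Examples 3 and 4; the second sample string is invented to show the case with no stop:

```python
import re

UNTIL_STOP = r"((?!\bstop\b).)+"  # each character must not start the word "stop"
head = re.match(UNTIL_STOP, "fdjka ljfdl stop fjdsla fdj").group()
print(repr(head))  # 'fdjka ljfdl '

whole = re.match(UNTIL_STOP, "no such word here").group()
print(whole)  # no such word here

# do(?!\w): "do" not followed by a word character
print([m.span() for m in re.finditer(r"do(?!\w)", "done, do, dog")])  # [(6, 8)]
```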

Reverse pre-search: (?<=xxxxx), (?<!xxxxx)

These two formats work like forward pre-search, except that the condition applies to the "left side" of the gap rather than the right: the first requires that the text to the left must match the specified expression, and the second requires that it must not. Like forward pre-search, they only add a condition at the gap and match no characters themselves.
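A sketch of reverse pre-search (lookbehind) in Python's re module. The price string is invented for illustration, and note that Python only accepts lookbehind patterns of fixed width:

```python
import re

text = "price: $30, cost: 45, fee: $7"

# (?<=\$): the character just left of the gap must be "$"
print(re.findall(r"(?<=\$)\d+", text))     # ['30', '7']

# (?<![$\d]): the character just left of the gap must be neither "$" nor a digit,
# which also stops a match from starting in the middle of "30"
print(re.findall(r"(?<![$\d])\d+", text))  # ['45']
```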


4. Other general rules

4.1 Rule 1

In expressions, you can use \xXX and \uXXXX to represent a character (each X is a hexadecimal digit):

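\xXX: a character whose code is in the range 0 to 255; for example, a space can be written as \x20
\uXXXX: any character, written as \u followed by the 4 hexadecimal digits of its code; for example, \u4E2D (中)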
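A sketch of both escape forms in Python's re module, which accepts \xXX and \uXXXX in string patterns:

```python
import re

print(re.findall(r"\x20", "a b c"))          # [' ', ' ']
print(re.search(r"\u4E2D", "中文").group())  # 中
```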

4.2 Rule 2

The expressions \s, \d, \w and \b have special meanings; the corresponding uppercase letters have the opposite meanings:

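\S: any one character that is not whitespace
\D: any one character that is not a digit
\W: any one character other than a letter, digit or underscore
\B: a position that is not a word boundary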
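A sketch of the uppercase classes in Python's re module; the sample strings are invented:

```python
import re

s = "a1 b2_"
print(re.findall(r"\D", s))  # ['a', ' ', 'b', '_']
print(re.findall(r"\S", s))  # ['a', '1', 'b', '2', '_']
print(re.findall(r"\W", s))  # [' ']

# \B: "end" only where it is NOT at the start of a word
print([m.start() for m in re.finditer(r"\Bend", "weekend endfor")])  # [4]
```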

4.3 Rule 3

Characters that have a special meaning in expressions and must be preceded by \ to match themselves:

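^: matches the start of the string; write \^ to match ^ itself
$: matches the end of the string; write \$ to match $ itself
( ): mark a subexpression; write \( and \) to match the parentheses themselves
[ ]: define a custom character set; write \[ and \] to match the brackets themselves
{ }: modify the number of matches; write \{ and \} to match the braces themselves
.: matches any character except newline; write \. to match the dot itself
?: matches 0 or 1 time; write \? to match ? itself
+: matches at least 1 time; write \+ to match + itself
*: matches 0 or more times; write \* to match * itself
|: "or" between its left and right sides; write \| to match | itself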
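A sketch in Python's re module showing hand-written escapes and the standard re.escape helper, which adds the backslashes for literal text; the sample strings are invented:

```python
import re

# Escaped parentheses match the literal ( and )
print(re.search(r"\(\d+\)", "call (555) now").group())  # (555)

# Unescaped, "1+1=2?" means: one or more 1s, then 1, =, and an optional 2
print(re.fullmatch("1+1=2?", "1+1=2?"))        # None
print(bool(re.fullmatch("1+1=2?", "11=2")))    # True

# re.escape makes the text match itself
pattern = re.escape("1+1=2?")
print(bool(re.fullmatch(pattern, "1+1=2?")))   # True
```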

4.4 Rule 4

If you do not want the string matched by a pair of parentheses ( ) to be recorded for later use, use the (?:xxxxx) format.

Example 1: When the expression (?:(\w)\1)+ matches "a bbccdd efg", the result is "bbccdd". The match inside the (?:) group is not recorded, so (\w) is the first recorded group and is referenced with \1.
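A sketch of the non-capturing group example in Python's re module; groups() lists only the recorded groups, and a repeated group keeps its last capture:

```python
import re

m = re.search(r"(?:(\w)\1)+", "a bbccdd efg")
print(m.group())   # bbccdd
print(m.groups())  # ('d',)  only (\w) is numbered; (?:...) is not
```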

4.5 Rule 5

Commonly used expression option settings: Ignorecase, Singleline, Multiline, Global

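Ignorecase: matching ignores case; by default, matching is case-sensitive
Singleline: the dot (.) matches any character, including the newline (\n); by default, the dot does not match \n
Multiline: ^ matches at the start of the string and after every \n, and $ matches at the end of the string and before every \n; by default, they match only at the start and end of the whole string
Global: mainly affects replacement; all matches are replaced rather than only the first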
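Python's re module names these options differently, so this sketch maps them by assumed equivalence: re.IGNORECASE for Ignorecase, re.DOTALL for Singleline, re.MULTILINE for Multiline. Python has no Global flag; findall and sub act on every match by default, and sub's count argument limits the replacement to the first match:

```python
import re

text = "Cat\ncat\nCAT"
print(re.findall(r"cat", text))                       # ['cat']
print(re.findall(r"cat", text, re.IGNORECASE))        # ['Cat', 'cat', 'CAT']
print(re.findall(r"^c\w+$", text, re.MULTILINE))      # ['cat']
print(bool(re.search(r"Cat.cat", text)))              # False: . skips \n
print(bool(re.search(r"Cat.cat", text, re.DOTALL)))   # True

print(re.sub(r"a", "o", "banana", count=1))  # bonana (first match only)
print(re.sub(r"a", "o", "banana"))           # bonono (every match)
```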
