


Detailed explanation of what regular expressions are and their usage
1. What is a regular expression?
Regular expression (regular expression) describes a string matching pattern, which can be used to: contain Matches a certain
(1) Check whether a string contains a string that matches a certain rule, and the string can be obtained;
(2) Flexibly perform string processing based on matching rules replacement operation.
Regular expressions are actually very simple to learn, and a few more abstract concepts are also easy to understand. The reason why many people feel that regular expressions are complicated is that, on the one hand, most documents do not explain them from the shallower to the deeper, and do not pay attention to the order of concepts, which makes it difficult to understand; on the other hand, various engines The documentation that comes with it usually introduces its unique functions, but these unique functions are not the first thing we need to understand.
Related courses: Boolean education regular expression video tutorial
##2 .How to use regular expressions
2.1 Ordinary characters
Letters, numbers, Chinese characters, underscores, As well as punctuation marks that are not specially defined in the following chapters, they are all ordinary characters. Ordinary characters in an expression, when matching a string, match the same character. Example 1: Expression c, when matching the string abcdef, the matching result is: success; the matched content is: c; the matched position is: starting at 2 and ending at 3. (Note: Whether the subscript starts from 0 or 1 may differ depending on the current programming language). Example 2: Expression bcd, when matching the string abcde, the matching result is: success; the matched content is: bcd; the matched position is: starting at 1 and ending at 4.2.2 Simple escape characters
For some characters that are inconvenient to write, use the method of adding \ in front. In fact, we are all familiar with these characters.2.3 Expressions that can match 'multiple characters'
Some expression methods in regular expressions can match multiple any one of these characters. For example, the expression \d can match any number. Although it can match any of the characters, it can only be one, not multiple. This is just like when playing poker, the king can replace any card, but the jackpot can replace one card.2.4 Custom expressions that can match 'multiple characters'
Use square brackets [] to include a series of characters that can match them any character. Use [^] to include a series of characters, and it can match any character except the characters among them. In the same way, although any one of them can be matched, it can only be one, not multiple.2.5 Special symbols that modify the number of matches
The expressions mentioned in the previous chapter, whether they are expressions that can only match one type of character or expressions that can match multiple characters, can only be matched once. If you use an expression plus a special symbol that modifies the number of matches, you can match repeatedly without writing the expression again.
The usage method is: put the "number of times modification" after the modified expression. For example: [bcd][bcd] can be written as [bcd]{2}.
Example 1: When the expression \d+/.?\d* matches it costs $12.5 , the matching result is: success; the matched content is: 12.5 ; The matched positions are: starting at 10 and ending at 14.
Example 2: When the expression go{2, 8}gle matches Ads by goooooogle, the matching result is: success; the matched content is: goooooogle; the matched position is: starting at 7, Ended at 17.
2.6 Some other symbols representing abstract meanings
Some symbols represent abstract special meanings in expressions:
Further text explanation is still relatively abstract, so examples are given to help everyone understand.
Example 1: When the expression ^aaa matches xxx aaa xxx, the matching result is: failure. Because ^ is required to match the beginning of the string, ^aaa can only match when aaa is at the beginning of the string, such as: aaa xxx xxx.
Example 2: When the expression aaa$ matches xxx aaa xxx, the matching result is: failure. Because $ is required to match the end of the string, aaa$ can only match when aaa is at the end of the string, such as: xxx xxx aaa.
Example 3: Expression .\b. When matching @@@abc, the matching result is: success; the matched content is: @a; the matched position is: starting at 2 and ending at 4.
Further explanation: \b is similar to ^ and $. It does not match any character itself, but it requires it to be on both sides of the position in the matching result. One side is the \w range and the other side is the non-\w range. .
Example 4: When the expression \bend\b matches weekend, endfor, end, the matching result is: success; the matched content is: end; the matched position is: starting at 15 and ending at 18.
Some symbols can affect the relationship between subexpressions within an expression:
Example 5: The expression Tom|Jack matches the string I' m Tom,he is Jack, the matching result is: success; the matched content is: Tom; the matched position is: starting at 4 and ending at 7. When matching the next one, the matching result is: success; the matched The content is: Jack; the matched position is: starting at 15 and ending at 19.
Example 6: When the expression (go\s*)+ matches Let's go go go!, the matching result is: success; the matched content is: go go go; the matched position is: start On 6, ended on 14.
Example 7: When the expression ¥(\d+\.?\d) matches $10.9,¥20.5, the matching result is: success; the matched content is: ¥20.5; the matched position is : Starts at 6 and ends at 10. The content matched by obtaining the bracket range alone is: 20.5.
3. Some advanced usage of regular expressions
3.1 Greedy and non-greedy in the number of matches
Greedy mode:
When using modified matching times When using special symbols, there are several representation methods that can enable the same expression to match different times at the same time, such as: "{m, n}", "{m,}", ?, *, +, the specific number of matches depends on Depends on the matching string. This kind of repeated matching expression an indefinite number of times always matches as many times as possible during the matching process. For example, for the text dxxxdxxxd, the example is as follows:
It can be seen that when matching, \w+ always matches as many characters as possible that meet its rules. Although in the second example, it does not match the last d, it is also to make the entire expression match successfully. In the same way, expressions with * and "{m, n}" are matched as much as possible, and expressions with ? are also "matched" as much as possible, depending on whether they can match or not. This matching principle is called greedy mode.
Non-greedy mode:
Add the ? sign after the special symbol that modifies the number of matches, so that expressions with an indefinite number of matches can be matched as little as possible, and expressions that can be matched or not matched can be "unmatched" as much as possible. This matching principle is called non-greedy mode, also called reluctant mode. If there are fewer matches, the entire regular expression will fail to match. Similar to the greedy mode, the non-greedy mode will minimally match more to make the entire regular expression match successfully. For example, for the text "dxxxdxxxd":
aa
bb
aa
bb
aa
bb
3.2 Backreference\1,\2...
When the expression is matched, the expression engine will include parentheses () The string matched by the expression is recorded. When obtaining the matching result, the string matched by the expression contained in parentheses can be fired separately. This has been demonstrated many times in the previous examples. In practical applications, when a certain boundary is used to search and the content to be obtained does not include the boundary, parentheses must be used to specify the desired range. For example, the previous##3.3 Preliminary. Search, no match; reverse pre-search, no matchIn the previous chapter, I talked about several special symbols that represent abstract meanings: ^, $, \b. One thing they have in common is that they do not match any characters themselves, but only add a condition to the "two ends of the string" or the "gap between characters". After understanding this concept, this section will continue to introduce another one. A more flexible method that adds conditions to "both ends" or "gaps"
Forward pre-search: (?=xxxxx), (?!xxxxx) Format: (?=xxxxx), in the matched string, the "gap" or "both ends" it is located in. The additional condition is: the right side of the gap must be able to match the expression of "xxxxx" . Because it is only used as an additional condition on this gap, it does not affect the subsequent expressions to actually match the characters after this gap. This is similar to \b , which does not match any characters by itself. \b just takes the characters before and after the gap and makes a judgment. It will not affect the subsequent expressions to actually match. Example 1: When the expression Windows(?=NT|XP) matches Windows 98, Windows NT, and Windows 2000, it will only match Windows in Windows NT, and other Windows words will not be matched. Example 2: The expression (\w)((?=\1\1\1)(\1))+ will match the first 4 of 6 f when matching the string aaa ffffff 9999999999 , can match 9 9 and the first 7. This expression can be interpreted as: if letters and numbers are repeated more than 4 times, the part before the last 2 digits will be matched. Of course, this expression does not need to be written like this, but it is only used for demonstration purposes. Format: (?!xxxxx) , located on the right side of the gap, must not match the xxxxx part of the expression. Example 3: When the expression ((?!\bstop\b).)+ matches fdjka ljfdl stop fjdsla fdj, it will match from the beginning to the position before stop. If there is no stop in the string, then Matches the entire string. Example 4: When the expression do(?!\w) matches the string done, do, dog, it can only match do. In this example, using (?!\w) after do has the same effect as using \b. Reverse pre-search: (? The concepts of these two formats are similar to forward pre-search , the condition required for reverse pre-search is: the "left side" of the gap. The two formats respectively require that it must be able to match and must not be able to match the specified expression, instead of judging the right side. The same as "forward pre-search" in that they are an addition to the gap and do not match any characters themselves.
4. Other general rules 4.1 Rule 1 In expressions, you can use \xXX and \uXXXX to represent a character (X represents a hexadecimal number) 4.2 Rule 2 While the expressions \s, \d, \w, \b represent special meanings, the corresponding Capital letters indicate the opposite meaning 4.3 Rule 3 has special meaning in expressions, Summary of characters that need to add \ to match the character itself 4.4 Rule 4 Brackets () If you want the matching results not to be recorded for later use, you can use the (?:xxxxx) format. Example 1: When the expression (?:(\w)\1)+ matches "a bbccdd efg", the result is "bbccdd". Matches within the bracket (?:) range are not logged, so (\w) is quoted using \1. 4.5 Rule 5 Introduction to commonly used expression attribute settings: Ignorecase, Singleline, Multiline, Global Related articles: How to use regular expressions to match parentheses in PHP Summary on the use of common functions in PHP regular expressions Simple code example of php regular expression matching Chinese characters

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

PHP regular expression verification: Number format detection When writing PHP programs, it is often necessary to verify the data entered by the user. One of the common verifications is to check whether the data conforms to the specified number format. In PHP, you can use regular expressions to achieve this kind of validation. This article will introduce how to use PHP regular expressions to verify number formats and provide specific code examples. First, let’s look at common number format validation requirements: Integers: only contain numbers 0-9, can start with a plus or minus sign, and do not contain decimal points. floating point

To validate email addresses in Golang using regular expressions, follow these steps: Use regexp.MustCompile to create a regular expression pattern that matches valid email address formats. Use the MatchString function to check whether a string matches a pattern. This pattern covers most valid email address formats, including: Local usernames can contain letters, numbers, and special characters: !.#$%&'*+/=?^_{|}~-`Domain names must contain at least One letter, followed by letters, numbers, or hyphens. The top-level domain (TLD) cannot be longer than 63 characters.

PHP Regular Expressions: Exact Matching and Exclusion Fuzzy inclusion regular expressions are a powerful text matching tool that can help programmers perform efficient search, replacement and filtering when processing text. In PHP, regular expressions are also widely used in string processing and data matching. This article will focus on how to perform exact matching and exclude fuzzy inclusion operations in PHP, and will illustrate it with specific code examples. Exact match Exact match means matching only strings that meet the exact condition, not any variations or extra words.

As a modern programming language, Go language provides powerful regular expressions and string processing functions, allowing developers to process string data more efficiently. It is very important for developers to master regular expressions and string processing in Go language. This article will introduce in detail the basic concepts and usage of regular expressions in Go language, and how to use Go language to process strings. 1. Regular expressions Regular expressions are a tool used to describe string patterns. They can easily implement operations such as string matching, search, and replacement.

In Go, you can use regular expressions to match timestamps: compile a regular expression string, such as the one used to match ISO8601 timestamps: ^\d{4}-\d{2}-\d{2}T \d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-][0-9]{2}:[0-9]{2})$ . Use the regexp.MatchString function to check if a string matches a regular expression.

The method of using regular expressions to verify passwords in Go is as follows: Define a regular expression pattern that meets the minimum password requirements: at least 8 characters, including lowercase letters, uppercase letters, numbers, and special characters. Compile regular expression patterns using the MustCompile function from the regexp package. Use the MatchString method to test whether the input string matches a regular expression pattern.

The steps to detect URLs in Golang using regular expressions are as follows: Compile the regular expression pattern using regexp.MustCompile(pattern). Pattern needs to match protocol, hostname, port (optional), path (optional) and query parameters (optional). Use regexp.MatchString(pattern,url) to detect whether the URL matches the pattern.

Regular expression wildcards include ".", "*", "+", "?", "^", "$", "[]", "[^]", "[a-z]", "[A-Z] ","[0-9]","\d","\D","\w","\W","\s&quo
