1. Introduction
Regular expressions are now widely used. Traces of them can be found in operating systems such as *nix (Linux, Unix, etc.) and HP, in development environments such as PHP, C#, and Java, and in many application programs.
Regular expressions let you accomplish powerful tasks in a concise way. The price of that concision and power is a terse syntax that is hard to learn and takes some effort to master. Once you have the basics, though, using regular expressions with a reference at hand is fairly straightforward.
Example: ^.+@.+\..+$
Patterns like this scared me off many times, and they probably scare off many other people too. By the end of this article you should be able to read and write patterns like this one with confidence.
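The sketch below shows what the example pattern actually checks. It uses Python's standard re module; Python is only an illustrative choice, and the pattern behaves the same way in most engines. The helper looks_like_email and the sample strings are hypothetical. The pattern only tests the rough shape "something@something.something" and is not a complete e-mail validator.

```python
import re

# ^ and $ anchor the whole string, .+ means "one or more of any character",
# and \. is a literal dot. So the text must be: something, '@', something, '.', something.
EMAIL_SHAPE = re.compile(r"^.+@.+\..+$")

def looks_like_email(text: str) -> bool:
    """Hypothetical helper: True if text has the rough shape a@b.c."""
    return EMAIL_SHAPE.match(text) is not None

print(looks_like_email("user@example.com"))  # True
print(looks_like_email("user@example"))      # False: no dot after '@'
print(looks_like_email("@example.com"))      # False: nothing before '@'
```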
2. History of regular expressions
The “ancestors” of regular expressions can be traced all the way back to early research on how the human nervous system works. The neurophysiologist Warren McCulloch and the logician Walter Pitts developed a mathematical way to describe these networks of neurons.
In 1956 the mathematician Stephen Kleene, building on the earlier work of McCulloch and Pitts, published a paper titled "Representation of Events in Nerve Nets and Finite Automata", which introduced the concept of regular expressions. He used regular expressions to describe what he called "the algebra of regular sets", hence the term "regular expression".
This work was later applied in early research on computational search algorithms by Ken Thompson, the principal inventor of Unix. The first practical application of regular expressions was Thompson's qed editor, the precursor of the Unix editor ed.
The rest, as they say, is history. Regular expressions have been an important part of text editors and search tools ever since.
3. Regular expression definition
A regular expression describes a string-matching pattern. It can be used to check whether a string contains a certain substring, to replace the matching substrings, or to extract the substrings that satisfy some condition, among other uses.
A regular expression is a text pattern made up of ordinary characters (such as the letters a through z) and special characters called metacharacters. It acts as a template against which the string being searched is matched.
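The three uses listed above (checking, replacing, extracting) map directly onto three functions in Python's re module. This is a minimal sketch with a made-up sample string. The pattern \d+ ("one or more digits") is used purely for illustration.

```python
import re

text = "Order 42 shipped on 2023-05-01; order 7 is pending."

# 1. Check whether the string contains a substring matching the pattern.
has_digits = re.search(r"\d+", text) is not None

# 2. Replace every matching substring (each run of digits) with "#".
masked = re.sub(r"\d+", "#", text)

# 3. Extract every substring that meets the condition.
numbers = re.findall(r"\d+", text)

print(has_digits)  # True
print(masked)      # Order # shipped on #-#-#; order # is pending.
print(numbers)     # ['42', '2023', '05', '01', '7']
```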
3.1 Ordinary characters
Ordinary characters are all the printable and non-printable characters that are not explicitly designated as metacharacters. They include all uppercase and lowercase letters, all digits, all punctuation marks, and some other symbols.
3.2 Non-printing characters
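Character | Meaning |
\cx | Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return. The value of x must be in A-Z or a-z; otherwise, c is treated as a literal 'c' character. |
\f | Matches a form feed. Equivalent to \x0c and \cL. |
\n | Matches a newline (line feed). Equivalent to \x0a and \cJ. |
\r | Matches a carriage return. Equivalent to \x0d and \cM. |
\s | Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v]. |
\S | Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v]. |
\t | Matches a tab. Equivalent to \x09 and \cI. |
\v | Matches a vertical tab. Equivalent to \x0b and \cK. |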
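Here is a small sketch of some of these escapes in Python's re module, using a made-up sample string. Python does not support the \cx form: it rejects \c as a bad escape. Python's \s in str patterns also matches additional Unicode whitespace beyond the list in the table.

```python
import re

line = "name:\tAda\r\nrole:\fengineer\v"

# \s matches any whitespace (space, \t, \n, \r, \f, \v, ...); split on runs of it.
tokens = re.split(r"\s+", line.strip())
print(tokens)  # ['name:', 'Ada', 'role:', 'engineer']

# \t and \x09 are two spellings of the same tab character.
print(re.search(r"\t", line).start() == re.search(r"\x09", line).start())  # True

# \S+ extracts runs of non-whitespace characters.
print(re.findall(r"\S+", line))  # ['name:', 'Ada', 'role:', 'engineer']
```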
3.3 Special characters
Special characters are characters that have a special meaning. For example, in the file pattern "*.txt", the * simply means "any string". To match a literal * instead, you must escape it by putting a \ in front of it, as in ls \*.txt. Regular expressions have the following special characters.
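Character | Description |
$ | Matches the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position before '\n' or '\r'. To match the $ character itself, use \$. |
() | Marks the start and end of a subexpression. The subexpression's match can be captured for later use. To match these characters, use \( and \). |
* | Matches the preceding subexpression zero or more times. To match the * character, use \*. |
+ | Matches the preceding subexpression one or more times. To match the + character, use \+. |
. | Matches any single character except the newline \n. To match ., use \. |
[ | Marks the start of a bracket expression. To match [, use \[. |
? | Matches the preceding subexpression zero or one time, or marks a non-greedy qualifier. To match the ? character, use \?. |
\ | Marks the next character as a special character, a literal character, a backreference, or an octal escape. |
^ | Matches the start of the input string, except inside a bracket expression, where it negates the character set. To match the ^ character itself, use \^. |
{ | Marks the start of a qualifier expression. To match {, use \{. |
| | Indicates a choice between two items. To match |, use \|. |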
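The next sketch shows why escaping matters, again using Python's re module and a made-up price string. Left unescaped, $ acts as an end anchor and . matches any character. Escaped with \, both match literally. Python's re.escape adds the backslashes automatically.

```python
import re

price_text = "Total: $19.99 (tax incl.)"

# Unescaped, '$' is an end-of-string anchor, so nothing can follow it:
# this pattern can never match here.
print(re.search(r"$19.99", price_text))            # None

# Escaped with backslashes, the special characters match literally.
print(re.search(r"\$19\.99", price_text).group())  # $19.99

# re.escape escapes every special character in a literal string.
pattern = re.escape("(tax incl.)")
print(re.search(pattern, price_text).group())      # (tax incl.)
```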
3.4 Qualifier
Qualifiers specify how many times a given component of a regular expression must appear for a match to succeed. There are six of them: *, +, ?, {n}, {n,}, and {n,m}. The *, +, and ? qualifiers are greedy: they match as much text as possible. Appending a ? to them makes them non-greedy (minimal), so they match as little text as possible.
The qualifiers of regular expressions are:
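Character | Description |
* | Matches the preceding subexpression zero or more times. For example, zo* matches "z" as well as "zoo". * is equivalent to {0,}. |
+ | Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo" but not "z". + is equivalent to {1,}. |
? | Matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do" as well as the whole of "does". ? is equivalent to {0,1}. |
{n} | n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob" but matches the two o's in "food". |
{n,} | n is a non-negative integer. Matches at least n times. |
{n,m} | m and n are non-negative integers with n <= m. Matches at least n and at most m times. |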
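The following sketch runs the table's examples through Python's re module, then contrasts greedy and non-greedy matching. The HTML-like sample string is made up for illustration.

```python
import re

# Qualifier examples from the table (fullmatch requires the whole string to match).
print(re.fullmatch(r"zo*", "z") is not None)         # True: zero o's allowed
print(re.fullmatch(r"zo+", "z") is not None)         # False: needs at least one o
print(re.fullmatch(r"do(es)?", "does") is not None)  # True
print(re.search(r"o{2}", "Bob"))                     # None: only one o
print(re.search(r"o{2}", "food").group())            # oo
print(re.search(r"o{1,3}", "foooood").group())       # ooo: at most three

# Greedy vs. non-greedy: a trailing ? makes .* match as little as possible.
html = "<b>bold</b> and <i>italic</i>"
print(re.search(r"<.*>", html).group())   # <b>bold</b> and <i>italic</i>
print(re.search(r"<.*?>", html).group())  # <b>
```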
3.5 Locator
Locators describe the boundaries of a string or a word. ^ and $ match the beginning and the end of the string respectively. \b matches a word boundary (the start or end of a word), and \B matches a position that is not a word boundary. Qualifiers cannot be applied to locators.
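A short sketch of the locators in Python's re module follows, using a made-up sentence. Note how \b and \B select different occurrences of the same letters "cat".

```python
import re

text = "cat concatenate cat."

# \b...\b: "cat" only as a whole word.
print(re.findall(r"\bcat\b", text))         # ['cat', 'cat']
# \B...\B: "cat" with word characters on both sides (inside "concatenate").
print(re.findall(r"\Bcat\B", text))         # ['cat']
# ^ and $ anchor to the start and end of the string.
print(re.search(r"^cat", text) is not None)  # True
print(re.search(r"cat$", text) is not None)  # False: the string ends with "."
```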
3.6 Alternation
Enclose all the alternatives in parentheses and separate adjacent alternatives with |. Parentheses have a side effect, however: the related match is captured (cached) for later use. To avoid this side effect, put ?: before the first alternative, as in (?:a|b).
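The capturing side effect described above has a visible consequence in Python. When a pattern contains a capturing group, re.findall returns what the group captured rather than the whole match. Adding ?: removes that effect. This is a minimal sketch with a made-up list of CSS-style sizes.

```python
import re

text = "10px 2em 5pt"

# With a capturing group, findall returns only what the group captured.
captured = re.findall(r"\d+(px|em)", text)
# With (?:...), the alternatives are still grouped but nothing is captured,
# so findall returns the whole match.
whole = re.findall(r"\d+(?:px|em)", text)

print(captured)  # ['px', 'em']
print(whole)     # ['10px', '2em']
```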
The ?: prefix is one of the non-capturing elements; the other two are ?= and ?!, which carry additional meaning. ?= is a positive lookahead: it matches the search string at any position where the regular expression pattern inside the parentheses begins to match. ?! is a negative lookahead: it matches the search string at any position where that pattern does not begin to match. Neither lookahead consumes characters; each only tests what follows the current position.
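The following sketch demonstrates both lookaheads in Python's re module on a made-up list of product names. The lookahead decides whether "Windows" matches, but it is not part of the matched text.

```python
import re

text = "Windows 2000, Windows XP, Windows 95"

# (?=...) positive lookahead: "Windows" only where " XP" follows.
# The lookahead consumes nothing, so the match is just "Windows".
m = re.search(r"Windows(?= XP)", text)
print(m.group(), m.start())  # Windows 14

# (?!...) negative lookahead: every "Windows" NOT followed by " XP".
starts = [hit.start() for hit in re.finditer(r"Windows(?! XP)", text)]
print(starts)  # [0, 26]
```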
3.7 Backreferences
Adding parentheses around a regular expression pattern, or part of one, causes the associated match to be stored in a temporary buffer. Each captured submatch is stored in the order in which it is encountered, from left to right, in the pattern. The buffers are numbered consecutively starting from 1, up to a maximum of 99 captured subexpressions. Each buffer can be accessed with '\n', where n is a one- or two-digit decimal number that identifies a particular buffer.
You can use the non-capturing metacharacters '?:', '?=', or '?!' to prevent the associated match from being saved.
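A common use of backreferences is finding doubled words. This sketch uses Python's re module with a made-up sentence. Group 1 captures a word, and \1 must then repeat exactly that text. With re.IGNORECASE, Python compares the backreference case-insensitively, so "Is is" counts as doubled. The same \1 syntax works in the replacement string of re.sub.

```python
import re

text = "Is is the cost of of gasoline going up up?"
DOUBLED = r"\b([a-z]+) \1\b"

# Buffer 1 holds the first word; \1 requires the same word to follow.
doubled = re.findall(DOUBLED, text, flags=re.IGNORECASE)
print(doubled)  # ['Is', 'of', 'up']

# In the replacement, \1 refers to buffer 1: collapse each pair to one word.
deduped = re.sub(DOUBLED, r"\1", text, flags=re.IGNORECASE)
print(deduped)  # Is the cost of gasoline going up?
```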
4. Operator precedence
Operators with the same precedence are evaluated from left to right, and operators with different precedence are evaluated from highest to lowest. From highest to lowest precedence, the operators are:
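Operator | Description |
\ | Escape character |
(), (?:), (?=), [] | Parentheses and brackets |
*, +, ?, {n}, {n,}, {n,m} | Qualifiers |
^, $, \anymetacharacter, anycharacter | Anchors and sequence (position and order) |
| | Alternation ("or") |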
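Two consequences of this precedence order are shown below, checked with Python's re module; the sample words are made up. Because | binds most loosely, ^cat|dog$ means "^cat or dog$", not "^(cat|dog)$". Qualifiers bind more tightly than sequence, so ab+ repeats only the b.

```python
import re

# | has the lowest precedence: ^cat|dog$ is (^cat)|(dog$).
loose = re.compile(r"^cat|dog$")
strict = re.compile(r"^(?:cat|dog)$")
print(bool(loose.search("catalog")))   # True: "^cat" alone matches
print(bool(strict.search("catalog")))  # False: whole string must be cat or dog

# Qualifiers bind tighter than sequence: ab+ repeats only b.
print(re.fullmatch(r"ab+", "abbb") is not None)      # True
print(re.fullmatch(r"ab+", "abab") is not None)      # False
print(re.fullmatch(r"(?:ab)+", "abab") is not None)  # True
```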