Regular Expressions Manual

Last updated: 2022-04-13

A regular expression (often abbreviated as regex, regexp, or RE in code) is a concept in computer science. Regular expressions are usually used to search for and replace text that matches a certain pattern (rule).


A regular expression is a logical formula for operating on strings. It combines predefined special characters into a "rule string", and this "rule string" expresses a filtering logic for strings.

Many programming languages support string operations using regular expressions. For example, Perl has a powerful regular expression engine built into it. The concept of regular expressions was originally popularized by Unix tools such as sed and grep. "Regular expression" is often abbreviated: the singular forms are regexp and regex, and the plural forms are regexps, regexes, and regexen.

The first regular expression example!

Example

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>PHP Chinese Website Tutorial (php.sn)</title>
</head>
<body>

<script>
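// Sample string that mixes letters and digits
var str = "abc123def";
// Pattern: one or more consecutive digits
var patt1 = /[0-9]+/;
// Write the first match ("123") to the page
document.write(str.match(patt1));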
</script>

</body>
</html>


Tips: This regular expression tutorial takes you from beginner to advanced level. If you have any questions, you can ask them in the Regular Expression Community of the PHP Chinese website, where other users will answer them.

Regular expression features

  • They are very flexible, logical, and functional.

  • They make complex string handling possible quickly and in an extremely concise way.

  • They are relatively obscure for people who are new to them.

Because regular expressions operate mainly on text, they are supported in many kinds of editors, from the well-known small editor EditPlus to large ones such as Microsoft Word and Visual Studio, all of which can use regular expressions to process text content.

Purpose

Given a regular expression and a string, we can do the following:

  • Check whether the string conforms to the filtering logic of the regular expression (this is called "matching").

  • Use the regular expression to extract the specific part we want from the string.
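The two purposes above, plus the replacement use mentioned in the introduction, can be sketched in JavaScript as follows. The sample string and the variable names are made up for illustration only.

```javascript
// Hypothetical sample text; not taken from the tutorial.
const text = "Order 4521 shipped on 2022-04-13";

// 1. Matching: does the string contain something shaped like a date?
const hasDate = /\d{4}-\d{2}-\d{2}/.test(text);

// 2. Extraction: capture groups pull out the parts we want.
const dateParts = text.match(/(\d{4})-(\d{2})-(\d{2})/);
const year = dateParts[1];
const month = dateParts[2];
const day = dateParts[3];

// 3. Replacement: swap every run of digits for a placeholder.
const masked = text.replace(/\d+/g, "#");

console.log(hasDate);          // true
console.log(year, month, day); // 2022 04 13
console.log(masked);           // Order # shipped on #-#-#
```

`test` only answers yes or no, while `match` also returns the captured substrings, so it is the natural choice when the goal is extraction.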

Regular expression engines

Regular expression engines fall into two main categories:

  • DFA (deterministic finite automaton)

  • NFA (nondeterministic finite automaton)

Both kinds of engine have a long history (more than 20 years by now), and many variants of each have appeared. POSIX was introduced to keep unnecessary variants from multiplying further. As a result, mainstream regular expression engines are divided into three categories: 1. DFA, 2. traditional NFA, and 3. POSIX NFA.

DFA engines run in linear time because they do not need to backtrack (and therefore they never test the same character twice). A DFA engine also guarantees that the longest possible string is matched. However, because a DFA engine contains only finite state, it cannot match patterns with backreferences; and because it does not construct an explicit expansion of the expression, it cannot capture subexpressions.

Traditional NFA engines run a so-called "greedy" backtracking match algorithm that tests all possible expansions of a regular expression in a specified order and accepts the first match. Because a traditional NFA constructs a specific expansion of the regular expression for a successful match, it can capture subexpression matches and match backreferences. However, because a traditional NFA backtracks, it can visit exactly the same state multiple times (if the state is reached via different paths). As a result, it can run very slowly in the worst case. Because a traditional NFA accepts the first match it finds, it can also leave other (possibly longer) matches undiscovered.
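JavaScript's built-in RegExp uses backtracking semantics, so it can serve to illustrate the traditional NFA behavior described above. The sketch below shows that the first alternative to succeed wins even when a longer match exists, and that captured subexpressions and backreferences are available. The sample strings are made up for illustration.

```javascript
// Ordered alternation: branches are tried left to right and the first success is accepted.
const firstWins = "regexp".match(/regex|regexp/)[0];   // "regex" (the longer "regexp" is never found)
const longerFirst = "regexp".match(/regexp|regex/)[0]; // "regexp" (reordering the branches changes the result)

// Capturing and backreferences: \1 must repeat whatever group 1 captured.
const doubled = "hello".match(/(\w)\1/);

console.log(firstWins);              // regex
console.log(longerFirst);            // regexp
console.log(doubled[0], doubled[1]); // ll l
```

This is why the order of alternatives matters with such engines: placing the longer branch first is a common way to get the longer match.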

POSIX NFA engines are similar to traditional NFA engines, except that they keep backtracking until they can guarantee that they have found the longest possible match. A POSIX NFA engine is therefore slower than a traditional NFA engine, and with a POSIX NFA you cannot make a shorter match win over a longer one by changing the order of the backtracking search.
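The "longest possible match" rule can be sketched with a small helper. The function `leftmostLongest` below is hypothetical and handles only literal alternatives; it is not how a real POSIX engine is implemented, but it shows the rule: at the leftmost position where anything matches, every alternative is tried and the longest one is kept, regardless of the order in which the alternatives are written.

```javascript
// Hypothetical helper: leftmost-longest selection among literal alternatives.
function leftmostLongest(alternatives, text) {
  for (let start = 0; start < text.length; start++) {
    let best = null;
    // Try every alternative at this position instead of stopping at the first success.
    for (const alt of alternatives) {
      if (text.startsWith(alt, start) && (best === null || alt.length > best.length)) {
        best = alt;
      }
    }
    if (best !== null) return { index: start, match: best };
  }
  return null;
}

const posixStyle = leftmostLongest(["regex", "regexp"], "a regexp");
const backtracking = "a regexp".match(/regex|regexp/)[0];

console.log(posixStyle.match); // regexp
console.log(backtracking);     // regex
```

The extra work of checking every alternative is the simplified counterpart of the additional backtracking that makes a POSIX NFA slower.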

The programs that use the DFA engine mainly include:

awk, egrep, flex, lex, MySQL, Procmail, and others.

The programs that use the traditional NFA engine mainly include:

GNU Emacs, Java, grep, less, more, .NET languages, the PCRE library, Perl, PHP, Python, Ruby, sed, vi.

The programs that use the POSIX NFA engine mainly include:

mawk, Mortice Kern Systems' utilities, GNU Emacs (when explicitly requested).

There are also engines that use DFA/NFA hybrid:

GNU awk, GNU grep/egrep, Tcl.

Here is an example that briefly illustrates how NFA and DFA engines work differently:

Take the string this is yansen's blog and the regular expression /ya(msen|nsen|nsem)/ (the expression itself is not meaningful; it only serves to illustrate the difference between the engines). An NFA engine works as follows. It first searches the string for y, then checks whether the next character is a. If it is, the engine checks whether the character after that is m. It is not, so the msen branch is eliminated and the engine checks whether the character is n instead.

It then checks for s and e in turn, and then tests whether the next character is n. If it is n, the match succeeds. If not, the engine backtracks to the nsem branch and tests whether that character is m. Why m? Because an NFA works through the regular expression and tests the string against it repeatedly, so the same part of the string may be tested many times!
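The expression-driven process described above can be sketched as a small simulation. The function `backtrackingMatch` is a hypothetical illustration restricted to a literal prefix followed by literal branches; it is not a real regex engine. It counts character comparisons so that the repeated testing becomes visible.

```javascript
// Hypothetical sketch: expression-driven matching of prefix + (branch1|branch2|...).
function backtrackingMatch(prefix, branches, text) {
  let comparisons = 0;
  // Compare a literal against the text at `pos`, one character at a time.
  const matchesAt = (literal, pos) => {
    for (let i = 0; i < literal.length; i++) {
      comparisons++;
      if (text[pos + i] !== literal[i]) return false;
    }
    return true;
  };
  for (let start = 0; start < text.length; start++) {
    if (!matchesAt(prefix, start)) continue;
    const branchStart = start + prefix.length;
    for (const branch of branches) {
      // When a branch fails, the next branch starts over at branchStart,
      // so the same characters of the text are read again.
      if (matchesAt(branch, branchStart)) {
        return { match: prefix + branch, index: start, comparisons };
      }
    }
  }
  return { match: null, index: -1, comparisons };
}

const nfaResult = backtrackingMatch("ya", ["msen", "nsen", "nsem"], "this is yansen's blog");
console.log(nfaResult.match, nfaResult.index); // yansen 8

// With "yansem" the nsen branch fails on its last character, and n, s, e are read a second time.
const nfaRetry = backtrackingMatch("ya", ["msen", "nsen", "nsem"], "this is yansem");
console.log(nfaRetry.match, nfaRetry.comparisons > "this is yansem".length); // yansem true
```

The comparison count exceeding the length of the text in the second call is the simulation's version of "the same string may be tested many times".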

This is not the case with a DFA. A DFA scans the string in order, starting from the t in this, until it reaches y. It sees that y is followed by a and checks whether the expression also has an a at that point, which it does. The a in the string is followed by n, so the DFA checks the branches of the expression in turn: msen does not fit and is eliminated, while nsen and nsem both remain possible. The DFA then continues through the string character by character. When it reaches the final n of nsen, only the nsen branch still fits, and the match succeeds!
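The text-driven process described above can be sketched as follows. The function `textDrivenMatch` is a hypothetical illustration for literal alternatives only: it keeps the set of alternatives that are still viable, which is roughly the information a DFA state encodes, and reads each character of the text exactly once. As a simplification, it returns as soon as any alternative completes.

```javascript
// Hypothetical sketch: text-driven matching that tracks all viable alternatives at once.
function textDrivenMatch(alternatives, text) {
  let live = [];      // entries { word, pos }: an alternative and how many of its characters have matched
  let charsRead = 0;
  for (let i = 0; i < text.length; i++) {
    const c = text[i];
    charsRead++;
    // A match may also begin at this character, so every alternative is a fresh candidate.
    const candidates = live.concat(alternatives.map(word => ({ word, pos: 0 })));
    live = [];
    for (const { word, pos } of candidates) {
      if (word[pos] !== c) continue; // this character eliminates the candidate
      if (pos + 1 === word.length) {
        return { match: word, index: i - word.length + 1, charsRead };
      }
      live.push({ word, pos: pos + 1 });
    }
  }
  return { match: null, index: -1, charsRead };
}

// /ya(msen|nsen|nsem)/ written out as its three literal alternatives.
const dfaResult = textDrivenMatch(["yamsen", "yansen", "yansem"], "this is yansen's blog");
console.log(dfaResult.match, dfaResult.index, dfaResult.charsRead); // yansen 8 14
```

At the n after ya, the yamsen candidate drops out while yansen and yansem stay alive, and the final n leaves only yansen, mirroring the walkthrough in the text.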

As this shows, the two engines work in completely different ways: one (NFA) is expression-driven, and the other (DFA) is text-driven. Generally speaking, a DFA engine searches faster. However, an NFA is expression-driven and therefore easier to control, so most programmers prefer NFA engines. Each engine has its own strengths, and the right choice depends on your needs and the language you are using.

What this regular expression tutorial covers

This tutorial covers both basic and advanced knowledge of regular expressions, including regular expression syntax, metacharacters, operator precedence, matching rules, and more.

Tips: Each chapter of this tutorial contains many regular expression examples. You can click the "Run Example" button to view the results online. These examples will help you learn and understand regular expressions better.

Other regular expression learning resources

In addition to the further reading on the right side of this page, the following resources have also been selected for you: