
JavaScript regex also has a single-line mode

小云云
Release: 2017-12-09 11:27:57

This article explains single-line mode in JavaScript regular expressions. I hope it helps anyone who needs it.

Regular expressions were first implemented by Ken Thompson in his improved QED editor in 1970. Even then, the simplest metacharacter, ".", matched any character except a newline:

"." is a regular expression which matches any character except <nl>.

The sentence above comes from the official 1970 QED documentation, which may well be the first document ever written about regular expressions.

Why this rule? Because QED edits files line by line, and the newline at the end of each line counts as part of that line's content. For example, to delete all single-line comments in a piece of code, you could use the following command in QED:

1,$s#//.*##

If "." could match the newline character, the newline would be deleted too, merging each of these lines with the line after it, which is usually not what we want. That is why "." was designed from the very beginning not to match newlines. No current operating system ships QED for us to test this, but we still have Vim, where "." cannot match a newline for the same reason.
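The effect is easy to reproduce in JavaScript, whose "." also excludes newlines by default. This is only an approximation of the QED command: a global replace over one string stands in for QED's line-by-line substitution, and the [\s\S] pattern is an assumption used to imitate a hypothetical "." that does match newlines.

```javascript
// A small two-line snippet, each line ending in a "//" comment.
const code = "let a = 1; // first\nlet b = 2; // second\n";

// Like QED's 1,$s#//.*##: "." stops at "\n", so each comment is
// removed but the newline after it survives.
const perLine = code.replace(/\/\/.*/g, "");

// Hypothetical contrast: a wildcard that also matches newlines
// ([\s\S] here) runs greedily to the end of the string, swallowing
// the line breaks and everything after the first comment.
const greedy = code.replace(/\/\/[\s\S]*/g, "");

console.log(JSON.stringify(perLine)); // "let a = 1; \nlet b = 2; \n"
console.log(JSON.stringify(greedy));  // "let a = 1; "
```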

Unlike Node, where reading a file usually means reading the whole file at once, Perl follows the tradition of many Linux commands and reads files line by line, like this:

while (<>) {print $_}
There is also a newline character at the end of $_, so Perl naturally inherited QED's rule that "." does not match newlines. But Perl is a programming language, not an editor: its regular expressions must match not only single lines of text but sometimes multi-line text as well, so "." sometimes needs to match across lines. Perl therefore invented single-line mode, the /s modifier, which lets "." match newlines too.

Perl's official description of the /s modifier, which turns on single-line mode, is "Treat the string as single line". This "single line" should be understood as follows: in normal mode, "." can only match characters within a line and cannot cross a line boundary; in single-line mode, Perl pretends that a multi-line string is one line and treats its newline characters as ordinary in-line characters, so "." can match them. More concretely, the following three lines of text

1
2
3

are treated as the single line "1\n2\n3\n". That is what "single-line mode" means.
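JavaScript has since gained the same /s flag (described later in this article), so the "1\n2\n3\n" example can be checked directly. A minimal sketch, assuming an engine that supports /s (ES2018 or later):

```javascript
// The three lines "1", "2", "3" stored in one string.
const text = "1\n2\n3\n";

// Without /s, "." cannot match "\n", so "1.2" fails at the line break.
console.log(/1.2/.test(text));  // false

// With /s, the newline is treated like any in-line character,
// so the whole string behaves as one line.
console.log(/1.2/s.test(text)); // true
console.log(/^.*$/s.exec(text)[0] === text); // true
```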

Worse still, for the same reason (a string variable can hold multiple lines of text), Perl also invented the /m modifier, i.e. multi-line mode, officially described as "Treat the string as multiple lines". This mode has long been part of JavaScript regular expressions. Here, "multiple lines" means the following: by default, ^ and $ do not match the positions right after and right before newline characters in the middle of a string, so the string is always treated as a single line; once multi-line mode is on, they do match there.
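A minimal JavaScript sketch of multi-line mode on a similar three-line string; the /m flag is standard JavaScript, and the variable names are illustrative only:

```javascript
const lines = "1\n2\n3";

// Default: ^ and $ only match at the very start and end of the string,
// so the string counts as one line and "2" on its own is not a line.
console.log(/^2$/.test(lines));  // false

// With /m, ^ and $ also match right after and right before each "\n".
console.log(/^2$/m.test(lines)); // true

// Collecting the first character of every "line".
console.log(JSON.stringify(lines.match(/^\d/g)));  // ["1"]
console.log(JSON.stringify(lines.match(/^\d/gm))); // ["1","2","3"]
```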

In other words, single-line mode and multi-line mode affect different metacharacters. Newcomers to regular expressions are often confused by these two seemingly paired concepts, "single-line mode" and "multi-line mode", but they are in fact unrelated features with confusingly similar names.

Later, Ruby's author may have felt that "single-line mode" was a poorly chosen name, so he called the mode in which "." matches newlines "multiline mode", meaning that patterns such as .* can match across multiple lines, which on its own makes perfect sense. Its modifier is also /m (Ruby always enables what Perl calls multi-line mode, so /m was not taken). This really adds insult to injury and makes things even more confusing.

Later, Python's author may also have felt that the term "single-line mode" was best avoided, so he gave it a new name, "dotall", meaning that the dot matches all characters. It is a good name, and Java later adopted it as well.

That covers the history: where single-line mode comes from, and why its name was poorly chosen. V8 has recently implemented a stage 3 ES proposal, https://github.com/mathiasbynens/es-regexp-dotall-flag, which adds the /s modifier and the dotAll property to JavaScript regular expressions. The dotAll property name is borrowed from Python and Java, while the /s modifier is inherited from Perl; there is no need to invent a new modifier such as /d here, which would only make things more complicated. Concretely, /s lets "." match the four line terminators it previously could not: \n (line feed), \r (carriage return), \u2028 (line separator) and \u2029 (paragraph separator):

/foo/s.dotAll // true
/^.{4}$/s.test("\n\r\u2028\u2029") // true
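A few more checks that round out the example above. They use only standard JavaScript (the RegExp constructor and the dotAll and flags properties) and show that, without /s, "." excludes just those four line terminators and still matches other whitespace:

```javascript
const terminators = "\n\r\u2028\u2029";

// Without /s, "." matches none of the four line terminators.
console.log(/^.{4}$/.test(terminators)); // false
console.log([...terminators].some((ch) => /./.test(ch))); // false

// Other whitespace such as \t, \v and \f is still matched.
console.log(/^.{3}$/.test("\t\v\f")); // true

// The flag can also be passed to the constructor.
const re = new RegExp("a.b", "s");
console.log(re.dotAll);       // → true
console.log(re.flags);        // → "s"
console.log(re.test("a\nb")); // → true
```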

This is actually very simple, but developers who have never used regular expressions outside JavaScript may be confused when they first meet this new mode, so let me restate it: multi-line mode controls the behavior of ^ and $, single-line mode controls the behavior of ".", and the two have no direct relationship.
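To make the independence of the two modes concrete, here is a sketch that tries every combination of the m and s flags on one two-line string. The probe patterns ^b and a.b are illustrative choices, one per mode:

```javascript
const sample = "a\nb";
const results = {};

// /^b/  only matches if ^ may match after "\n"  (multi-line mode, m)
// /a.b/ only matches if "." may match "\n"      (single-line mode, s)
for (const flags of ["", "m", "s", "ms"]) {
  const caret = new RegExp("^b", flags).test(sample);
  const dot = new RegExp("a.b", flags).test(sample);
  results[flags] = { caret, dot };
  console.log(`flags "${flags}": ^b → ${caret}, a.b → ${dot}`);
}
// flags "": ^b → false, a.b → false
// flags "m": ^b → true, a.b → false
// flags "s": ^b → false, a.b → true
// flags "ms": ^b → true, a.b → true
```

Each flag changes only the pattern that depends on its own metacharacters.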

Ironically, Perl, the language that introduced the confusing concepts of single-line and multi-line mode in the first place, dropped both modes entirely in Perl 6: "." matches newlines by default, and \N matches any character except a newline; ^ and $ always match the start and end of the string, and two new metacharacters, ^^ and $$, match the start and end of a line.

The substitutes for single-line mode that we used in the past, [^] and [\s\S], are not entirely obsolete. For example, editors that use JavaScript regular expressions (VS Code, Atom) may well not offer any way to enable single-line mode. Speaking of editors, the regex features of editors built on JavaScript are still rather weak; for instance, some modes cannot be switched on from within the pattern itself. In Sublime Text, by contrast, you can write (?s) inside the pattern to enable dotall mode; for example, (?s)/\*.+?\*/ matches all multi-line comments.
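The three JavaScript ways of letting a lazy wildcard cross newlines can be compared side by side. This sketch uses a made-up source snippet and assumes an engine that supports /s; note that [^] is a JavaScript-specific idiom, whereas [\s\S] also works in most other regex flavors:

```javascript
const source = "x = 1; /* a\n   multi-line\n   comment */ y = 2; /* short */";

const patterns = [
  /\/\*[^]*?\*\//g,    // [^]    : empty negated class, "any character" in JavaScript
  /\/\*[\s\S]*?\*\//g, // [\s\S] : any whitespace or non-whitespace character
  /\/\*.*?\*\//gs,     // "."    : in single-line (dotAll) mode
];

// All three find both comments, including the multi-line one.
const counts = patterns.map((re) => source.match(re).length);
console.log(counts); // each entry is 2

// Without /s, "." stops at the first "\n", so only the one-line comment matches.
const plainCount = source.match(/\/\*.*?\*\//g).length;
console.log(plainCount); // 1
```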

source:php.cn