node.js - Problem with newlines in Node.js regular expressions

Question

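This is my regex. {code...} str is the string I want to search. If I strip the newlines out of the string, the regex matches something, but without that code the regex matches nothing. {code...} Can anyone explain this? How do I solve it? ----------Addendum-----...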

PHP中文网 · Answer

As I understand it, you want to capture everything inside the body tag.

Your regular expression is:

/\<body\>([\s\S].*?)\<\/body\>/

It fails to match simply because it is written incorrectly.

Let's break down the key part of this expression:

([\s\S].*?)

[\s\S] matches one whitespace or non-whitespace character. In other words, it matches any character at all, including newlines, spaces and tabs, but only a single one.

What does .*? mean?

. matches any single character except a newline.

.* matches 0 or more arbitrary characters (excluding newlines) and is greedy: it takes as many characters as possible.

The ? here modifies *. Together, *? means lazy matching: match as few characters as possible. It starts from 0 characters and takes one more only when the rest of the pattern cannot match otherwise. Because . still cannot match a newline, .*? can never extend past the end of the current line.
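The difference between . and [\s\S], and between greedy and lazy quantifiers, can be checked directly in Node.js. This is a minimal sketch; the sample string and variable names are made up for illustration. Note that a lazy quantifier still grows when the rest of the pattern requires it, it just never crosses a newline when it is applied to the dot.

```javascript
// Hypothetical sample input containing one newline.
const sample = "ab\ncd</p>ef</p>";

// `.` stops at the newline; [\s\S] does not.
const dotRun = sample.match(/.*/)[0];            // "ab"
const anyRun = sample.match(/[\s\S]*/)[0];       // the whole string

// Greedy runs to the last "</p>", lazy stops at the first one.
const greedy = sample.match(/[\s\S]*<\/p>/)[0];  // "ab\ncd</p>ef</p>"
const lazy = sample.match(/[\s\S]*?<\/p>/)[0];   // "ab\ncd</p>"

// Lazy `.*?` expands as far as needed, but cannot cross the newline,
// so the match can only start on the second line.
const lazyDot = sample.match(/.*?<\/p>/)[0];     // "cd</p>"

console.log({ dotRun, anyRun, greedy, lazy, lazyDot });
```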

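The entire expression

<body>([\s\S].*?)<\/body>  // note: < and > do not need to be escaped

therefore matches only when, after the first character following <body>, there is no newline before </body>. The single [\s\S] can absorb one newline, but .*? cannot absorb any. That is why the match succeeds once you strip the newlines from the string and fails when you leave them in.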
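To see how the pattern from the question behaves, here is a small sketch that runs it against three inputs. The HTML strings are invented for illustration; the question's actual str was not shown.

```javascript
// Pattern from the question, without the unnecessary escapes on < and >.
const original = /<body>([\s\S].*?)<\/body>/;

// Hypothetical inputs: no newlines, newlines throughout, one leading newline.
const oneLine = "<html><body><p>hi</p></body></html>";
const multiLine = "<html><body>\n<p>hi</p>\n</body></html>";
const leadingNewlineOnly = "<body>\n<p>hi</p></body>";

// Matches: [\s\S] takes "<", then .*? grows up to "</body>".
const m1 = oneLine.match(original);

// No match: .*? cannot cross the newline before "</body>".
const m2 = multiLine.match(original);

// Matches: the single [\s\S] absorbs the one newline.
const m3 = leadingNewlineOnly.match(original);

console.log(m1 && m1[1], m2, m3 && m3[1]);
```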

Why does it work once you remove the .? Because then the lazy quantifier *? applies to [\s\S] instead, so the group means 0 or more characters of any kind, whitespace or not, newlines included.

I suspect you took [\s\S] to match only newlines and added . to cover everything else. Following that understanding, the expression should have been written like this:

<body>([\s\S.]*?)<\/body>

This also matches, but the . is redundant, because [\s\S] already matches every character, including whatever . would match. (Inside a character class, . is just a literal dot anyway.)
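A quick way to confirm that the extra . changes nothing is to compare both variants on the same input. This is only a sketch, and the page string is a made-up example.

```javascript
// The two variants: with and without the dot inside the character class.
const withDot = /<body>([\s\S.]*?)<\/body>/;
const withoutDot = /<body>([\s\S]*?)<\/body>/;

// Hypothetical multi-line input.
const page = "<body>\nline 1\nline 2\n</body>";

const capturedWithDot = page.match(withDot)[1];
const capturedWithoutDot = page.match(withoutDot)[1];

// Inside [...] the dot is a literal "." and not a wildcard.
const dotMatchesLetter = /[.]/.test("a");
const dotMatchesDot = /[.]/.test(".");

console.log(capturedWithDot === capturedWithoutDot, dotMatchesLetter, dotMatchesDot);
```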

So the final answer is

<body>([\s\S]*?)<\/body>

which matches 0 or more characters of any kind between <body> and </body>, so the content is captured correctly.
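Here is a minimal sketch of the corrected pattern applied to a multi-line string in Node.js. The html string is an invented example. The second pattern is not part of the answer above: it assumes a runtime that supports the ES2018 s (dotAll) flag, which makes . match newlines as well.

```javascript
// Hypothetical multi-line document.
const html = "<html>\n<body>\n  <p>hi</p>\n</body>\n</html>";

// Corrected pattern: lazy [\s\S]*? may cross newlines.
const fixed = /<body>([\s\S]*?)<\/body>/;
const inner = html.match(fixed)[1];

// Alternative (assumption: ES2018 dotAll support): with the `s` flag,
// `.` also matches newlines, so .*? behaves like [\s\S]*? here.
const dotAll = /<body>(.*?)<\/body>/s;
const innerDotAll = html.match(dotAll)[1];

console.log(JSON.stringify(inner), inner === innerDotAll);
```

Both patterns assume a bare <body> tag with no attributes; a tag such as <body class="x"> would need a looser opening part.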

That’s it.

PS: The layout is a bit messy, because escape characters are difficult to use in the SegmentFault editor