node.js - Regex fails across newlines in Node.js
伊谢尔伦 2017-04-17 15:34:52

This is my regex:

\<body\>([\s\S].*?)\<\/body\>

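str is the string I am searching. If I strip the newlines from the string, the regex matches; without the following line of code, the regex matches nothing.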

str = str.replace(/\n/g, "");

Can anyone explain this? How do I fix it?

----------Update-----------

I later changed it to

\<body\>([\s\S]*?)\<\/body\>

and that works.
What is the difference between .*? and *? here?


All replies (1)
洪涛

As I understand it, you want to capture everything inside the body tag

with the regular expression below:

/\<body\>([\s\S].*?)\<\/body\>/

It fails to match because the pattern itself is written incorrectly.

Let's break down the key part of this expression:

([\s\S].*?)

[\s\S] matches one whitespace or non-whitespace character. In other words, it matches any character at all, including newlines, spaces and tabs, but only a single one.

What does .*? mean?

. matches any character except a newline (more precisely, any line terminator).

.* matches 0 or more such characters (never a newline), taking as many as it can.
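
To make the difference between . and [\s\S] concrete, here is a minimal sketch. The sample string is made up for illustration, and the s (dotAll) flag in the last line assumes a Node.js version with ES2018 regex support.

```javascript
// Made-up sample: two letters separated by a newline.
const text = "a\nb";

// `.` refuses to match the newline, so the whole pattern fails.
const dotMatch = text.match(/a.b/);

// [\s\S] matches any character, newline included.
const classMatch = text.match(/a[\s\S]b/);

// With the s (dotAll) flag, `.` matches newlines as well.
const dotAllMatch = text.match(/a.b/s);

console.log(dotMatch === null);          // true
console.log(classMatch[0] === "a\nb");   // true
console.log(dotAllMatch[0] === "a\nb");  // true
```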

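The ? here modifies *. Together, *? means lazy matching: take as few characters as possible. So .*? starts out matching 0 characters and then grows one character at a time, only as far as needed for the rest of the pattern to match. It still can never cross a newline.

The entire expression

<body>([\s\S].*?)<\/body>  // note: < and > do not need to be escaped

therefore matches only when the content between <body> and </body> is one arbitrary character (which may be a newline) followed by characters that contain no newline. A real page body has newlines after its first character, so .*? runs into a newline before it can reach </body> and the whole match fails. Once the newlines are stripped from str, nothing stops .*? from growing up to </body>, which is why the replace call makes the regex work.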
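
The behaviour described in the question can be reproduced with a short sketch. The value of str below is a hypothetical stand-in, since the original string was never posted.

```javascript
// Hypothetical input: the poster's real `str` is not shown in the question.
const str = "<body>\n<p>hello</p>\n</body>";

// The regex from the question (without the unnecessary escapes on < and >).
const original = /<body>([\s\S].*?)<\/body>/;

// [\s\S] consumes the first "\n", then `.*?` gets stuck at the second "\n".
const withNewlines = str.match(original);

// After stripping newlines, `.*?` can grow all the way to </body>.
const stripped = str.replace(/\n/g, "");
const withoutNewlines = stripped.match(original);

console.log(withNewlines);        // null
console.log(withoutNewlines[1]);  // <p>hello</p>
```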

Why does simply removing the . fix it? Because once the . is gone, the lazy quantifier *? applies to

[\s\S]

instead, meaning 0 or more characters of any kind, newlines included.
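
As a side note on what "lazy" buys you, the sketch below compares [\s\S]*? with the greedy [\s\S]*. The input is invented, with a second </body> inside a comment just to make the difference visible.

```javascript
// Invented input containing two closing tags.
const sample = "<body>a</body><!-- </body> -->";

// Lazy: stops at the first </body> it can reach.
const lazy = sample.match(/<body>([\s\S]*?)<\/body>/);

// Greedy: grabs everything, then backs off to the last </body>.
const greedy = sample.match(/<body>([\s\S]*)<\/body>/);

console.log(JSON.stringify(lazy[1]));    // "a"
console.log(JSON.stringify(greedy[1]));  // "a</body><!-- "
```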

I think you understood

[\s\S]

as matching only newlines, and added . to match everything else. Following that line of thinking, it should have been written like this:

<body>([\s\S.]*?)<\/body>

This also matches, but the . is redundant (inside a character class it is only a literal dot), because

[\s\S]

already matches any character, including everything that . matches.

So the final answer is

<body>([\s\S]*?)<\/body>

which matches 0 or more characters of any kind between <body> and </body>, so the content is matched correctly.
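
Here is a minimal sketch of the corrected pattern in use. The html string is a made-up example, not the poster's data, and a regex like this is only meant for simple, well-formed input.

```javascript
// Made-up page source with a multi-line body.
const html = "<html>\n<body>\n  <h1>Title</h1>\n  <p>text</p>\n</body>\n</html>";

const fixed = /<body>([\s\S]*?)<\/body>/;

// Capture group 1 holds everything between the tags, newlines included.
const match = html.match(fixed);
const bodyContent = match ? match[1] : null;

// "0 or more" also covers an empty body.
const emptyMatch = "<body></body>".match(fixed);

console.log(JSON.stringify(bodyContent));  // "\n  <h1>Title</h1>\n  <p>text</p>\n"
console.log(emptyMatch[1] === "");         // true
```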

That’s it.

PS: The layout is a bit messy, because escape characters are difficult to use in the SegmentFault editor
