
Java regular expression learning

高洛峰
Release: 2016-11-16 11:16:33

Matching mode

The JDK provides three matching modes: greedy, reluctant and possessive, which correspond to three kinds of quantifiers. Greedy mode is the default. Reluctant mode is indicated by appending a ? to the quantifier, and possessive mode by appending a + to the quantifier.

What are the meanings of the three modes?

Greedy mode: match as much as possible while still allowing the overall expression to match.

Reluctant mode: match as little as possible while still allowing the overall expression to match.

Possessive mode: match as much as possible and never backtrack. If the rest of the expression cannot match because too much was consumed, the match simply fails.

For example, there is a string as follows:

/m/t/wd/nl/n/p/m/wd/nl/n/p/m/wd/nl/n/p/m/v/n

Expression matching in greedy mode:

/m/t.*/nl/n/p/m

The match result is then /m/t/wd/nl/n/p/m/wd/nl/n/p/m/wd/nl/n/p/m

Expression matching in reluctant mode:

/m/t/.*?/nl/n/p/m

The match result is then /m/t/wd/nl/n/p/m

/m/t/wdx+?/nl/n/p/m

This expression, by contrast, does not match. The + quantifier requires at least one occurrence, and that still holds in reluctant mode, so x+? needs at least one x after wd. The string contains none, so the match fails.

Expression matching in possessive mode:

/m/t.*+/nl/n/p/m

This cannot match, because .*+ consumes every remaining character and never gives any back, so nothing is left for /nl/n/p/m to match.
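The four expressions above can be checked with a few lines of Java. This is only a minimal sketch: the class name QuantifierModes and the helper firstMatch are made up for illustration, and it uses Matcher.find() to look for the first match in the sample string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierModes {
    // The sample string used in the text above.
    static final String INPUT = "/m/t/wd/nl/n/p/m/wd/nl/n/p/m/wd/nl/n/p/m/v/n";

    // Hypothetical helper: returns the first match of regex in INPUT, or null if there is none.
    static String firstMatch(String regex) {
        Matcher m = Pattern.compile(regex).matcher(INPUT);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy: .* runs to the last /nl/n/p/m.
        System.out.println("greedy:      " + firstMatch("/m/t.*/nl/n/p/m"));
        // Reluctant: .*? stops at the first /nl/n/p/m.
        System.out.println("reluctant:   " + firstMatch("/m/t/.*?/nl/n/p/m"));
        // Reluctant x+? still needs at least one x, and the string has none.
        System.out.println("reluctant +: " + firstMatch("/m/t/wdx+?/nl/n/p/m"));
        // Possessive: .*+ swallows the rest of the string and does not backtrack.
        System.out.println("possessive:  " + firstMatch("/m/t.*+/nl/n/p/m"));
    }
}
```

The last two calls return null, which is how this sketch reports "no match".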

Note: the reluctant and possessive modifiers only make a difference for quantifiers that can match a variable number of times. For example, X?? matches the character X as few times as possible, while X? is the default greedy form and matches X whenever it can. By contrast, X{n} must match exactly n times, so there is no choice about how much to match.
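A small sketch of this note, assuming the input "XXX" (chosen here only for illustration; the class and helper names are also made up). It prints the first match in brackets so that an empty match is visible.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalQuantifier {
    // Hypothetical helper: returns the text of the first match of regex in input, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy X? takes the X when one is available.
        System.out.println("[" + firstMatch("X?", "XXX") + "]");    // [X]
        // Reluctant X?? prefers zero occurrences, so the first match is empty.
        System.out.println("[" + firstMatch("X??", "XXX") + "]");   // []
        // X{2} has no freedom: with or without ?, it matches exactly two X.
        System.out.println("[" + firstMatch("X{2}", "XXX") + "]");  // [XX]
        System.out.println("[" + firstMatch("X{2}?", "XXX") + "]"); // [XX]
    }
}
```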

Lookaround is suitable for scenarios like this: during a regular expression match, you need to know whether a specific expression appears before or after the matched part, without capturing (consuming) that expression.

If you do not use lookaround and simply put the expression into the pattern, the text it matches is inevitably consumed.

For example, suppose I want to split the string ILoveYou into words. The rule is that every capital letter starts a new word.

If you use this matching rule:

\p{Upper}\p{Lower}*[\p{Upper}]?

then the uppercase letter that begins the next word is consumed by the current match. The matching result would be:

IL

You

This does not meet the requirements.

The solution is to use lookaround. The regular expression is:

\p{Upper}?\p{Lower}*(?=[\p{Upper}]?)

The output result is:

I

Love

You
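Both segmentation patterns can be compared with a short Java sketch. The class name CamelSplit and the helper matches are assumptions for illustration. One detail the sketch has to handle: because every part of the lookahead pattern is optional, find() also reports an empty match at the end of the input, so empty matches are skipped here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CamelSplit {
    // Hypothetical helper: collects every non-empty match of regex in input.
    static List<String> matches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // Skip the empty match that the lookahead pattern produces at the end of the input.
            if (!m.group().isEmpty()) {
                out.add(m.group());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Consuming version: the optional trailing uppercase letter is eaten by the match.
        System.out.println(matches("\\p{Upper}\\p{Lower}*[\\p{Upper}]?", "ILoveYou"));      // [IL, You]
        // Lookahead version: the following uppercase letter is inspected but not consumed.
        System.out.println(matches("\\p{Upper}?\\p{Lower}*(?=[\\p{Upper}]?)", "ILoveYou")); // [I, Love, You]
    }
}
```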

There are four types of lookaround:

(?=X) means that what follows must match the regular expression X. When the preceding part is matched, the X part is neither consumed nor captured. This is a zero-width positive lookahead.

(?<=X) means that what precedes must match the regular expression X. When the following part is matched, the X part is neither consumed nor captured. This is a zero-width positive lookbehind.

(?!X) means that what follows must not match the regular expression X. When the preceding part is matched, the X part is neither consumed nor captured. This is a zero-width negative lookahead.

(?<!X) means that what precedes must not match the regular expression X. When the following part is matched, the X part is neither consumed nor captured. This is a zero-width negative lookbehind.
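As a rough illustration of the four kinds, the sketch below picks numbers out of two sample strings. The inputs, the class name LookaroundKinds and the helper matches are all assumptions made for this example. The \b word boundaries in the negative cases keep the engine from matching only part of a number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundKinds {
    // Hypothetical helper: collects every match of regex in input.
    static List<String> matches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String percents = "10% 20 30%";
        String prices = "$100 200 $30 40";

        // Positive lookahead: numbers followed by %, without consuming the %.
        System.out.println(matches("\\d+(?=%)", percents));        // [10, 30]
        // Negative lookahead: whole numbers not followed by %.
        System.out.println(matches("\\b\\d+\\b(?!%)", percents));  // [20]
        // Positive lookbehind: numbers preceded by $, without consuming the $.
        System.out.println(matches("(?<=\\$)\\d+", prices));       // [100, 30]
        // Negative lookbehind: whole numbers not preceded by $.
        System.out.println(matches("(?<!\\$)\\b\\d+", prices));    // [200, 40]
    }
}
```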

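Independent non-capturing group

(?>X) The java.util.regex.Pattern documentation describes this construct as "X, as an independent, non-capturing group". I have not studied it in detail yet.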

source:php.cn