This article in the regular expression tutorial series covers position matching. The details are as follows:
Note: In all examples, regular expression matches are enclosed in [ and ] within the source text. Some examples are implemented in Java; where Java's own regular expression usage needs explanation, it is covered in the corresponding place. All Java examples were tested under JDK 1.6.0_13.
1. Introduction to the problem
Suppose we want to match a particular word in a piece of text (ignoring multi-line mode for now, which is introduced later). A first attempt might look like this:
Text: Yesterday is history, tomorrow is a mystery, but today is a gift.
Regular expression: is
Result: Yesterday [is] h[is]tory, tomorrow [is] a mystery, but today [is] a gift.
Analysis: The intent was to match only the word is, but the pattern also matched the is inside another word (history). To solve this problem, use boundary delimiters: metacharacters that specify the position (or boundary) at which a match must occur.
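The over-matching described above can be reproduced with a minimal Java sketch. The class name PlainMatchDemo and the helper markMatches are hypothetical names chosen for illustration; the helper simply wraps each match in [ and ] to mirror the notation used in this article.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlainMatchDemo {
    // Hypothetical helper: wraps every match of regex in text with [ and ].
    static String markMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps any $ or \ in the matched text literal.
            m.appendReplacement(sb, Matcher.quoteReplacement("[" + m.group() + "]"));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Yesterday is history, tomorrow is a mystery, but today is a gift.";
        // Prints: Yesterday [is] h[is]tory, tomorrow [is] a mystery, but today [is] a gift.
        System.out.println(markMatches("is", text));
    }
}
```

The unanchored pattern finds four matches, one of them inside history.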
2. Word boundaries
A commonly used boundary is the word boundary, specified by the metacharacter \b, which matches the beginning and end of a word. More precisely, it matches the position between a character that can form part of a word (a letter, digit, or underscore, that is, a character matched by \w) and a character that cannot (a character matched by \W). \b matches a position only; it does not consume any characters. Let’s look at the previous example again:
Text: Yesterday is history, tomorrow is a mystery, but today is a gift.
Regular expression: \bis\b
Result: Yesterday [is] history, tomorrow [is] a mystery, but today [is] a gift.
Analysis: In the original text, the word is has a space before and after it, so it matches the pattern \bis\b (a space is one of the characters that separate words). The word history also contains is, but there it is preceded by h and followed by t. Both are word characters, so there is no word boundary on either side and \b cannot match.
To match a position that is not a word boundary, use \B. For example:
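As a minimal Java sketch of the same match (the class name WordBoundaryDemo and the helper mark are hypothetical), note that the backslash must be doubled inside a Java string literal, so \bis\b is written as "\\bis\\b". The sketch uses String.replaceAll, where $0 in the replacement stands for the whole match.

```java
class WordBoundaryDemo {
    // Hypothetical helper: wraps every match of regex in [ and ].
    static String mark(String regex, String text) {
        return text.replaceAll(regex, "[$0]");
    }

    public static void main(String[] args) {
        String text = "Yesterday is history, tomorrow is a mystery, but today is a gift.";
        // Prints: Yesterday [is] history, tomorrow [is] a mystery, but today [is] a gift.
        System.out.println(mark("\\bis\\b", text));
    }
}
```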
Text: Please enter the nine-digit id as it appears on your color - coded pass-key.
Regular expression: \B-\B
Result: Please enter the nine-digit id as it appears on your color [-] coded pass-key.
Analysis: \B-\B matches a hyphen that has no word boundary on either side. The hyphens in nine-digit and pass-key are directly adjacent to word characters, so there is a word boundary on each side of them and they do not match. The hyphen in color - coded has a space before and after it; a space and a hyphen are both non-word characters, so there is no word boundary on either side and the hyphen matches.
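Which hyphens \B-\B actually matches can be checked with a short Java sketch (the class name NonWordBoundaryDemo and the helper mark are hypothetical). A hyphen is a non-word character, so a position next to it is a non-boundary only when the neighbor on the other side is also a non-word character, such as a space.

```java
class NonWordBoundaryDemo {
    // Hypothetical helper: wraps every match of regex in [ and ].
    static String mark(String regex, String text) {
        return text.replaceAll(regex, "[$0]");
    }

    public static void main(String[] args) {
        String text = "Please enter the nine-digit id as it appears on your color - coded pass-key.";
        // Prints: Please enter the nine-digit id as it appears on your color [-] coded pass-key.
        System.out.println(mark("\\B-\\B", text));
    }
}
```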
3. String boundaries
Word boundaries can be used to match positions related to words (beginning of word, end of word, entire word, etc.). String boundaries have a similar purpose, but are used to match positions related to strings (beginning of string, end of string, entire string, etc.). There are two metacharacters used to define string boundaries: one is ^ used to define the beginning of the string, and the other is $ used to define the end of the string.
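A minimal Java sketch of the two string anchors in the default (non-multi-line) mode follows. The class name StringBoundaryDemo, the helper found, and the sample text are assumptions made for illustration only.

```java
import java.util.regex.Pattern;

class StringBoundaryDemo {
    // Hypothetical helper: true if regex matches somewhere in text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "today is a gift";
        System.out.println(found("^today", text)); // true: today is at the start
        System.out.println(found("^gift", text));  // false: gift is not at the start
        System.out.println(found("gift$", text));  // true: gift is at the end
        System.out.println(found("today$", text)); // false: today is not at the end
    }
}
```

Using find() rather than matches() matters here: matches() requires the whole string to match the pattern, which would hide the effect of the anchors.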
For example, to check the validity of an XML document, you might verify that it starts with the <?xml ... ?> declaration:
Text:
<?xml version="1.0" encoding="UTF-8"?> <project basedir="." default="ear"> </project>
Regular expression: ^\s*<\?xml.*?\?>
Result:
[<?xml version="1.0" encoding="UTF-8"?>] <project basedir="." default="ear"> </project>
Analysis: ^ matches the beginning of a string, so ^\s* matches the beginning of the string followed by zero or more whitespace characters. This lets the pattern tolerate spaces, tabs, newlines, and other whitespace before the tag.
The $ metacharacter is used in exactly the same way as ^, except that it matches at the end of the string rather than at the beginning. For example, to check whether an HTML page ends with