This article explains position matching in regular expressions and the points to watch out for when using it, working through practical examples:
Note: In all examples, the text matched by the regular expression is enclosed in 【 and 】 in the source text. Some examples are implemented in Java. Where a usage is specific to Java's own regular expressions, this is explained in the corresponding place. All Java examples were tested under JDK1.6.0_13.
1. Introduction to the problem
If we want to match a certain word in a text (ignoring multi-line mode for now, which will be introduced later), we might write something like this:
Text:Yesterday is history, tomorrow is a mystery, but today is a gift.
Regular expression:is
Result:Yesterday 【is】 h【is】tory, tomorrow 【is】 a mystery, but today 【is】 a gift.
Analysis: The intent was to match only the word is, but the pattern also matched the is contained in another word (history). To solve this problem, use boundary delimiters, that is, metacharacters in the regular expression that indicate the position (or boundary) where the match should occur.
2. Word Boundary
A commonly used boundary is the word boundary, specified by the metacharacter \b, which matches the beginning or end of a word. More precisely, it matches a position (not a character) between a character that can be used to form a word (a letter, digit, or underscore, i.e. a character matched by \w) and a character that cannot (a character matched by \W). Let's look at the previous example again:
Text:Yesterday is history, tomorrow is a mystery, but today is a gift.
Regular expression:\bis\b
Result:Yesterday 【is】 history, tomorrow 【is】 a mystery, but today 【is】 a gift.
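The difference between the two patterns can be checked with a short Java sketch. The class name WordBoundaryDemo and the helper methods mark and count are made up for this illustration; only the standard java.util.regex classes are used, and the 【 】 markers are written as Unicode escapes so the source stays ASCII-only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original article.
public class WordBoundaryDemo {
    // Wraps every match of regex in the article's result markers
    // (\u3010 and \u3011 are the opening and closing lenticular brackets).
    public static String mark(String regex, String text) {
        return Pattern.compile(regex).matcher(text).replaceAll("\u3010$0\u3011");
    }

    // Counts the non-overlapping matches of regex in text.
    public static int count(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "Yesterday is history, tomorrow is a mystery, but today is a gift.";
        System.out.println(mark("is", text));       // also marks the "is" inside "history"
        System.out.println(mark("\\bis\\b", text)); // marks only the standalone word "is"
        System.out.println(count("is", text) + " vs " + count("\\bis\\b", text)); // prints "4 vs 3"
    }
}
```

Note that in a Java string literal the backslash has to be doubled, so the pattern \bis\b is written as "\\bis\\b".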
Analysis: In the original text, the word is has a space before and after it, so it matches the pattern \bis\b (a space is one of the characters that separate words). The word history also contains is, but there it is preceded by h and followed by t. Both are word characters, so there is no word boundary on either side and \b cannot match. To match a position that is not a word boundary, use \B. For example:
Text:Please enter the nine-digit id as it appears on your color - coded pass-key.
Regular expression:\B-\B
Result:Please enter the nine-digit id as it appears on your color 【-】 coded pass-key.
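To see which hyphens each kind of boundary selects, here is a minimal Java sketch. The class NonWordBoundaryDemo and the method matchStarts are hypothetical names chosen for this illustration. In it, \B-\B finds only the hyphen that has a space on each side, while \b-\b finds the hyphens inside nine-digit and pass-key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original article.
public class NonWordBoundaryDemo {
    // Returns the start offset of every match of regex in text.
    public static List<Integer> matchStarts(String regex, String text) {
        List<Integer> starts = new ArrayList<Integer>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String text = "Please enter the nine-digit id as it appears on your color - coded pass-key.";
        // \B-\B: a hyphen with no word boundary immediately before or after it.
        System.out.println(matchStarts("\\B-\\B", text));
        // \b-\b: a hyphen with a word boundary on both sides.
        System.out.println(matchStarts("\\b-\\b", text));
    }
}
```

The first list contains a single offset (the hyphen in color - coded) and the second contains two (the hyphens in nine-digit and pass-key).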
Analysis: \B-\B matches a hyphen that has no word boundary immediately before or after it. In color - coded the hyphen has a space on each side. A space and a hyphen are both \W characters, so there is no word boundary next to the hyphen and it matches. In nine-digit and pass-key the hyphen sits directly between letters, so there is a word boundary on each side of it and \B cannot match.
3. String Boundary
Word boundaries are used to match positions related to words (beginning of a word, end of a word, an entire word, and so on). String boundaries have a similar purpose, but match positions related to strings (beginning of the string, end of the string, the entire string, and so on). Two metacharacters define string boundaries: ^ marks the beginning of the string and $ marks the end of the string. For example, suppose you want to check the legality of an XML document; legal XML documents all start with an <?xml ?> tag:
Text:<?xml version="1.0" encoding="UTF-8"?> <project basedir="." default="ear"> </project>
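Regular expression:^\s*<\?xml.*?\?>
Result:【<?xml version="1.0" encoding="UTF-8"?>】 <project basedir="." default="ear"> </project>
Analysis: ^ matches the position at the beginning of a string, so ^\s* matches the beginning of the string together with the zero or more whitespace characters that follow it, because whitespace such as spaces, tabs, and newlines is allowed before the <?xml ?> tag.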
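The same check can be sketched in Java as follows. The class XmlPrologCheck and the method startsWithXmlDeclaration are hypothetical names for this illustration, and the check only tests the prefix described above; it does not validate the rest of the document.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original article.
public class XmlPrologCheck {
    // Multi-line mode is not enabled, so ^ matches only at the very start of the input.
    private static final Pattern XML_DECL = Pattern.compile("^\\s*<\\?xml.*?\\?>");

    // True if text begins, after optional whitespace, with an <?xml ... ?> tag.
    public static boolean startsWithXmlDeclaration(String text) {
        return XML_DECL.matcher(text).find();
    }

    public static void main(String[] args) {
        String ok = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<project basedir=\".\" default=\"ear\">\n"
                + "</project>";
        String bad = "Some stray text first\n" + ok;
        System.out.println(startsWithXmlDeclaration(ok));          // true
        System.out.println(startsWithXmlDeclaration("  \n" + ok)); // true: leading whitespace is allowed
        System.out.println(startsWithXmlDeclaration(bad));         // false: the tag is not at the start
    }
}
```

Without the leading ^, find() would also accept the third input, because the pattern could then match the tag on the second line.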
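Apart from the difference in position, the $ metacharacter is used in exactly the same way as ^. For example, to check whether an html page ends with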