This article explains repeated matching in regular expressions and the points to watch out for when using it, working through practical examples. It is shared here for your reference:
Note: In all examples, the parts of the source text matched by the regular expression are enclosed in 【 and 】 in the results. Some examples are implemented in Java; where a usage is specific to Java's own regular expression support, this is pointed out at the relevant place. All Java examples were tested under JDK 1.6.0_13.
1. How many times to match?
The previous articles covered matching a single character, but what if a character or a set of characters needs to be matched several times? Take matching an email address as an example. Using only the methods covered so far, someone might write a regular expression like \w@\w\.\w, but this can only match addresses like a@b.c, which is obviously not enough. To match email addresses properly, first consider what an address is made of: a group of characters consisting of letters, digits or underscores, followed by the @ symbol, and then the domain name, that is, username@domain. The details depend on the email service provider; some also allow the . character in user names.
1. Match one or more characters
If you want to match one or more repetitions of the same character (or character set), simply add a + character as a suffix to that character (or character set). + matches one or more occurrences (at least one). For example, a matches a itself, while a+ matches one or more consecutive a's; [0-9]+ matches one or more consecutive digits. Note: when adding a + suffix to a character set, the + must be placed outside the set, otherwise it is not a repetition. For example, [0-9+] means a digit or a + sign; it is syntactically valid, but not what we want.
Text: Hello, mhmyqn@qq.com or mhmyqn@126.com is my email.
Regular expression: \w+@(\w+\.)+\w+
Result: Hello, 【mhmyqn@qq.com】 or 【mhmyqn@126.com】 is my email.
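The match above can be reproduced with a minimal Java sketch. The class name EmailMatchDemo and the helper findAll are hypothetical names chosen for illustration; only the pattern and the sample text come from the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is an assumption, not part of the article.
class EmailMatchDemo {
    // \w+ = one or more word characters; (\w+\.)+ = one or more "label." groups.
    // Backslashes are doubled because this is a Java string literal.
    static final Pattern EMAIL = Pattern.compile("\\w+@(\\w+\\.)+\\w+");

    // Collects every substring of text that the pattern matches, left to right.
    static List<String> findAll(String text) {
        List<String> matches = new ArrayList<String>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Hello, mhmyqn@qq.com or mhmyqn@126.com is my email.";
        System.out.println(findAll(text));
    }
}
```

This pattern only illustrates repetition; it is not a full validator for real-world email addresses.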
Analysis: \w+ matches one or more word characters, and the subexpression (\w+\.)+ matches strings such as xxxx.edu. (one or more labels, each followed by a dot). An address does not end with a . character, so a final \w+ follows. Email addresses like mhmyqn@xxxx.edu.cn are therefore matched as well.
2. Match zero or more characters
Use the metacharacter * to match zero or more occurrences. It is used exactly like +: place it after a character or character set to match zero or more consecutive occurrences of that character (or character set). For example, the regular expression ab*c matches ac, abc, abbbbc, and so on.
3. Match zero or one character
Use the metacharacter ? to match zero or one occurrence. As mentioned in the previous article, the regular expression \r\n\r\n matches a blank line on Windows, but on Unix and Linux line endings have no \r. With the metacharacter ?, the pattern \r?\n\r?\n matches blank lines on Windows as well as on Unix and Linux. Now let's look at an example that matches URLs using the http or https protocol:
Text: The URL is http://www.mikan.com, to connect securely use https://www.mikan.com instead.
Regular expression: https?://(\w+\.)+\w+
Result: The URL is 【http://www.mikan.com】, to connect securely use 【https://www.mikan.com】 instead.
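As a rough illustration, the same search can be run in Java. The class name UrlMatchDemo and the helper findAll are hypothetical; the pattern and text are the ones from the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is an assumption, not part of the article.
class UrlMatchDemo {
    // The ? makes the preceding s optional, so both http and https are accepted.
    static final Pattern URL = Pattern.compile("https?://(\\w+\\.)+\\w+");

    // Collects every substring of text that the pattern matches, left to right.
    static List<String> findAll(String text) {
        List<String> matches = new ArrayList<String>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "The URL is http://www.mikan.com, to connect securely "
                + "use https://www.mikan.com instead.";
        System.out.println(findAll(text));
    }
}
```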
Analysis: This pattern starts with https?, where ? means that the character before it (the s) may or may not be present, so it matches both http and https. The rest is the same as in the previous example.
2. Number of matching repetitions
+, * and ? in regular expressions solve many problems, but:
1) There is no upper limit on the number of characters matched by + and *. We cannot set a maximum for them.
2) The minimum for +, * and ? is fixed at one or zero. We cannot set any other minimum.
3) With only * and +, we cannot require an exact number of repetitions.
Regular expressions provide a syntax for setting the number of repetitions. The number of repetitions should be given using { and } characters, and the value should be written between them.
1. Set an exact value for the number of repeated matches
If you want to set an exact value for the number of repeated matches, write that number between { and }. For example, {4} means that the character (or character set) before it must be repeated 4 times in the source text to count as a match. If it appears only 3 times, it is not a match.
As an example, the pattern for matching page colors from the previous articles can be written with a repetition count: #[[:xdigit:]]{6} or #[0-9a-fA-F]{6}. In Java, the POSIX character class is written \p{XDigit}, so the pattern is #\p{XDigit}{6} (written as #\\p{XDigit}{6} inside a Java string literal).
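A short Java sketch of the exact-count form follows. The class name HexColorDemo and the method isHexColor are hypothetical; the sketch uses matches(), which requires the whole string to fit the pattern.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name is an assumption, not part of the article.
class HexColorDemo {
    // # followed by exactly six hexadecimal digits.
    static final Pattern COLOR = Pattern.compile("#\\p{XDigit}{6}");

    // True only when the entire string is a six-digit hex color.
    static boolean isHexColor(String s) {
        return COLOR.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"#FF00aa", "#FFF", "#GGGGGG", "#1234567"};
        for (String s : samples) {
            System.out.println(s + " -> " + isHexColor(s));
        }
    }
}
```

Only #FF00aa passes: #FFF has too few digits, #GGGGGG contains no hexadecimal digits, and #1234567 has one digit too many for a whole-string match.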
2. Set an interval for the number of repeated matches
The {} syntax can also be used to set an interval for the number of repetitions, that is, a minimum and a maximum. Such intervals must be given in the form {n,m}, where m>=n>=0. For example, a regular expression to check whether a date is correctly formatted (without checking that the date itself is valid), such as 2012-08-12 or 2012-8-12, is: \d{4}-\d{1,2}-\d{1,2}.
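The interval form can be tried out with a small Java sketch. The class name DateFormatDemo and the method looksLikeDate are hypothetical. As the text says, only the format is checked, not whether the date exists.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name is an assumption, not part of the article.
class DateFormatDemo {
    // Four digits, then one or two digits, then one or two digits, separated by -.
    static final Pattern DATE = Pattern.compile("\\d{4}-\\d{1,2}-\\d{1,2}");

    // True when the whole string has the yyyy-m-d or yyyy-mm-dd shape.
    static boolean looksLikeDate(String s) {
        return DATE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"2012-08-12", "2012-8-12", "12-08-12", "2012-008-12", "2012-99-99"};
        for (String s : samples) {
            System.out.println(s + " -> " + looksLikeDate(s));
        }
    }
}
```

Note that 2012-99-99 is accepted, because the pattern constrains only the number of digits in each field.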
3. Match at least n times
The {} syntax can also be used to give only a minimum number of repetitions (without a maximum); for example, {3,} means repeating at least 3 times. Note: the comma in {3,} is required, and there must be no space after it, otherwise the expression will not work as intended.
Let's look at an example that uses a regular expression to find all amounts of $100 or more:
Text:
$25.36
$125.36
$205.0
$2500.44
$44.30
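Regular expression: \$\d{3,}\.\d{2}
Result:
$25.36
【$125.36】
$205.0
【$2500.44】
$44.30
Analysis: $ is itself a metacharacter (it anchors the end of the input), so it must be escaped as \$ to match a literal dollar sign. \d{3,} requires at least three digits before the decimal point, which excludes $25.36 and $44.30. \.\d{2} requires two digits after the decimal point, so $205.0, which has only one, is not matched either.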
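The amounts example can be checked with the following Java sketch. The class name AmountDemo and the helper findAll are hypothetical. The sketch assumes the dollar sign is escaped as \$ so that it is treated as a literal character rather than as the end-of-input anchor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is an assumption, not part of the article.
class AmountDemo {
    // Literal $, at least three digits, a literal dot, then two digits.
    static final Pattern AMOUNT = Pattern.compile("\\$\\d{3,}\\.\\d{2}");

    // Collects every substring of text that the pattern matches, left to right.
    static List<String> findAll(String text) {
        List<String> matches = new ArrayList<String>();
        Matcher m = AMOUNT.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "$25.36\n$125.36\n$205.0\n$2500.44\n$44.30";
        System.out.println(findAll(text));
    }
}
```

With this pattern, $205.0 is skipped because only one digit follows the decimal point, while $125.36 and $2500.44 are found.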
+, * and ? can each be expressed as a repetition count:
+ is equivalent to {1,}
* is equivalent to {0,}
? is equivalent to {0,1}
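These equivalences can be checked empirically with a small Java sketch. The class name QuantifierEquivalenceDemo and the helper fullMatch are hypothetical, and the test strings reuse the ab*c example from earlier in the article.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name is an assumption, not part of the article.
class QuantifierEquivalenceDemo {
    // True when the whole input matches the regular expression.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String[] inputs = {"ac", "abc", "abbbbc"};
        // Each row pairs a shorthand pattern with its {} spelling.
        String[][] pairs = {
            {"ab+c", "ab{1,}c"},
            {"ab*c", "ab{0,}c"},
            {"ab?c", "ab{0,1}c"},
        };
        for (String[] pair : pairs) {
            for (String input : inputs) {
                System.out.println(pair[0] + " vs " + pair[1] + " on " + input + ": "
                        + fullMatch(pair[0], input) + " / " + fullMatch(pair[1], input));
            }
        }
    }
}
```

In every row the two spellings should give the same answer for each input.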
3. Prevent excessive matching
? can only match zero or one character, and {n} and {n,m} also put an upper limit on the number of repetitions. But *, + and {n,} have no upper limit, which sometimes leads to over-matching.
Let's look at an example of matching an HTML tag:
Text:
Yesterday is <B>history</B>,tomorrow is a <B>mystery</B>, but today is a <B>gift</B>.
Regular expression: <[Bb]>.*</[Bb]>
Result: Yesterday is 【<B>history</B>,tomorrow is a <B>mystery</B>, but today is a <B>gift</B>】.
Analysis: <[Bb]> matches the <B> tag (in either case), and </[Bb]> matches the </B> tag (in either case). But the result is not what was expected. There are three pairs of <B> tags, yet everything from the first <B> up to the last </B> is matched as a single result. Why is this so? Because * and + are both greedy metacharacters: when matching, their rule is the more the better. They try to match from the start of the match as far toward the end of the text as possible, rather than stopping at the first place where a match is complete.
When such greedy behavior is not wanted, the lazy versions of these metacharacters can be used. Lazy means matching as few characters as possible, the opposite of greedy. A lazy metacharacter is written by adding a ? suffix to the greedy one. The greedy metacharacters and their lazy versions are:
* *?
+ +?
{n,} {n,}?
So in the example above, the regular expression only needs to be changed to <[Bb]>.*?</[Bb]>. The matches are then:
<B>history</B>
<B>mystery</B>
<B>gift</B>
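The difference between the greedy and lazy forms can be seen in a minimal Java sketch. The class name GreedyLazyDemo and the helper findAll are hypothetical. The sketch assumes that the sample sentence wraps history, mystery and gift in <B>...</B> tags and that the closing tag is matched with </[Bb]>.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is an assumption, not part of the article.
class GreedyLazyDemo {
    // Collects every substring of text that regex matches, left to right.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Yesterday is <B>history</B>,tomorrow is a <B>mystery</B>, "
                + "but today is a <B>gift</B>.";
        // Greedy: .* runs on to the last </B> it can reach.
        System.out.println(findAll("<[Bb]>.*</[Bb]>", text));
        // Lazy: .*? stops at the first </B> after each opening tag.
        System.out.println(findAll("<[Bb]>.*?</[Bb]>", text));
    }
}
```

The greedy pattern yields a single match spanning all three tag pairs, while the lazy pattern yields three separate matches.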
4. Summary
The real power of regular expressions shows in repeated matching. This article introduced the metacharacters +, * and ?; to specify the number of repetitions precisely, use {}. Repetition metacharacters come in two kinds, greedy and lazy; when you need to prevent over-matching, use the lazy versions to construct your regular expressions. Position matching will be introduced in the next article.