Detailed explanation of repeated matching in regular expression tutorial

Jan 09, 2017 pm 04:01 PM

This article explains repeated matching in regular expressions, with examples. It is shared here for reference; the details are as follows:

Note: In all examples, the text matched by the regular expression is enclosed in [ and ] in the result. Some examples are implemented in Java; where Java's own handling of regular expressions matters, it is explained at the relevant point. All Java examples were tested under JDK 1.6.0_13.

1. How many times to match

The previous articles covered matching a single character, but what if a character or a set of characters needs to be matched several times? For example, to match an email address with only the methods covered so far, someone might write a regular expression like \w@\w\.\w, but this only matches addresses like a@b.c. That is clearly not good enough, so let's look at how to match email addresses.

First, you need to know what an email address consists of: a group of characters made up of letters, digits, or underscores, followed by the @ symbol, and then the domain name, that is, username@domain. The details depend on the email service provider; some also allow the . character in user names.

1. Match one or more characters

To match one or more repetitions of the same character (or character set), simply add the + character as a suffix. + matches one or more occurrences (at least one). For example, a matches a single a, a+ matches one or more consecutive a's, and [0-9]+ matches one or more consecutive digits.

Note: When adding a + suffix to a character set, the + must be placed outside the set, otherwise it is not a repetition. For example, [0-9+] matches a single digit or a + sign. It is syntactically valid, but it is not what we want.
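The difference between the two placements can be checked with a minimal Java sketch. The class name PlusPlacementDemo, the helper findAll, and the sample text "12+3" are illustrative choices, not part of the original article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlusPlacementDemo {
    // Collects every match of the given regex in the text, in order.
    static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // + outside the set: each match is a run of one or more digits.
        System.out.println(findAll("[0-9]+", "12+3"));
        // + inside the set: each match is one character, a digit or a literal +.
        System.out.println(findAll("[0-9+]", "12+3"));
    }
}
```

With the + outside the set, the digits are grouped into runs; with the + inside, every character is matched on its own, including the + sign.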

Text: Hello, mhmyqn@qq.com or mhmyqn@126.com is my email.

Regular expression: \w+@(\w+\.)+\w+

Result: Hello, [mhmyqn@qq.com] or [mhmyqn@126.com] is my email.

Analysis: \w+ matches one or more characters, and the subexpression (\w+\.)+ matches a string like xxxx.edu., but an email address does not end with a . character, so a final \w+ is needed. Email addresses like mhmyqn@xxxx.edu.cn are matched as well.
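As a rough Java sketch of the example above (the class EmailFinder and the method findAll are hypothetical names), note that every backslash in the pattern is doubled inside a Java string literal:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmailFinder {
    // Pattern from the text: \w+@(\w+\.)+\w+ written as a Java string literal.
    static final Pattern EMAIL = Pattern.compile("\\w+@(\\w+\\.)+\\w+");

    // Returns every substring of the input that the pattern matches.
    static List<String> findAll(String text) {
        List<String> found = new ArrayList<String>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("Hello, mhmyqn@qq.com or mhmyqn@126.com is my email."));
        // A multi-level domain is covered by the repeated group (\w+\.)+
        System.out.println(findAll("Write to mhmyqn@xxxx.edu.cn"));
    }
}
```

This pattern is only a teaching example; it is not a complete validator for real-world email addresses.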

2. Match zero or more characters

Use the metacharacter * to match zero or more characters. It is used exactly like +: place it right after a character or character set to match zero or more consecutive occurrences of that character (or character set). For example, the regular expression ab*c matches ac, abc, abbbbc, and so on.
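A small Java sketch can contrast * with +. The class StarDemo and its method names are illustrative, and the ab+c pattern is added here only for comparison.

```java
import java.util.regex.Pattern;

final class StarDemo {
    static final Pattern AB_STAR_C = Pattern.compile("ab*c");
    static final Pattern AB_PLUS_C = Pattern.compile("ab+c");

    // True when the whole input matches ab*c (zero or more b's).
    static boolean matchesStar(String s) {
        return AB_STAR_C.matcher(s).matches();
    }

    // True when the whole input matches ab+c (at least one b).
    static boolean matchesPlus(String s) {
        return AB_PLUS_C.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"ac", "abc", "abbbbc", "adc"};
        for (String s : samples) {
            System.out.println(s + " ab*c=" + matchesStar(s) + " ab+c=" + matchesPlus(s));
        }
    }
}
```

The only sample on which the two patterns disagree is ac, which has zero b's.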

3. Match zero or one character

Use the metacharacter ? to match zero or one character. As mentioned in the previous article, the regular expression \r\n\r\n matches a blank line, but on Unix and Linux the \r is not used. With the metacharacter ?, the pattern \r?\n\r?\n matches blank lines on Windows as well as on Unix and Linux. Let's look at an example that matches URLs using either the http or the https protocol:

Text: The URL is http://www.mikan.com, to connect securely use https://www.mikan.com instead.

Regular expression: https?://(\w+\.)+\w+

Result: The URL is [http://www.mikan.com], to connect securely use [https://www.mikan.com] instead.

Analysis: This pattern starts with https?, where the ? means that the character before it (s) may or may not be present, so it matches http or https. The rest is the same as in the previous example.
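Both uses of ? described in this section can be sketched in Java as follows. The class OptionalDemo and its methods findUrls and hasBlankLine are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OptionalDemo {
    // https? makes the s optional, so both protocols are matched.
    static final Pattern URL = Pattern.compile("https?://(\\w+\\.)+\\w+");
    // \r? makes the carriage return optional, covering Windows and Unix line endings.
    static final Pattern BLANK_LINE = Pattern.compile("\\r?\\n\\r?\\n");

    // Returns every URL-like substring found in the text.
    static List<String> findUrls(String text) {
        List<String> found = new ArrayList<String>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // True when the text contains two consecutive line breaks.
    static boolean hasBlankLine(String text) {
        return BLANK_LINE.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(findUrls(
            "The URL is http://www.mikan.com, to connect securely use https://www.mikan.com instead."));
        System.out.println(hasBlankLine("first\r\n\r\nsecond")); // Windows line endings
        System.out.println(hasBlankLine("first\n\nsecond"));     // Unix line endings
        System.out.println(hasBlankLine("first\nsecond"));       // no blank line
    }
}
```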

2. Number of matching repetitions

+, * and ? in regular expressions solve many problems, but:

1) There is no upper limit on the number of characters matched by + and *. There is no way to set a maximum number of characters for them to match.

2) The minimum for +, * and ? is fixed at one or zero characters. We cannot set any other minimum.

3) With only * and +, we cannot require an exact number of matched characters.

Regular expressions provide a syntax for setting the number of repetitions. The number of repetitions should be given using { and } characters, and the value should be written between them.

1. Set an exact value for the number of repeated matches

If you want to set an exact value for the number of repeated matches, just write the number between { and }. For example, {4} means that the character (or set of characters) before it must be repeated 4 times in the original text to be considered a match. If it only appears 3 times, it is not considered a match.

As in the page color example from the previous articles, a repetition count can be used for the match: #[[:xdigit:]]{6} or #[0-9a-fA-F]{6}. In Java, the POSIX class is written \p{XDigit}, which gives #\\p{XDigit}{6} inside a Java string literal.
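A minimal Java sketch of the exact count {6} follows, using both the explicit character set and the \p{XDigit} class. The class HexColorDemo, the helper findAll, and the sample CSS-like text are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HexColorDemo {
    // Explicit character set, repeated exactly six times.
    static final Pattern BY_SET = Pattern.compile("#[0-9a-fA-F]{6}");
    // Same idea with Java's POSIX hexadecimal-digit class.
    static final Pattern BY_POSIX = Pattern.compile("#\\p{XDigit}{6}");

    // Collects every match of the pattern in the text.
    static List<String> findAll(Pattern p, String text) {
        List<String> found = new ArrayList<String>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String css = "color: #3636FF; border: #FFF";
        // #FFF has only three hexadecimal digits, so {6} rejects it.
        System.out.println(findAll(BY_SET, css));
        System.out.println(findAll(BY_POSIX, css));
    }
}
```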

2. Set an interval for the number of repeated matches

The {} syntax can also set an interval for the number of repetitions, that is, a minimum and a maximum. Such intervals must be given in the form {n,m}, where m>=n>=0. For example, a regular expression that checks whether a date is correctly formatted (without checking that the date is valid), such as 2012-08-12 or 2012-8-12: \d{4}-\d{1,2}-\d{1,2}.
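The date pattern can be tried with a short Java sketch. The class DateShapeDemo and the method isDateShaped are hypothetical names; the check is on the shape only, as the text says.

```java
import java.util.regex.Pattern;

final class DateShapeDemo {
    // Four digits, then one or two digits, then one or two digits, separated by hyphens.
    static final Pattern DATE = Pattern.compile("\\d{4}-\\d{1,2}-\\d{1,2}");

    // True when the whole input has the expected shape; the date itself is not validated.
    static boolean isDateShaped(String s) {
        return DATE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"2012-08-12", "2012-8-12", "2012-008-12", "12-08-12", "2012-99-99"};
        for (String s : samples) {
            System.out.println(s + " -> " + isDateShaped(s));
        }
    }
}
```

An impossible date such as 2012-99-99 is still accepted, because the interval {1,2} only limits how many digits appear, not their value.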

3. Match at least a given number of times

The last use of the {} syntax is to give a minimum number of repetitions with no maximum, such as {3,}, which means at least 3 repetitions. Note: the comma in {3,} is required, and there must be no space after it. Otherwise the pattern will not work as intended.

Let's look at an example that uses a regular expression to find all amounts of $100 or more:

Text:

$25.36

$125.36

$205.0

$2500.44

$44.30

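Regular expression: \$\d{3,}\.\d{2}

Result:

$25.36

[$125.36]

$205.0

[$2500.44]

$44.30

Analysis: The $ is escaped as \$ so that it matches a literal dollar sign rather than acting as an end-of-line anchor, and \d{3,} requires at least three digits before the decimal point. $205.0 is not matched because \.\d{2} requires two digits after the decimal point.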
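The amounts example can be reproduced with the Java sketch below, under two assumptions: the dollar sign is escaped as \$ so that it is treated as a literal character, and the five amounts are joined with newlines. The class AmountFinder and the method findLargeAmounts are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AmountFinder {
    // \$ is a literal dollar sign; \d{3,} needs at least three digits; \.\d{2} needs two decimals.
    static final Pattern LARGE_AMOUNT = Pattern.compile("\\$\\d{3,}\\.\\d{2}");

    // Returns every amount in the text that the pattern matches.
    static List<String> findLargeAmounts(String text) {
        List<String> found = new ArrayList<String>();
        Matcher m = LARGE_AMOUNT.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "$25.36\n$125.36\n$205.0\n$2500.44\n$44.30";
        // $205.0 has a single decimal digit, so \.\d{2} does not match it.
        System.out.println(findLargeAmounts(text));
    }
}
```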

+, * and ? can also be written as repetition counts:

+ is equivalent to {1,}

* is equivalent to {0,}

? is equivalent to {0,1}
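These equivalences can be spot-checked with a small Java sketch that compares each short form with its brace form on a few sample strings. The class QuantifierEquivalence, the method agreeOn, and the samples are assumptions for illustration; agreeing on samples is a sanity check, not a proof.

```java
import java.util.regex.Pattern;

final class QuantifierEquivalence {
    // True when both patterns give the same whole-string result for every sample.
    static boolean agreeOn(String shortForm, String braceForm, String[] samples) {
        for (String s : samples) {
            if (Pattern.matches(shortForm, s) != Pattern.matches(braceForm, s)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"", "a", "aa", "aaa", "b"};
        System.out.println(agreeOn("a+", "a{1,}", samples));
        System.out.println(agreeOn("a*", "a{0,}", samples));
        System.out.println(agreeOn("a?", "a{0,1}", samples));
        // For contrast: + and * differ on the empty string.
        System.out.println(agreeOn("a+", "a*", samples));
    }
}
```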

3. Prevent over-matching

? matches at most one character, and {n} and {n,m} also put an upper limit on the number of repetitions, but *, + and {n,} have no upper limit, which sometimes leads to over-matching.

Let's look at an example of matching HTML tags:

Text:

Yesterday is <b>history</b>,tomorrow is a <b>mystery</b>, but today is a <b>gift</b>.

Regular expression: <[Bb]>.*</[Bb]>

Result:

Yesterday is [<b>history</b>,tomorrow is a <b>mystery</b>, but today is a <b>gift</b>].

Analysis: <[Bb]> matches the <b> tag (in either case), and </[Bb]> matches the </b> tag (in either case). But the result is not what was expected. There are three pairs of <b> tags, yet everything from the first <b> tag to the last </b> tag is matched as a single result.

Why is this so? Because * and + are both greedy metacharacters: when matching, their behavior is "the more the better". They match as far as possible toward the end of the text, rather than stopping at the first point where the rest of the pattern could match.

When this greedy behavior is not wanted, the lazy versions of these metacharacters can be used. Lazy means matching as few characters as possible, the opposite of greedy. A lazy metacharacter is formed by adding a ? suffix to the greedy one. Here are the lazy versions of the greedy metacharacters:

Greedy: *, lazy: *?

Greedy: +, lazy: +?

Greedy: {n,}, lazy: {n,}?

So in the above example, the regular expression only needs to be changed to <[Bb]>.*?</[Bb]>. The result is as follows:

<b>history</b>

<b>mystery</b>

<b>gift</b>
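The greedy and lazy behaviors can be compared side by side in a Java sketch. It assumes the sample sentence wraps the three words history, mystery and gift in <b> tags; the class GreedyLazyDemo and the helper findAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GreedyLazyDemo {
    // Assumed sample text: three words, each wrapped in <b>...</b>.
    static final String TEXT =
        "Yesterday is <b>history</b>,tomorrow is a <b>mystery</b>, but today is a <b>gift</b>.";

    // Collects every match of the given regex in the text.
    static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Greedy .* runs to the last closing tag: one long match.
        System.out.println(findAll("<[Bb]>.*</[Bb]>", TEXT));
        // Lazy .*? stops at the nearest closing tag: three short matches.
        System.out.println(findAll("<[Bb]>.*?</[Bb]>", TEXT));
    }
}
```

The greedy pattern yields a single match spanning all three tag pairs, while the lazy pattern yields one match per pair.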

4. Summary

The real power of regular expressions shows in repeated matching. This article introduced the metacharacters +, * and ?. To control the number of repetitions precisely, use {}. Repetition metacharacters come in two kinds, greedy and lazy; when you need to prevent over-matching, build the regular expression with the lazy ones. Position matching will be introduced in the next article.

I hope this article will be helpful for everyone to learn regular expressions.
