Demystifying Regex with Practical Examples

Regular expressions (Regex) are a valuable tool for developers, used for tasks such as log analysis, form submission validation, and find and replace operations. Understanding how to effectively build and use Regex can greatly enhance productivity and efficiency.
Building a good Regex involves defining a scenario, developing a plan, and implementing/testing/refactoring. It’s important to understand the types of characters allowed, how many times a character must appear, and any constraints to follow.
Practical examples of Regex usage include matching a password, a URL, a specific HTML tag, and duplicated words. These examples demonstrate the use of character ranges, assertions, conditions, groups, and more.
While Regex is a powerful tool, it can also be complex and difficult to manage. Therefore, it’s sometimes more effective to use several smaller Regex instead of one large one. Paying attention to group captures can also make matches more useful for further processing.

Regular expressions are often used to perform searches, replace substrings and validate string data. This article provides tips, tricks, resources and steps for working through intricate regular expressions. If you don’t have the basic skillset under your belt, you can learn regex with our beginner’s guide. As arcane as regular expressions look, it won’t take you long to learn the concepts. There are many books, articles and websites that explain regular expressions, so instead of writing another explanation I’d prefer to go straight to more practical examples:

Matching a password
Matching a URL
Matching a specific HTML tag
Matching duplicated words

You can find a useful cheat sheet at this link. Along with a host of useful resources, there is also a conference video by Lea Verou at the bottom of this post – it’s a bit long, but it’s excellent in breaking down RegEx.

How to build a good regex

Regular expressions are often used in the developer’s daily routine – log analysis, form submission validation, find and replace, and so on. That’s why every good developer should know how to use them, but what is the best practice to build a good regex?

1. Define a scenario

Using natural language to define the problem will give you a better idea of the approach to use. The words could and must, used in a definition, are useful to describe mandatory constraints or assertions. Below is an example:

The string must start with ‘h’ and finish with ‘o’ (e.g. hello, halo).
The string could be wrapped in parentheses.

2. Develop a plan

After having a good definition of the problem, we can understand the kind of elements that are involved in our regular expression:

What are the types of characters allowed (word, digit, new line, range, …)?
How many times must a character appear (one or more, once, …)?
Are there some constraints to follow (optionals, lookahead/behind, if-then-else, …)?

3. Implement/Test/Refactor

It’s very important to have a real-time test environment to test and improve your regular expression. There are websites like regex101.com, regexr.com and debuggex.com that provide some of the best environments. To improve the efficiency of the regex, you could try to answer some of these additional questions:

Are the character classes correctly defined for the specific domain?
Should I write more test strings to cover more use cases?
Is it possible to find and isolate some problems and test them separately?
Should I refactor my expression with subpatterns, groups, conditions, etc., to make it smaller, clearer and more flexible?
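
The three steps can be sketched on the scenario from step 1. This is only a minimal illustration, written with Python's re module as an assumption (the conditional construct used here is also available in PCRE). The pattern and the helper name matches_scenario are hypothetical and not part of the original article.

```python
import re

# Scenario: the string must start with 'h' and finish with 'o',
# and it could be wrapped in parentheses.
#   (\()?      optional opening parenthesis, captured as group 1
#   h\w*o      'h', any number of word characters, then 'o'
#   (?(1)\))   if-then: require ')' only when group 1 matched
PATTERN = re.compile(r"^(\()?h\w*o(?(1)\))$")


def matches_scenario(text: str) -> bool:
    """Return True when text satisfies the scenario (hypothetical helper)."""
    return PATTERN.match(text) is not None


# A small set of test strings, including cases that must be rejected.
for sample in ["hello", "halo", "(hello)", "(hello", "hello)", "help"]:
    print(sample, matches_scenario(sample))
```

Writing the rejected cases down next to the accepted ones is what makes the later refactoring step safe.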

Practical examples

The goal of the following examples is not to write an expression that will only solve the problem, but to write the most effective expression for the specific use cases, using important elements like character ranges, assertions, conditions, groups and so on.

Matching a password

Scenario:

6 to 12 characters in length
Must have at least one uppercase letter
Must have at least one lower case letter
Must have at least one digit
Could contain other characters

Pattern: ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{6,12}$

This expression is based on multiple positive lookaheads, (?=(regex)). A lookahead asserts that the declared (regex) can be matched from the current position, without consuming any characters. The order of the conditions doesn’t affect the result. Lookaround expressions are very useful when there are several conditions. We could also use the negative lookahead (?!(regex)) to exclude some characters. For example, I could exclude the # with (?!.*#). Let’s explain each part of the above expression:

^ asserts position at start of the string
(?=.*[a-z]) positive lookahead, asserts that the regex .*[a-z] can be matched:
- .* matches any character (except newline) between zero and unlimited times
- [a-z] matches a single character in the range between a and z (case sensitive)
(?=.*[A-Z]) positive lookahead, asserts that the regex .*[A-Z] can be matched:
- .* matches any character (except newline) between zero and unlimited times
- [A-Z] matches a single character between A and Z (case sensitive)
(?=.*\d) positive lookahead, asserts that the regex .*\d can be matched:
- .* matches any character (except newline) between zero and unlimited times
- \d matches a digit [0-9]
.{6,12} matches any character (except newline) between 6 and 12 times
$ asserts position at end of the string
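
As a quick check of the password pattern, here is a minimal sketch using Python's re module (an assumption; the lookahead syntax is the same in PCRE). The function name is_valid_password and the sample passwords are hypothetical.

```python
import re

# The password pattern from the scenario above: three lookaheads
# followed by the length constraint.
PASSWORD = re.compile(r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{6,12}$")


def is_valid_password(candidate: str) -> bool:
    """Return True when candidate meets every rule of the scenario."""
    return PASSWORD.match(candidate) is not None


for sample in ["Passw0rd", "Abc#12", "password1", "PASSWORD1", "Ab1"]:
    print(sample, is_valid_password(sample))
```

Each lookahead starts from the beginning of the string, which is why the three conditions can be listed in any order.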

Matching a URL

Scenario:

Must start with http or https or ftp followed by ://
Must match a valid domain name
Could contain a port specification (http://www.sitepoint.com:80)
Could contain digit, letter, dots, hyphens, forward slashes, multiple times

Pattern: ^(http|https|ftp):[\/]{2}([a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,4})(:[0-9]+)?\/?([a-zA-Z0-9\-\._\?\,\'\/\\\+&%\$#\=~]*)

The first scenario is pretty easy to solve with ^(http|https|ftp):[\/]{2}. To match the domain name we need to bear in mind that to be valid it can only contain letters, digits, hyphens and dots. In my example, I limited the number of characters after the last dot to between 2 and 4, but this could be extended for new domains like .rocks or .codes. The domain name is matched by ([a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,4}). The optional port specification is matched by the simple (:[0-9]+)?.

A URL can contain multiple slashes and multiple characters repeated many times (see RFC3986). This is matched by using a range of characters in a group, ([a-zA-Z0-9\-\._\?\,\'\/\\\+&%\$#\=~]*). It’s really useful to match every important element with a capturing group (), because it will return only the matches we need. Remember that certain characters need to be escaped with \. Below, every single subpattern is explained:

^ asserts position at start of the string
capturing group (http|https|ftp), captures http or https or ftp
: matches the character : literally
[\/]{2} matches the escaped character / exactly 2 times
capturing group ([a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,4}):
- [a-zA-Z0-9\-\.]+ matches, between one and unlimited times, a character in the range a to z, A to Z or 0 to 9, the character - literally or the character . literally
- \. matches the character . literally
- [a-zA-Z]{2,4} matches a character between a and z or A and Z (case sensitive), between 2 and 4 times
capturing group (:[0-9]+)?:
- quantifier ? matches the group zero or one time
- : matches the character : literally
- [0-9]+ matches a single character between 0 and 9, one or more times
\/? matches the character / literally zero or one time
capturing group ([a-zA-Z0-9\-\._\?\,\'\/\\\+&%\$#\=~]*):
- [a-zA-Z0-9\-\._\?\,\'\/\\\+&%\$#\=~]* matches between zero and unlimited times a single character in the range a-z, A-Z, 0-9, or one of the characters -._?,'/\+&%$#=~
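
To see what the capturing groups return, here is a minimal sketch in Python's re module (an assumption). The pattern is written with the backslash escapes and + quantifiers that the explanation above describes; the helper name parse_url and the sample URLs are hypothetical.

```python
import re

# The URL pattern, split across lines for readability.
URL = re.compile(
    r"^(http|https|ftp):[\/]{2}"                  # group 1: scheme, then //
    r"([a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,4})"          # group 2: domain name
    r"(:[0-9]+)?"                                 # group 3: optional port
    r"\/?"                                        # optional slash
    r"([a-zA-Z0-9\-\._\?\,\'\/\\\+&%\$#\=~]*)"    # group 4: rest of the URL
)


def parse_url(url: str):
    """Return the four captured groups as a tuple, or None if no match."""
    match = URL.match(url)
    return match.groups() if match else None


print(parse_url("http://www.sitepoint.com:80/path?x=1"))
print(parse_url("ftp://example.org"))
print(parse_url("mailto:someone@example.com"))
```

An optional group that did not take part in the match, such as the port in the second URL, comes back as None, so code that consumes the groups should allow for that.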

Matching a specific HTML tag

Scenario:

The start tag must begin with < followed by one or more characters and end with >
The end tag must start with </ followed by one or more characters and end with >
We must match the content inside a TAG element

Pattern: <([\w]+).*>(.*?)<\/\1>

Matching the start tag and the content inside it is pretty easy with <([\w]+).*> and (.*?), but in the pattern above I have added a useful thing: a reference to a capturing group. Every capturing group defined by parentheses () can be referred to by its position number, so (first)(second)(third) become \1, \2 and \3, which allows for further operations. The expression above could be explained as:

Start with <
Capture the tag name
Followed by zero or more chars and >
Capture the content inside the tag
The closing tag must be </tag name captured before>

Including only two capture groups in the expression, the tag name and the content, will return a very clear match, a list of tag names with related content. Let’s dig a little deeper and explain the subpatterns:

< matches the character < literally
capturing group ([\w]+) matches any word character a-zA-Z0-9_ one or more times
.* matches any character (except newline) zero or more times
> matches the character > literally
capturing group (.*?) matches any character (except newline) zero or more times, as few times as possible
< matches the character < literally
\/ matches the character / literally
\1 matches the same text matched by the first capturing group: ([\w]+)
> matches the character > literally
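
The backreference can be tried out with a minimal sketch in Python's re module (an assumption; \1 works the same way in PCRE). The pattern is written here with the angle brackets and backslashes that the explanation describes, and the sample markup is hypothetical.

```python
import re

# Tag name in group 1, tag content in group 2; \1 forces the closing
# tag to repeat the captured tag name.
TAG = re.compile(r"<([\w]+).*>(.*?)<\/\1>")

html = '<h1>Title</h1><p>Some text</p>'
pairs = TAG.findall(html)          # list of (tag name, content) tuples
print(pairs)

link = TAG.findall('<a href="#top">Back</a>')
print(link)

# Caveat: .* in the opening tag is greedy, so two tags with the same
# name on one line collapse into a single match.
merged = TAG.findall("<b>one</b> and <b>two</b>")
print(merged)
```

The last call shows a limit of this pattern: when the same tag name appears twice on one line, the greedy .* swallows the first element, which is one reason a dedicated parser is usually the safer choice for real HTML.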

Matching duplicated words

Scenario:

The words are space separated
We must match every duplication – non-consecutive ones as well

Pattern: \b(\w+)\b(?=.*\1)

This regular expression seems challenging but uses some of the concepts previously shown. The pattern introduces the concept of word boundaries. A word boundary \b checks a position rather than matching a character: it matches between a word character (e.g. abcDE) and a non-word character (e.g. -~,!), or at the start or end of the string next to a word character. Below you can find some example uses of word boundary to make it clearer:

- Given the phrase Regular expressions are awesome
- The pattern \bare\b matches are
- The pattern \w{3}\b could match the last three letters of the words: lar, ons, are, ome

The expression above could be explained as:

Match every word, delimited by word boundaries (in our case the words are separated by spaces)
Check whether the matched word appears again later in the string

Below you will find the explanation for each sub pattern:

\b word boundary
capturing group (\w+) matches any word character a-zA-Z0-9_ one or more times
\b word boundary
(?=.*\1) positive lookahead, asserts that the following can be matched:
- .* matches any character (except newline) between zero and unlimited times
- \1 matches the same text as the first capturing group

The expression will make more sense if we return all the matches instead of returning only the first one. See the PHP function preg_match_all for more information.
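
Returning all the matches can be sketched as follows. The example uses Python's re.findall (an assumption), which plays a role similar to preg_match_all here; the helper name repeated_words and the sample sentences are hypothetical.

```python
import re

# Word in group 1, followed by a lookahead that requires the same
# text to appear again later in the string.
DUPLICATE = re.compile(r"\b(\w+)\b(?=.*\1)")


def repeated_words(text: str) -> list:
    """Return every word that occurs again later in text."""
    return DUPLICATE.findall(text)


print(repeated_words("one two three two one"))
print(repeated_words("no repeats here"))

# Caveat: \1 inside the lookahead is not wrapped in \b, so a word is
# also reported when it only reappears inside a longer word.
print(repeated_words("the other"))
```

If whole-word duplicates are required, wrapping the backreference in word boundaries, as in (?=.*\b\1\b), is one possible refinement.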

Final thoughts

Regular expressions are double-edged swords. The more complexity is added, the more difficult it is to solve the problem. That’s why, sometimes, it’s hard to find a regular expression that will match all the cases, and it’s better to use several smaller regex instead. Having a good scenario of the problem could be very helpful, and will allow you to start thinking of the character range, constraints, assertions, repetitions, optional values, etc. Paying more attention to group captures will make the matches useful for further processing. Feel free to improve the expressions in the examples, and let us know how you do!

Useful resources

Below you can find further information and resources to help your regex skills grow. Feel free to add a comment to the article if you find something useful that isn’t listed.

Lea Verou – /Reg(exp){2}lained/: Demystifying Regular Expressions

https://www.youtube.com/watch?v=EkluES9Rvak

PHP libraries

Name | Description
RegExpBuilder | Creates regex using human-readable chains of methods
NooNooFluentRegex | Builds Regex expressions using fluent setters and English language terms like above
HoaRegex | Provides tools to analyze regex and generate strings
Regex reverse | Given a regular expression will generate a string

Websites

URL | Description
regex101.com | PCRE online regex tester
regextester.com | PCRE online regex tester
rexv.org | PCRE online regex tester
debuggex.com | Supports PCRE and provides a very useful visual regex debugger
regexper.com | Javascript style regex, but useful for debug
phpliveregex.com | Online tester for preg functions
regxlib.com | Database of regular expressions ready to use
regular-expressions.info | Regex tutorials, books review, examples

Books

Title | Description | Author | Publisher
Mastering Regular Expressions | The must have regex book | Jeffrey Friedl | O’Reilly
Regular Expression Pocket Reference | Regular Expressions for Perl, Ruby, PHP, Python, C, Java and .NET | Tony Stubblebine | O’Reilly

Frequently Asked Questions (FAQs) about Regular Expressions (Regex)

What are some practical applications of Regular Expressions (Regex)?

Regular expressions (Regex) are incredibly versatile and can be used in a variety of practical applications. They are commonly used in data validation to ensure that user input matches a specific format, such as an email address or phone number. They can also be used in web scraping to extract specific pieces of information from a webpage. In addition, Regex can be used in text processing for tasks such as finding and replacing specific strings of text, splitting a string into an array of substrings, and more.

How can I create complex Regular Expressions (Regex)?

Creating complex regular expressions involves understanding and combining various Regex components. These include literals, character classes, quantifiers, and metacharacters. By combining these components in different ways, you can create regular expressions that match a wide variety of patterns. For example, you could create a regular expression that matches email addresses, phone numbers, or URLs.

What are some common mistakes to avoid when using Regular Expressions (Regex)?

Some common mistakes to avoid when using regular expressions include overusing or misusing certain components, such as the dot (.) or asterisk (*), which can lead to unexpected results. Another common mistake is not properly escaping special characters when they are meant to be interpreted literally. Additionally, it’s important to remember that regular expressions are case-sensitive by default, so you need to use the appropriate flags if you want to ignore case.
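
Two of these mistakes can be shown in a few lines. This is a minimal sketch in Python's re module (an assumption); the sample strings are hypothetical.

```python
import re

# Mistake 1: an unescaped dot matches any character, not just ".".
unescaped = re.fullmatch(r"1.5", "1x5")     # matches, probably not intended
escaped = re.fullmatch(r"1\.5", "1x5")      # no match once the dot is escaped
literal = re.fullmatch(r"1\.5", "1.5")      # matches the literal text

# Mistake 2: forgetting that matching is case-sensitive by default.
strict = re.search(r"regex", "RegEx rocks")
relaxed = re.search(r"regex", "RegEx rocks", re.IGNORECASE)

print(bool(unescaped), bool(escaped), bool(literal))
print(bool(strict), bool(relaxed))
```

In PCRE-style pattern strings the same effect as re.IGNORECASE is usually obtained with the i modifier.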

How can I test my Regular Expressions (Regex)?

There are several online tools available that allow you to test your regular expressions. These tools typically allow you to enter a regular expression and a test string, and then they highlight the parts of the test string that match the regular expression. This can be a great way to debug your regular expressions and ensure they are working as expected.

Can Regular Expressions (Regex) be used in all programming languages?

Most modern programming languages support regular expressions in some form. However, the specific syntax and features supported can vary between languages. For example, JavaScript, Python, and Ruby all support regular expressions, but they each have their own unique syntax and features.

What are the performance implications of using Regular Expressions (Regex)?

While regular expressions can be incredibly powerful, they can also be resource-intensive if not used properly. Complex regular expressions can take a long time to execute, especially on large strings of text. Therefore, it’s important to use regular expressions judiciously and to optimize them as much as possible.

How can I optimize my Regular Expressions (Regex)?

There are several strategies for optimizing regular expressions. These include avoiding unnecessary quantifiers, using non-capturing groups when you don’t need the matched text, and using character classes instead of alternation where possible. Additionally, some regular expression engines offer features that limit backtracking, such as possessive quantifiers and atomic groups, and choosing deliberately between greedy and lazy quantifiers can reduce the work needed to find a match.
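
Two of these strategies can be illustrated with a minimal sketch in Python's re module (an assumption). The patterns and sample strings are hypothetical, and the example only shows that the rewritten patterns accept the same input, not how much faster they are.

```python
import re

# Non-capturing group: (?:...) groups without storing the matched text,
# so only the part we actually need is returned.
capturing = re.compile(r"(http|https)://(\w+)")
non_capturing = re.compile(r"(?:http|https)://(\w+)")

m1 = capturing.match("https://example")
m2 = non_capturing.match("https://example")
print(m1.groups())   # scheme and host
print(m2.groups())   # host only

# Character class instead of alternation between single characters.
alternation = re.compile(r"^(?:a|e|i|o|u)+$")
char_class = re.compile(r"^[aeiou]+$")
print(bool(alternation.match("aeiou")), bool(char_class.match("aeiou")))
print(bool(alternation.match("xyz")), bool(char_class.match("xyz")))
```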

What are some resources for learning more about Regular Expressions (Regex)?

There are many resources available for learning more about regular expressions. These include online tutorials, books, and interactive learning platforms. Additionally, many programming languages have extensive documentation on their regular expression syntax and features.

Can Regular Expressions (Regex) be used to parse HTML or XML?

While it’s technically possible to use regular expressions to parse HTML or XML, it’s generally not recommended. This is because HTML and XML have a nested structure that can be difficult to accurately capture with regular expressions. Instead, it’s usually better to use a dedicated HTML or XML parser.

What are some alternatives to Regular Expressions (Regex)?

While regular expressions are incredibly powerful, they are not always the best tool for the job. Depending on the task at hand, you might be better off using a different approach. For example, for simple string manipulation tasks, you might be able to use built-in string methods instead of regular expressions. For parsing HTML or XML, you would typically use a dedicated parser. And for complex text processing tasks, you might want to consider using a natural language processing library.
