PHP regular expressions are mainly used to split, match, search, and replace strings by pattern. For simple tasks a regular expression can be slower than a plain string function, so it is worth thinking about when and how to use them.
My introduction to PHP regular expressions came from an article I found online that explains them step by step, from basic to advanced. I think it is good introductory material, but it still takes a while to work through. I kept forgetting the details as I used regular expressions, so I reread the article four or five times. Some of the harder points take a long time to digest, but if you stick with it, you will find that your ability to apply regular expressions improves noticeably.
Definition of PHP regular expression:
A regular expression is a set of syntax rules that describes how characters are arranged and matched. It is mainly used to split, match, search, and replace strings by pattern.
Regular expression functions in PHP:
PHP has had two sets of regular expression functions with similar capabilities:
One set is provided by the PCRE (Perl Compatible Regular Expressions) library. Its functions are named with the prefix "preg_".
The other set is the POSIX (Portable Operating System Interface of Unix) extension, whose functions are named ereg(), eregi(), ereg_replace() and so on. (The POSIX functions have been deprecated since PHP 5.3 and were removed in PHP 7.0.)
Since the POSIX functions have left the stage, and PCRE syntax is close to Perl's, which makes it easier to move between Perl and PHP, this article focuses on PCRE.
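For anyone still holding old ereg code, here is a minimal sketch of how the two most common calls translate to PCRE. The helper names is_all_digits and starts_with_php are made up for illustration, and the old calls appear only in comments because they no longer exist in PHP 7 and later.

```php
<?php
// Old POSIX call (removed in PHP 7.0):  ereg('^[0-9]+$', $input)
// PCRE equivalent: the same pattern wrapped in delimiters.
function is_all_digits(string $input): bool
{
    return preg_match('/^[0-9]+$/', $input) === 1;
}

// Old POSIX call:  eregi('^php', $input)
// PCRE equivalent: the i modifier takes over from the case-insensitive eregi().
function starts_with_php(string $input): bool
{
    return preg_match('/^php/i', $input) === 1;
}

var_dump(is_all_digits('2006'));      // bool(true)
var_dump(is_all_digits('20a6'));      // bool(false)
var_dump(starts_with_php('PHP 5.3')); // bool(true)
```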
PCRE regular expression
PCRE stands for Perl Compatible Regular Expressions.
In PCRE, the pattern (the regular expression itself) is usually enclosed between two forward slashes "/", such as "/apple/".
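As a minimal sketch of the delimiter rule, the snippet below runs the pattern /apple/ through preg_match(). The sample strings and the helper name contains_apple are my own, chosen only for illustration. It also shows that a different delimiter such as "#" is accepted, which saves escaping when the pattern itself contains slashes.

```php
<?php
// Hypothetical helper: true when the subject contains "apple".
function contains_apple(string $subject): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match('/apple/', $subject) === 1;
}

var_dump(contains_apple('pineapple pie')); // bool(true)
var_dump(contains_apple('Apple pie'));     // bool(false): matching is case-sensitive by default

// With # as the delimiter, the slashes inside the pattern need no escaping.
var_dump(preg_match('#^/usr/bin/#', '/usr/bin/php')); // int(1)
```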
Several important concepts in regular expressions are: metacharacters, escapes, pattern units (repetitions), negation, backreferences and assertions. The article [1] explains these concepts in a way that is easy to follow.
Commonly used meta-characters:
Metacharacter   Description
\A      matches at the very start of the string
\Z      matches at the end of the string (or just before a final newline)
\b      matches a word boundary: /\bis/ matches words that begin with "is", /is\b/ matches words that end with "is", and /\bis\b/ matches the whole word "is"
\B      matches any position that is not a word boundary: /\Bis/ matches the "is" in the word "This"
\d      matches a digit; equivalent to [0-9]
\D      matches any character except a digit; equivalent to [^0-9]
\w      matches an English letter, digit or underscore; equivalent to [0-9a-zA-Z_]
\W      matches any character except English letters, digits and underscores; equivalent to [^0-9a-zA-Z_]
\s      matches a whitespace character; equivalent to [ \f\n\r\t\v]
\S      matches any character except whitespace; equivalent to [^ \f\n\r\t\v]
\f      matches a form feed; equivalent to \x0c or \cL
\n      matches a newline; equivalent to \x0a or \cJ
\r      matches a carriage return; equivalent to \x0d or \cM
\t      matches a tab; equivalent to \x09 or \cI
\v      matches a vertical tab; equivalent to \x0b or \cK (recent PCRE versions treat \v as any vertical whitespace character)
\oNN    matches the character with the given octal code (PCRE itself writes this as \NNN or \o{NNN})
\xNN    matches the character with the given hexadecimal code
\cC     matches a control character
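To see a few of these metacharacters at work, here is a small sketch. The sample sentence and the variable names are my own and not from the original article.

```php
<?php
$text = 'This is island 42';

// \b marks a word boundary, so only the standalone word "is" matches.
$wholeWords = preg_match_all('/\bis\b/', $text, $whole);

// \B demands a position that is NOT a word boundary: only the "is" inside "This".
$insideWords = preg_match_all('/\Bis/', $text, $inner);

// \d+ collects runs of digits, \W+ collects runs of non-word characters.
preg_match_all('/\d+/', $text, $digits);
$words = preg_split('/\W+/', $text);

echo $wholeWords, "\n";  // 1
echo $insideWords, "\n"; // 1
print_r($digits[0]);     // 42
print_r($words);         // This, is, island, 42
```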
Pattern Modifiers:
Pattern modifiers are most often used to ignore case and to match across multiple lines. Knowing them solves many of the problems we run into.
i - matches letters regardless of case
m - treats the string as multiple lines, so "^" and "$" also match at the start and end of each line
s - treats the string as a single line: newlines are ordinary characters and "." matches them too
x - ignores whitespace in the pattern
U - makes quantifiers non-greedy, so they match the shortest possible string
e - evaluates the replacement string as PHP code (preg_replace() only; deprecated in PHP 5.5 and removed in PHP 7.0, use preg_replace_callback() instead)
Format: /apple/i matches "apple", "Apple" and so on, ignoring case.
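The effect of each modifier is easier to remember after trying it once. The following is a rough sketch with sample strings of my own; the e modifier is left out because it is no longer available in PHP 7 and later.

```php
<?php
// i: letters match regardless of case.
$i = preg_match('/apple/i', 'Big APPLE');               // 1

// m: ^ and $ match at the start and end of every line.
preg_match_all('/^\w+$/m', "one\ntwo\nthree", $lines);  // $lines[0] holds one, two, three

// s: "." also matches a newline.
$withoutS = preg_match('/a.b/', "a\nb");                // 0
$withS    = preg_match('/a.b/s', "a\nb");               // 1

// x: whitespace inside the pattern is ignored, so it can be spaced out for reading.
$x = preg_match('/ \d{4} - \d{2} /x', '2006-12');       // 1

// U: quantifiers become non-greedy.
preg_match('/<.+>/', '<b>x</b>', $greedy);              // $greedy[0] is "<b>x</b>"
preg_match('/<.+>/U', '<b>x</b>', $lazy);               // $lazy[0] is "<b>"

print_r([$i, $lines[0], $withoutS, $withS, $x, $greedy[0], $lazy[0]]);
```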
PCRE pattern unit:
\1 refers to the text captured by the first pattern unit, that is, the first pair of parentheses.
/^\d{2}([\W])\d{2}\1\d{4}$/ matches strings such as "12-31-2006", "09/27/1996" and "86 01 4321". It does not match "12/34-5678", because the character "/" matched by "[\W]" has been stored; when "\1" refers back to it at the next position, only "/" can match there.
Use the non-capturing pattern unit "(?:)" when the matched text does not need to be stored.
For example, /(?:a|b|c)(D|E|F)\1g/ matches "aEEg". Some regular expressions need non-capturing units to keep the numbering of later references stable. Without one, the reference has to change, and the example above becomes /(a|b|c)(D|E|F)\2g/.
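The backreference behaviour described above can be checked with a short sketch. The helper name has_consistent_separator is made up for this example; the patterns and test strings are the ones from the text.

```php
<?php
// Hypothetical helper: true when both separators are the same non-word character.
function has_consistent_separator(string $s): bool
{
    return preg_match('/^\d{2}([\W])\d{2}\1\d{4}$/', $s) === 1;
}

foreach (['12-31-2006', '09/27/1996', '86 01 4321', '12/34-5678'] as $s) {
    printf("%-12s %s\n", $s, has_consistent_separator($s) ? 'match' : 'no match');
}

// With (?:...) the first parentheses store nothing, so (D|E|F) is group 1.
preg_match('/(?:a|b|c)(D|E|F)\1g/', 'aEEg', $nonCapturing);
// Without it, (a|b|c) becomes group 1 and the reference moves to \2.
preg_match('/(a|b|c)(D|E|F)\2g/', 'aEEg', $capturing);

print_r($nonCapturing); // [0] => aEEg, [1] => E
print_r($capturing);    // [0] => aEEg, [1] => a, [2] => E
```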
PCRE regular expression functions:
preg_match() and preg_match_all()
preg_quote()
preg_split()
preg_grep()
preg_replace()
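As a quick tour of the functions listed above, here is a sketch that calls each of them once. All sample strings and variable names are my own, and the calls use only the basic documented parameters.

```php
<?php
// preg_match_all(): find every match and return how many there were.
$count = preg_match_all('/\d+/', 'a1 b22 c333', $numbers);   // 3

// preg_quote(): escape regex special characters in literal text.
$literal = preg_quote('3.5+1', '/');                         // 3\.5\+1
$found   = preg_match('/' . $literal . '/', 'total: 3.5+1'); // 1

// preg_split(): split on commas and/or runs of whitespace.
$parts = preg_split('/[\s,]+/', 'php, perl  python', -1, PREG_SPLIT_NO_EMPTY);

// preg_grep(): keep only the array entries that match (keys are preserved).
$numeric = preg_grep('/^\d+$/', ['10', 'abc', '7']);

// preg_replace(): replace every match.
$masked = preg_replace('/\d/', '#', 'tel 555-1234');         // tel ###-####

print_r([$count, $numbers[0], $literal, $found, $parts, $numeric, $masked]);
```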
The PHP manual describes how to use each function in detail. Here are some regular expressions we have accumulated:
Match action attributes
$str = '';
$match = '';
preg_match_all('/\s+action="(?!http:)(.*?)"\s/', $str, $match);
print_r($match);
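Since the sample string above is empty, here is a sketch of the same pattern run against some made-up markup. The two form tags and the helper name relative_actions are assumptions of mine, used only to show that relative action values are captured and absolute ones are skipped.

```php
<?php
// Hypothetical helper: returns the action values that do not start with "http:".
function relative_actions(string $html): array
{
    preg_match_all('/\s+action="(?!http:)(.*?)"\s/', $html, $match);
    return $match[1]; // group 1 holds the attribute values
}

// Made-up sample markup, not from the original article.
$html = '<form method="post" action="search.php" name="a"></form>'
      . '<form method="get" action="http://example.com/go" name="b"></form>';

print_r(relative_actions($html)); // only search.php is listed
```

Note that the pattern expects a whitespace character after the closing quote, so an action attribute that is the last one in its tag is not matched.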
Using callback functions in regular expressions
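/**
 * Replace some strings by a callback function.
 */
function callback_replace() {
    $url = 'http://esfang.house.sina.com.cn';
    $str = '';
    // The /e modifier was removed in PHP 7.0; preg_replace_callback() does the same job.
    $str = preg_replace_callback(
        '/(?<=\saction=")(?!http:)(.*?)(?="\s)/',
        function ($m) use ($url) { return search($url, $m[1]); },
        $str
    );
    echo $str;
}
// Prefix the matched relative path with the base URL.
function search($url, $match) {
    return $url . '/' . $match;
}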
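Because the /e modifier is not available in PHP 7 and later, the same idea can be sketched with preg_replace_callback() and tried on a made-up form tag. The function name prefix_relative_actions and the sample markup are my own; the base URL is the one used above.

```php
<?php
// Hypothetical helper: prefixes every relative action value with $base.
function prefix_relative_actions(string $html, string $base): string
{
    return preg_replace_callback(
        '/(?<=\saction=")(?!http:)(.*?)(?="\s)/',
        function (array $m) use ($base): string {
            // $m[1] is the relative path captured by (.*?).
            return $base . '/' . $m[1];
        },
        $html
    );
}

$base = 'http://esfang.house.sina.com.cn';
echo prefix_relative_actions('<form action="search.php" method="post">', $base), "\n";
// <form action="http://esfang.house.sina.com.cn/search.php" method="post">
echo prefix_relative_actions('<form action="http://a.example/x" method="post">', $base), "\n";
// unchanged, because the value already starts with http:
```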
Regular matching with assertions
$match = '';
// The HTML tags in this sample are assumed; the original markup was lost.
$str = 'xxxxxx.com.cn <b>bold font</b>
<p>paragraph text</p>
';
preg_match_all('/(?<=<(\w{1})>).*(?=<\/\1>)/', $str, $match);
echo "Match content in HTML tags without attributes:";
print_r($match);
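Assertions check what comes before or after a match without including it in the result. Here is a minimal sketch with a made-up price string: the lookbehind (?<=\$) requires a dollar sign before the number and the lookahead (?= USD) requires " USD" after it, yet neither ends up in the match.

```php
<?php
// Made-up sample text, not from the original article.
$text = 'cost: $40 USD, $15 CAD, 99 USD';

// Only digits that follow "$" and are followed by " USD" qualify.
preg_match_all('/(?<=\$)\d+(?= USD)/', $text, $prices);

print_r($prices[0]); // only 40 is listed
```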
Replace the address in the HTML source code
$form_html = preg_replace_callback('/(?<=\saction="|\ssrc="|\shref=")(?!http:|javascript)(.*?)(?="\s)/', function ($m) use ($url) { return add_url($url, $m[1]); }, $form_html);
Finally, although regular expressions are powerful, in terms of run time and writing time they are sometimes less direct than explode(). For urgent or undemanding tasks, a simple and crude method may be the better choice.
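To make that trade-off concrete, here is a small sketch with sample strings of my own. When the delimiter is fixed, explode() gives the same result as preg_split() with less machinery; the regular expression only pays off once the delimiter varies.

```php
<?php
// Fixed delimiter: explode() and preg_split() return the same array.
$viaExplode = explode(',', 'a,b,c');
$viaRegex   = preg_split('/,/', 'a,b,c');
var_dump($viaExplode === $viaRegex); // bool(true)

// Varying delimiter (comma or semicolon, optional spaces): here the regex is worth it.
$flexible = preg_split('/\s*[,;]\s*/', 'a , b;c');
print_r($flexible); // a, b, c
```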
As for execution speed between the preg and ereg families, I have read an article saying that preg is faster. Because ereg was rarely used and is on its way out, and because I personally prefer PCRE, I have not made a comparison myself. Readers who know the details are welcome to share their views. Thank you.