PHP regular expression syntax summary-PHP Tutorial-php.cn

Original text: http://bbs.chinaunix.net/forum.php?mod=viewthread&tid=4101636

Using regular expressions well will often achieve twice the result with half the effort. The following is a summary and detailed introduction to PHP regular expression syntax. Friends who need it are welcome to refer to it.
First, let's look at two special characters: '^' and '$'. They match the beginning and the end of the string, respectively. Here are examples of each:
"^The": matches any string that starts with "The";
"of despair$": matches any string that ends with "of despair";
"^abc$": matches a string that starts with "abc" and ends with "abc"; in practice only "abc" itself matches;
"notice": matches any string that contains "notice".
As the last example shows, if you use neither of these two characters, the pattern can appear anywhere in the string being tested; it is not locked to either end.
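The ereg() family used in this part of the article was removed in PHP 7.0, so this minimal sketch checks the anchor examples with preg_match() instead, which expects the pattern wrapped in delimiters (here "/"). The helper name matches() and the subject strings are illustrative assumptions.

```php
<?php
// Hypothetical helper: true when $pattern matches anywhere in $subject.
// preg_match() returns 1 on a match, 0 on no match and false on error.
function matches(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

var_dump(matches('/^The/', 'The end'));                 // bool(true)
var_dump(matches('/^The/', 'In The end'));              // bool(false): not at the start
var_dump(matches('/of despair$/', 'a sea of despair')); // bool(true)
var_dump(matches('/^abc$/', 'abc'));                    // bool(true)
var_dump(matches('/^abc$/', 'abcabc'));                 // bool(false)
var_dump(matches('/notice/', 'please notice this'));    // bool(true): unanchored
```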
There are also the characters '*', '+', and '?', which specify how many times a character may appear. They mean "zero or more", "one or more", and "zero or one", respectively. Here are some examples:
"ab*": matches an "a" followed by zero or more "b"s ("a", "ab", "abbb", etc.);
"ab+": same as above, but with at least one "b" ("ab", "abbb", etc.);
"ab?": matches an "a" followed by zero or one "b";
"a?b+$": matches a string that ends with an optional "a" followed by one or more "b"s.
You can also specify the number of repetitions in curly braces, for example:
"ab{2}": matches an "a" followed by exactly two "b"s ("abb");
"ab{2,}": an "a" followed by at least two "b"s ("abb", "abbbb", etc.);
"ab{3,5}": an "a" followed by three to five "b"s ("abbb", "abbbb", or "abbbbb").
Note that you must always specify the lower bound (i.e., "{0,2}", not "{,2}"). Likewise, note that '*', '+', and '?' are equivalent to the range annotations "{0,}", "{1,}", and "{0,1}", respectively.
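The quantifiers can be checked quickly with preg_match(). This sketch anchors every pattern with ^ and $, unlike most of the examples above, so that the whole string has to match. The helper fullMatch() and the sample strings are made up for illustration.

```php
<?php
// Hypothetical helper: true when the whole subject matches the anchored pattern.
function fullMatch(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

// '*', '+' and '?'
var_dump(fullMatch('/^ab*$/', 'a'));           // bool(true): zero b's
var_dump(fullMatch('/^ab+$/', 'a'));           // bool(false): needs at least one b
var_dump(fullMatch('/^ab?$/', 'abb'));         // bool(false): at most one b
var_dump(fullMatch('/^a?b+$/', 'bbb'));        // bool(true): the a is optional
// Curly braces
var_dump(fullMatch('/^ab{2}$/', 'abb'));       // bool(true)
var_dump(fullMatch('/^ab{2}$/', 'abbb'));      // bool(false): exactly two
var_dump(fullMatch('/^ab{2,}$/', 'abbbb'));    // bool(true)
var_dump(fullMatch('/^ab{3,5}$/', 'abbbbbb')); // bool(false): six b's
```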
You can also put a group of characters in parentheses and apply a quantifier to the whole group, for example:
"a(bc)*": matches an "a" followed by zero or more copies of "bc";
"a(bc){1,5}": an "a" followed by one to five copies of "bc".
There is also the character '|', which works as an OR operator:
"hi|hello": matches strings containing "hi" or "hello";
"(b|cd)ef": matches strings containing "bef" or "cdef";
"(a|b)*c": matches strings containing a sequence of zero or more "a"s or "b"s followed by a "c".
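A sketch of grouping and alternation with preg_match(). The subjects are invented examples, and fullMatch() is a hypothetical helper.

```php
<?php
// Hypothetical helper: true when the whole subject matches the anchored pattern.
function fullMatch(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

// A quantifier after (...) repeats the whole group.
var_dump(fullMatch('/^a(bc)*$/', 'a'));      // bool(true): zero copies
var_dump(fullMatch('/^a(bc)*$/', 'abcbc'));  // bool(true)
var_dump(fullMatch('/^a(bc){1,5}$/', 'a'));  // bool(false): needs at least one
// '|' chooses between alternatives; parentheses limit its scope.
var_dump(fullMatch('/^(b|cd)ef$/', 'cdef')); // bool(true)
var_dump(fullMatch('/^(a|b)*c$/', 'abbac')); // bool(true)
var_dump(preg_match('/hi|hello/', 'say hello') === 1); // bool(true): unanchored
```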
A dot ('.') stands for any single character:
"a.[0-9]": an "a" followed by any character and then a digit (any string containing such a substring matches; this remark is omitted from now on);
"^.{3}$": a string of exactly three characters.
Square brackets specify a set of characters; a bracket expression matches exactly one character from the set:
"[ab]": matches a single "a" or "b" (same as "a|b");
"[a-d]": matches any character from 'a' to 'd' (same effect as "a|b|c|d" and "[abcd]");
"^[a-zA-Z]": matches strings that start with a letter;
"[0-9]%": matches strings containing a digit followed by a percent sign, such as "5%";
",[a-zA-Z0-9]$": matches strings that end with a comma followed by a letter or digit.
You can also list the characters you don't want: just use '^' as the first character inside the brackets (e.g., "%[^a-zA-Z]%" matches a string with a single non-letter character between two percent signs).
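The dot and bracket-expression examples can be checked the same way. The patterns are the ones listed above wrapped in "/" delimiters. The subject strings and the helper found() are illustrative assumptions.

```php
<?php
// Hypothetical helper: true when $pattern matches somewhere in $subject.
function found(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

var_dump(found('/a.[0-9]/', 'xa-7y'));       // bool(true): "a", any char, a digit
var_dump(found('/^.{3}$/', 'abc'));          // bool(true): exactly three chars
var_dump(found('/^.{3}$/', 'abcd'));         // bool(false)
var_dump(found('/^[a-zA-Z]/', '9lives'));    // bool(false): starts with a digit
var_dump(found('/[0-9]%/', 'up 5% today'));  // bool(true)
var_dump(found('/,[a-zA-Z0-9]$/', 'x,y,7')); // bool(true)
var_dump(found('/%[^a-zA-Z]%/', '10%-%'));   // bool(true): '-' is not a letter
var_dump(found('/%[^a-zA-Z]%/', '%a%'));     // bool(false)
```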
To use any of the characters "^.[$()|*+?{\" literally instead of with its special meaning, you must put a backslash ('\') in front of it. In PHP3 you must also escape the backslash itself inside the PHP string. For example, the regular expression "(\$|¥)[0-9]+" would be called as ereg("(\\$|¥)[0-9]+", $str) (I don't know whether it is the same in PHP4).
Don't forget that bracket expressions are an exception to this rule: inside square brackets, all special characters, including the backslash ('\'), lose their special meaning (i.e., "[*\+?{}.]" matches any of these characters). Also, as the regex manual tells us: "To include a literal ']' in the list, make it the first character (possibly following a '^'). To include a literal '-', make it the first or last character, or the second endpoint of a range" (i.e., "[a-d-0-9]", with a '-' in the middle, will work).
For completeness, I should mention collating sequences, character classes, and equivalence classes, but I don't want to go into detail about them here, and they are not needed for the rest of this article. You can find more information in the regex man pages.
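In PCRE (the preg_* functions), most metacharacters are likewise ordinary inside brackets. Unlike in the POSIX bracket expressions described above, however, the backslash keeps its escaping role there. Outside brackets, PHP's preg_quote() escapes special characters for you, and its optional second argument also escapes the delimiter. A minimal sketch with made-up subject strings:

```php
<?php
// Inside [...] the characters * + ? { } . have no special meaning.
var_dump(preg_match('/^[*+?{}.]+$/', '*.?'));  // int(1)
// A literal '-' is safest as the first (or last) character of the set.
var_dump(preg_match('/^[-a-d]+$/', 'a-b'));    // int(1)
// Outside brackets, put a backslash in front of a metacharacter...
var_dump(preg_match('/^\$[0-9]+$/', '$100'));  // int(1)
// ...or let preg_quote() escape a whole literal string; the second
// argument tells it to escape the "/" delimiter as well.
$literal = 'price (USD): $5.00?';
$pattern = '/^' . preg_quote($literal, '/') . '$/';
var_dump(preg_match($pattern, $literal));                // int(1)
var_dump(preg_match($pattern, 'price (USD): $5x00?'));   // int(0): "." is literal now
```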
Building a pattern to match currency amounts
Okay, now let's use what we have learned to do something useful: build a pattern that checks whether the input is a number representing an amount of money. We assume there are four ways to write an amount: "10000.00" and "10,000.00", or, without the decimal part, "10000" and "10,000". Let's start building the pattern:
^[1-9][0-9]*$
This means the string must start with a nonzero digit. But it also means that a single "0" does not pass the test. Here is the fix:
^(0|[1-9][0-9]*)$
"Only 0 and numbers that do not start with 0 match." We can also allow a minus sign in front of the number:
^(0|-?[1-9][0-9]*)$
That is: "0, or a number that does not start with 0 and may be preceded by a minus sign." Now let's not be so strict and allow the number to start with 0. Let's also drop the minus sign, since we don't need it for amounts of money. We now add a pattern for the decimal part:
^[0-9]+(\.[0-9]+)?$
This means the string must start with at least one digit. But note that this pattern does not match "10.", only "10" and "10.2". (Do you know why?)
^[0-9]+(\.[0-9]{2})?$
Here we require exactly two digits after the decimal point. If you think that is too strict, you can change it to:
^[0-9]+(\.[0-9]{1,2})?$
This allows one or two digits after the decimal point. Now we allow commas (every third digit) for readability, while still accepting numbers written without them:
^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{1,2})?$
Don't forget that the plus sign '+' can be replaced by the asterisk '*' if you want to allow an empty string (why?). Also don't forget to escape the backslash in PHP strings; leaving it out is a very common error. Once the string has been validated, remove all commas with str_replace(",", "", $money), treat the result as a double, and you can do arithmetic with it.
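Putting the steps together, here is a sketch of a validator that uses preg_match() in place of ereg(). It assumes the four formats listed above are the only ones to accept, so it combines the comma and no-comma forms in one pattern. parseMoney() is a hypothetical name, and returning null for invalid input is just one possible design.

```php
<?php
// Hypothetical helper: validate "10000", "10,000", "10000.00" or "10,000.00"
// style input and return it as a float, or null when it is not valid.
function parseMoney(string $money): ?float
{
    // Either plain digits, or 1-3 digits followed by groups of ",ddd";
    // then an optional decimal part with one or two digits.
    $pattern = '/^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{1,2})?$/';
    if (preg_match($pattern, $money) !== 1) {
        return null;
    }
    // Strip the thousands separators and convert to PHP's float (a double).
    return (float) str_replace(',', '', $money);
}

var_dump(parseMoney('10,000.00')); // float(10000)
var_dump(parseMoney('10.'));       // NULL: a dot must be followed by digits
```

Casting to float is fine for simple checks, but binary floating point cannot represent most decimal fractions exactly, so for real accounting you may prefer to keep amounts as integer cents.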
Constructing a regular expression to check email addresses
Okay, let's move on to validating an email address. A complete email address has three parts: the POP3 username (everything to the left of '@'), the '@', and the server name (the rest). Usernames can contain uppercase and lowercase letters, digits, periods ('.'), hyphens ('-'), and underscores ('_'). Server names follow the same rule, except, of course, for the underscore.
The username cannot begin or end with a period, and neither can the server name. Also, there cannot be two consecutive periods: there must be at least one character between any two periods. So let's see how to write a pattern for the username:
^[_a-zA-Z0-9-]+$
This doesn't allow periods yet. Let's add them:
^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*$
This means: "at least one allowed character (other than '.'), followed by zero or more strings that start with a period."
To simplify it a bit, we can use eregi() instead of ereg(). eregi() is case-insensitive, so there is no need to specify both ranges "a-z" and "A-Z"; one is enough:
^[_a-z0-9-]+(\.[_a-z0-9-]+)*$
The server name pattern is the same, but without the underscore:
^[a-z0-9-]+(\.[a-z0-9-]+)*$
Done. Now just join the two parts with "@":
^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$
This is the complete pattern for validating an email address; just call
eregi('^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$', $email)
to find out whether the string is an email address.
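Since eregi() no longer exists in PHP 7 and later, the same check can be written with preg_match() and the "i" modifier. Here is a sketch with isEmail() as a hypothetical name. Like the original pattern, it is deliberately simple: for instance, it accepts "user@localhost", and it is not a full RFC-compliant check. PHP's filter_var($email, FILTER_VALIDATE_EMAIL) is a more thorough built-in option.

```php
<?php
// The eregi() pattern above between "/" delimiters; "i" = case-insensitive.
function isEmail(string $email): bool
{
    $pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$/i';
    return preg_match($pattern, $email) === 1;
}

var_dump(isEmail('John.Smith@Example.com'));  // bool(true)
var_dump(isEmail('john..smith@example.com')); // bool(false): two dots in a row
var_dump(isEmail('.john@example.com'));       // bool(false): leading dot
```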
Other uses of regular expressions
Extracting strings
ereg() and eregi() can also extract parts of a string through a regular expression (see the manual for details). For example, to extract the filename from a path or URL, the following code is what you need:
ereg("([^\\/]*)$", $pathOrUrl, $regs);
echo $regs[1];
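Here is a preg_match() equivalent of the filename extraction, sketched on the assumption that both "/" and "\" count as separators, which is what the [^\\/] class in the original expresses. Using "~" as the delimiter avoids escaping the "/". lastSegment() is a hypothetical name. For plain file paths, PHP's built-in basename() is often simpler.

```php
<?php
// Capture everything after the last "/" or "\" (the filename).
// In the single-quoted string, '\\\\' becomes two backslashes, which PCRE
// reads as one literal backslash inside the character class.
function lastSegment(string $pathOrUrl): string
{
    preg_match('~([^\\\\/]*)$~', $pathOrUrl, $regs);
    return $regs[1];
}

echo lastSegment('http://example.com/files/report.pdf'), "\n"; // report.pdf
echo lastSegment('C:\\temp\\notes.txt'), "\n";                  // notes.txt
```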
Advanced substitutions
ereg_replace() and eregi_replace() are also very useful. For example, to replace every run of whitespace with a single comma:
ereg_replace("[ \n\r\t]+", ",", trim($str));
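With preg_replace(), the same substitution can use the shorthand class \s. It covers the space, tab, newline, and carriage return in the original character set, plus a few other whitespace characters. commaSeparate() is a hypothetical name, and the sample input is made up.

```php
<?php
// Replace every run of whitespace with a single comma, after trimming
// whitespace from both ends so no leading or trailing comma appears.
function commaSeparate(string $str): string
{
    return preg_replace('/\s+/', ',', trim($str));
}

echo commaSeparate("  apple \t banana\r\ncherry  "), "\n"; // apple,banana,cherry
```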
PHP is widely used for server-side CGI development on the Web, usually to produce some kind of result from user data. But if the data entered by the user is incorrect, problems arise; for example, someone's birthday might be entered as "February 30"! So how should we check whether the data is correct? PHP supports regular expressions, which let us match data very conveniently.
2 What is a regular expression:
Simply put, a regular expression is a powerful tool for pattern matching and replacement. Regular expressions can be found in almost all software tools on UNIX/Linux systems, and in scripting languages such as Perl and PHP. JavaScript, a client-side scripting language, also supports regular expressions. Regular expressions have become a common concept and tool, widely used by all kinds of technical people.
A certain Linux website says: "If you ask a Linux enthusiast what he likes most, he may answer regular expressions; if you ask him what he is most afraid of, besides tedious installation and configuration, he will definitely say regular expressions."
As mentioned above, regular expressions look complicated and scary. Most PHP beginners skip this topic and move on. But in PHP, regular expressions let you use pattern matching to find strings that meet certain conditions, check whether a string meets them, or replace matching strings with a specified string. It would be a pity not to learn such a powerful feature.
3 The basic syntax of regular expressions:
A regular expression is divided into three parts: delimiters, the expression, and modifiers.
The delimiter can be any character other than a letter, a digit, the backslash, or whitespace (such as "/" or "!"); the most commonly used delimiter is "/". The expression consists of special characters (described in detail below) and ordinary characters; for example, "[a-z0-9_-]+@[a-z0-9_.-]+" can match a simple email address. A modifier turns a particular feature or mode on or off. Here is an example of a complete regular expression:
/hello.+?hello/is
In this regular expression, "/" is the delimiter, the part between the two "/" characters is the expression, and the string "is" after the second "/" contains the modifiers.
If the delimiter appears inside the expression, it must be escaped with the escape character "\", as in "/hello.+?\/hello/is". Besides escaping delimiters, the escape character also introduces special characters: letters that have a special meaning must be preceded by "\"; for example, "\d" matches any digit.
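Here is a sketch of delimiters, modifiers, and escaping with preg_match(). The subject strings are made up. It also shows why "s" matters in the example above: without it, "." cannot cross the newline between the two words.

```php
<?php
$subject = "Hello world\nhello again";

// "/" is the delimiter; "i" and "s" are modifiers. With "s", "." also
// matches the newline, so the lazy ".+?" can span both lines.
var_dump(preg_match('/hello.+?hello/is', $subject)); // int(1)
var_dump(preg_match('/hello.+?hello/i', $subject));  // int(0)

// A "/" inside the expression must be escaped as "\/"...
var_dump(preg_match('/a\/b/', 'a/b'));               // int(1)
// ...or you can pick another delimiter, such as "#".
var_dump(preg_match('#a/b#', 'a/b'));                // int(1)
// "\d" is a letter-based special sequence: any digit.
var_dump(preg_match('/\d/', 'room 42'));             // int(1)
```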
4 Special characters in regular expressions:
Special characters in regular expressions fall into metacharacters, positioning characters, and others. Metacharacters are characters with a special meaning in a regular expression; they describe how their leading character (i.e., the character before the metacharacter) occurs in the matched object. Each metacharacter is a single character, but different or identical metacharacters can be combined to form larger ones.
Metacharacters:
Braces: braces specify exactly how many times the preceding character must occur. For example, "/pre{1,5}/" matches "pre", "pree", up to "preeeee", i.e. "pr" followed by one to five "e"s. Likewise, "/pre{0,5}/" matches "pr" followed by zero to five "e"s.
Plus sign: "+" matches the character before it one or more times. For example, "/ac+/" matches "act", "account", "acccc", and other strings in which one or more "c"s follow an "a". "+" is equivalent to "{1,}".
Asterisk: "*" matches the character before it zero or more times. For example, "/ac*/" matches "app", "acp", "accp", and other strings in which zero or more "c"s follow an "a". "*" is equivalent to "{0,}".
Question mark: "?" matches the character before it zero times or once. For example, "/ac?/" matches "a", "acp", "acwp", and other strings in which zero or one "c" follows an "a". "?" also plays another very important role in regular expressions: it controls "greedy mode".
There are also two very important special characters: "[" and "]". Together they match any one of the characters listed between them. For example, "/[az]/" matches a single "a" or "z"; if the expression is changed to "/[a-z]/", it matches any single lowercase letter, such as "a" or "b".
If "^" appears at the start of "[]", the expression matches any character that is not listed; for example, "/[^a-z]/" matches any character that is not a lowercase letter. Regular expressions also provide several predefined classes for use inside "[]" (written, for example, as "[[:alpha:]]"):
[:alpha:]: matches any letter
[:alnum:]: matches any letter or digit
[:digit:]: matches any digit
[:space:]: matches any whitespace character
[:upper:]: matches any uppercase letter
[:lower:]: matches any lowercase letter
[:punct:]: matches any punctuation mark
[:xdigit:]: matches any hexadecimal digit
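In PCRE, these classes only work inside a bracket expression, so "[:digit:]" is written "[[:digit:]]". Here is a minimal sketch with a hypothetical helper allOf() that checks whether every character of a string belongs to one class. The sample strings are illustrative.

```php
<?php
// Hypothetical helper: true when every character of $subject is in the class.
function allOf(string $class, string $subject): bool
{
    return preg_match('/^[[:' . $class . ':]]+$/', $subject) === 1;
}

var_dump(allOf('alpha', 'Hello'));     // bool(true)
var_dump(allOf('alnum', 'abc123'));    // bool(true)
var_dump(allOf('digit', '12a'));       // bool(false)
var_dump(allOf('upper', 'ABC'));       // bool(true)
var_dump(allOf('xdigit', 'DEADbeef')); // bool(true)
var_dump(allOf('punct', '!?.,'));      // bool(true)
var_dump(allOf('space', " \t\n"));     // bool(true)
```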
In addition, the following letters take on special meanings when preceded by the escape character "\":
\s: matches a single whitespace character.
\S: matches any character other than a whitespace character.
\d: matches a digit from 0 to 9; equivalent to "/[0-9]/".
\w: matches a letter, digit, or underscore; equivalent to "/[a-zA-Z0-9_]/".
\W: matches any character that \w does not match; equivalent to "/[^a-zA-Z0-9_]/".
\D: matches any character that is not a decimal digit.
.: matches any character except the newline character. With the modifier "s", "." matches any character.
These special characters make complicated patterns easy to express. For example, "/^[1-9]\d{4}$/" matches the integers from 10,000 to 99,999, whereas "/\d0000/" only matches a digit followed by four zeros, such as "30000".
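Here are a few checks of the escaped classes and of the dot, using preg_match(), which returns 1 or 0. The subject strings are illustrative.

```php
<?php
var_dump(preg_match('/^\d+$/', '2024'));         // int(1)
var_dump(preg_match('/\D/', '2024'));            // int(0): no non-digit
var_dump(preg_match('/^\w+$/', 'snake_case_1')); // int(1)
var_dump(preg_match('/^\w+$/', 'kebab-case'));   // int(0): '-' is a \W character
var_dump(preg_match('/\S\s\S/', 'a b'));         // int(1)
// "." skips "\n" unless the "s" modifier is given.
var_dump(preg_match('/a.b/', "a\nb"));           // int(0)
var_dump(preg_match('/a.b/s', "a\nb"));          // int(1)
// Whole-string integers from 10000 to 99999.
var_dump(preg_match('/^[1-9]\d{4}$/', '45678')); // int(1)
var_dump(preg_match('/\d0000/', '45678'));       // int(0)
```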
Positioning characters:
Positioning characters are another very important type of special character in regular expressions. Their main function is to describe where in the matched object a pattern occurs.
^: the pattern must appear at the beginning of the matched object (this is different from "^" inside "[]").
$: the pattern must appear at the end of the matched object.
\b: the pattern must appear at a word boundary, i.e. at the beginning or end of a word.
"/^he/": matches strings that start with "he", such as "hello" and "height";
"/he$/": matches strings that end with "he", such as "she";
"/\bhe/": matches "he" at the beginning of a word; similar to "^", but it applies to every word rather than only to the start of the string;
"/he\b/": matches "he" at the end of a word; similar to "$", but for every word;
"/^he$/": matches only the string "he".
Parentheses:
Besides matching, regular expressions can use parentheses "()" to record parts of the match and store them so that they can be read later. For example:
/^([a-zA-Z0-9_-]+)@([a-zA-Z0-9_-]+)(\.[a-zA-Z0-9_-]+)$/
records the user name of an email address and the server address (for an address of the form name@server.com). To read a recorded string later, write the escape character "\" followed by the group's number: "\1" refers to the text matched by the first group ([a-zA-Z0-9_-]+), "\2" to the text matched by the second group ([a-zA-Z0-9_-]+), and "\3" to the text matched by the third group (\.[a-zA-Z0-9_-]+). However, "\" is also an escape character in PHP strings (inside double quotes, "\1" is even an octal escape), so in a PHP double-quoted string "\1" must be written as "\\1".
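The sketch below shows both uses of recorded groups with preg_match(). The first reads them from the matches array. The second refers back to a group inside the pattern with \1, here to spot a word typed twice; \b marks a word boundary. The address and sentences are made-up examples.

```php
<?php
// Read the captured groups of a simple address (sample data).
$pattern = '/^([a-zA-Z0-9_-]+)@([a-zA-Z0-9_-]+)(\.[a-zA-Z0-9_-]+)$/';
if (preg_match($pattern, 'user@example.com', $m) === 1) {
    echo $m[1], ' | ', $m[2], ' | ', $m[3], "\n"; // user | example | .com
}

// \1 inside the pattern means "the same text as group 1 again".
// Single quotes pass '\1' to PCRE unchanged, but in double quotes it
// must be written "\\1", because "\1" is an octal escape in PHP.
var_dump(preg_match('/\b(\w+) \1\b/', 'this is is a typo'));  // int(1)
var_dump(preg_match("/\\b(\\w+) \\1\\b/", 'no repeats here')); // int(0)
```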
Other special symbols:
"|": the OR symbol "|" works like "or" in PHP, except that it is a single "|" rather than PHP's "||". It means the match can be one string or another; for example, "/abcd|dcba/" matches "abcd" or "dcba".
5 Greedy mode:
As mentioned before, the metacharacter "?" also plays an important role in "greedy mode". What is greedy mode?
Suppose we want to match a string that starts with the letter "a" and ends with the letter "b", but the subject contains many "b"s after the "a", such as "abbbbbbbbbbbbbbbbbb". Will the regular expression match up to the first "b" or up to the last one? In greedy mode, the match extends to the last "b"; otherwise it stops at the first "b" that can complete the match.
The following expressions do not use greedy mode:
/a.+?b/
/a.+b/U
The following expression uses greedy mode:
/a.+b/
A modifier U is used above, see the section below for details.
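The difference is easy to see on a concrete subject. In this sketch, the subject string is an arbitrary example. The non-greedy forms stop at "abb" rather than "ab" because ".+" must consume at least one character before the final "b" can match.

```php
<?php
$subject = 'abbbbb';

preg_match('/a.+b/', $subject, $greedy);    // greedy: as long as possible
preg_match('/a.+?b/', $subject, $lazy);     // non-greedy: as short as possible
preg_match('/a.+b/U', $subject, $ungreedy); // U makes quantifiers non-greedy

echo $greedy[0], "\n";   // abbbbb
echo $lazy[0], "\n";     // abb
echo $ungreedy[0], "\n"; // abb
```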
6 Modifiers:
Modifiers change many characteristics of a regular expression, making it better suited to your needs (note: modifiers are case-sensitive; for example, "s" and "S" are different modifiers). The modifiers are as follows:
i: with "i", the regular expression is case-insensitive, so "a" and "A" are treated as the same.
m: by default, "^" and "$" match only at the start and end of the whole string. With "m", they also match at the start and end of every line in the string.
s: with "s", ".", which by default matches any character except the newline character, matches any character, including the newline character.
x: with this modifier, whitespace characters in the expression are ignored unless they are escaped.
e: only meaningful for replacement; the replacement is evaluated as PHP code (this modifier was deprecated in PHP 5.5 and removed in PHP 7.0; use preg_replace_callback() instead).
A: with this modifier, the match must start at the beginning of the subject string. For example, "/a/A" matches "abcd" but not "bacd".
D: in contrast to "m", with this modifier "$" matches only at the absolute end of the string, not before a trailing newline character. This mode is off by default, and it is ignored when "m" is set.
U: does the same job as the question mark, but for the whole pattern: quantifiers become non-greedy by default, and a "?" after a quantifier makes it greedy again.
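Here is a sketch exercising several modifiers with preg_match() on a made-up two-line string. The last two lines show the difference between the default "$" and the "D" modifier.

```php
<?php
$text = "first line\nSecond line";

var_dump(preg_match('/second/', $text));        // int(0): case-sensitive
var_dump(preg_match('/second/i', $text));       // int(1)
var_dump(preg_match('/^Second/', $text));       // int(0): ^ = start of string
var_dump(preg_match('/^Second/m', $text));      // int(1): ^ = start of any line
var_dump(preg_match('/line.Second/', $text));   // int(0): "." skips "\n"
var_dump(preg_match('/line.Second/s', $text));  // int(1)
var_dump(preg_match('/ f i r s t /x', $text));  // int(1): pattern spaces ignored
var_dump(preg_match('/line/A', $text));         // int(0): must match at offset 0
var_dump(preg_match('/line$/', "line\n"));      // int(1): $ also before a final \n
var_dump(preg_match('/line$/D', "line\n"));     // int(0): D = absolute end only
```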
7 PCRE-related regular expression functions:
PHP's Perl-compatible regular expressions provide several functions, covering pattern matching, replacement, counting matches, and so on:
1. preg_match:
Function format: int preg_match(string pattern, string subject, array [matches]);
This function searches subject for a match to pattern. If matches is given, matches[0] receives the text that matched the whole pattern, matches[1] the text recorded by the first pair of parentheses "()", matches[2] the text recorded by the second pair, and so on. preg_match returns 1 if the pattern matches subject and 0 otherwise (false if an error occurs).
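Here is a minimal preg_match() call with recorded groups. The date text is an arbitrary example.

```php
<?php
// Record the year, month and day of a date found in a string.
$result = preg_match('/(\d{4})-(\d{2})-(\d{2})/', 'Due: 2024-03-15', $matches);

var_dump($result);                         // int(1)
print_r($matches);                         // [0] => 2024-03-15, [1] => 2024, [2] => 03, [3] => 15
var_dump(preg_match('/\d/', 'no digits')); // int(0)
```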
2. preg_replace:
Function format: mixed preg_replace(mixed pattern, mixed replacement, mixed subject);
This function replaces every part of subject that matches pattern with replacement. If the replacement needs to contain parts of the matched text, record them with "()" and refer to them in replacement as "\1" (or "$1"), "\2", and so on.
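Here is a sketch of preg_replace() with back-references in the replacement. Note that it also accepts an array as subject and then returns an array. The names are invented sample data.

```php
<?php
// Swap "Last, First" into "First Last" using recorded groups.
// In the replacement, $1/$2 (or \1/\2) refer to the recorded text.
$names = ['Smith, John', 'Doe, Jane'];
$swapped = preg_replace('/^(\w+), (\w+)$/', '$2 $1', $names);
print_r($swapped); // John Smith, Jane Doe

// The same idea with the "\2 \1" syntax in a single-quoted string.
echo preg_replace('/(\w+)@(\w+)/', '\2 at \1', 'user@host'), "\n"; // host at user
```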
3. preg_split:
Function format: array preg_split(string pattern, string subject, int [limit]);
This function works like split(), except that split() only supports POSIX extended (ereg-style) regular expressions, while preg_split() uses fully Perl-compatible regular expressions. The optional third parameter limit caps the number of substrings returned; the last substring then contains the rest of the string.
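Here is a sketch of preg_split() with and without limit. The input string is an arbitrary example.

```php
<?php
// Split on commas, swallowing any whitespace around them.
$csv = 'red, green ,blue,  yellow';
print_r(preg_split('/\s*,\s*/', $csv));    // red, green, blue, yellow
// With a limit of 2, the rest of the string stays in the last element.
print_r(preg_split('/\s*,\s*/', $csv, 2)); // red, "green ,blue,  yellow"
```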
4. preg_grep:
Function format: array preg_grep(string pattern, array input);
This function is related to preg_match, but preg_grep tests every element of the array input and returns a new array containing only the elements that match pattern.
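Here is a sketch of preg_grep(); the fruit list is sample data. The original keys are preserved. The optional PREG_GREP_INVERT flag returns the non-matching elements instead.

```php
<?php
$input = ['apple', 'Banana', 'cherry', 'avocado'];

// Keep only the elements that start with "a" (case-insensitive).
$aWords = preg_grep('/^a/i', $input);
print_r($aWords); // [0] => apple, [3] => avocado

// PREG_GREP_INVERT returns the elements that do NOT match.
$others = preg_grep('/^a/i', $input, PREG_GREP_INVERT);
print_r($others); // [1] => Banana, [2] => cherry
```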
Here is an example that checks whether an email address is well-formed:

<?php
function emailIsRight($email) {
    // The pattern needs delimiters ("/") when used with preg_match().
    if (preg_match("/^[_\.0-9a-z-]+@([0-9a-z][0-9a-z-]+\.)+[a-z]{2,3}$/", $email)) {
        return 1;
    }
    return 0;
}
// Hypothetical valid address, for illustration.
if (emailIsRight('y10k@example.com')) echo 'Correct<br>';
if (!emailIsRight('y10k@fffff')) echo 'Incorrect<br>';
?>
The program above outputs "Correct" followed by "Incorrect".
8. Differences between PHP's Perl-compatible regular expressions and Perl's regular expressions or PHP's ereg functions:
Although they are called "Perl-compatible regular expressions", PHP's preg functions still differ from Perl's regular expressions in some ways. For example, in Perl the modifier "g" means "find all matches", but PHP does not support this modifier; preg_match_all() is used instead.
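For example, where Perl would write m/\d+/g to collect every number in a string, a PHP sketch uses preg_match_all(). It returns the number of matches and fills an array with them. The sample input is arbitrary.

```php
<?php
// Collect every run of digits in the string.
$count = preg_match_all('/\d+/', 'a1b22c333', $matches);

var_dump($count);     // int(3)
print_r($matches[0]); // 1, 22, 333
```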
There are also differences from the ereg family of functions. ereg is another set of regular expression functions provided by PHP, but it is much weaker than preg:
1. ereg does not use delimiters and does not support modifiers, which makes it much less capable than preg.
2. About ".": in regular expressions, the dot usually matches any character except the newline character, but in ereg "." matches any character, including the newline character. If you want "." to match newline characters in preg, add the modifier "s".
3. ereg always uses greedy mode, and this cannot be changed, which makes many replacements and matches harder.
4. Speed: this may be what many people care about. Does preg pay for its power with speed? Don't worry: preg is much faster than ereg. The author ran a test:
Timing test (PHP code):

<?php
// ereg_replace() was removed in PHP 7.0, so this test only runs on older PHP versions.
echo "preg_replace used time:";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "sssssssssssssssssssssssss";
    preg_replace("/s/", "", $str);
}
$ended = time() - $start;
echo $ended;

echo "<br>ereg_replace used time:";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "sssssssssssssssssssssssss";
    ereg_replace("s", "", $str);
}
$ended = time() - $start;
echo $ended;

echo "<br>str_replace used time:";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "ssssssssssssssssssssssssssss";
    str_replace("s", "", $str);
}
$ended = time() - $start;
echo $ended;

Result (in seconds):
ereg_replace used time: 15
str_replace used time: 2
str_replace is very fast because it does not need to do any pattern matching, and preg_replace is much faster than ereg_replace.
9. PHP 3.0 support for preg:
Preg support is built into PHP 4.0 by default, but not into 3.0. To use the preg functions in 3.0, you must load the php3_pcre.dll file: add "extension = php3_pcre.dll" to the extension section of php.ini and then restart PHP.
In fact, regular expressions are also commonly used to implement UBB code. Many PHP forums use this approach, but the actual code is fairly long.
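As a rough illustration, here is a minimal sketch of UBB-to-HTML conversion with preg_replace(). The supported tags ([b], [i], [url=...]) and the function name ubbToHtml() are assumptions for this example, not any particular forum's implementation. It escapes HTML first so that users cannot inject their own markup, and it only accepts http(s) links.

```php
<?php
// Hypothetical minimal UBB-to-HTML converter for [b], [i] and [url=...].
function ubbToHtml(string $text): string
{
    // Escape HTML first so that only the tags generated below are live.
    $html = htmlspecialchars($text, ENT_QUOTES);
    $rules = [
        // Non-greedy (.+?) so that two [b] blocks on one line stay separate.
        '/\[b\](.+?)\[\/b\]/is' => '<b>$1</b>',
        '/\[i\](.+?)\[\/i\]/is' => '<i>$1</i>',
        // Only http:// and https:// targets are turned into links.
        '/\[url=(https?:\/\/[^\]\s]+)\](.+?)\[\/url\]/is' => '<a href="$1">$2</a>',
    ];
    return preg_replace(array_keys($rules), array_values($rules), $html);
}

echo ubbToHtml('[b]bold[/b] and [b]more[/b]'), "\n"; // <b>bold</b> and <b>more</b>
```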