PHP regular expressions
General pattern
Delimiter: "/" is usually used as the delimiter that starts and ends a pattern, but "#" can also be used.
When should you use "#"? Usually when the string contains many "/" characters, as a URI does, because every "/" inside the pattern would otherwise have to be escaped.
The code using the "/" delimiter is as follows.
$regex = '/^http:\/\/([\w.]+)\/([\w]+)\/([\w]+)\.html$/i';
$matches[0] in preg_match will contain the string matching the entire pattern.
The code using the "#" delimiter is as follows. Here "/" no longer needs to be escaped!
$regex = '#^http://([\w.]+)/([\w]+)/([\w]+)\.html$#i';
$str = 'http://www.youku.com/show_page/id_ABCDEFG.html';
$matches = array();
if(preg_match($regex, $str, $matches)){
var_dump($matches);
}
echo "\n";
Modifiers: used to change the behavior of regular expressions.
The trailing "i" in '/^http:\/\/([\w.]+)\/([\w]+)\/([\w]+)\.html$/i' is a modifier that means ignore case. Another one we often use is "x", which means ignore whitespace in the pattern.
Example code:
$regex = '/HELLO/';
$str = 'hello word';
$matches = array();
if(preg_match($regex, $str, $matches)){
echo 'No i:Valid Successful!',"\n";
}
if(preg_match($regex.'i', $str, $matches)){
echo 'YES i:Valid Successful!',"\n";
}
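The "x" modifier can be sketched in the same way. The date format and the variable names below are illustrative assumptions only; they are not part of the original example.

```php
<?php
// With "x", whitespace and #-comments inside the pattern are ignored,
// so a long pattern can be laid out over several lines.
$regex = '/
    ^(\d{4})   # year
    -(\d{2})   # month
    -(\d{2})$  # day
/x';
$matched = preg_match($regex, '2024-01-31', $parts);
var_dump($matched, $parts);
```

Without "x", the line breaks and spaces would be treated as literal characters and the match would fail.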
Character class (character field): [\w]. The part enclosed in square brackets is the character class.
Quantifier: such as [\w]{3,5}, [\w]* or [\w]+. The symbols after [\w] are all quantifiers. Their meanings are as follows.
{3,5} means 3 to 5 characters, {3,} means 3 or more characters, and {3} means exactly three characters. For "up to 5 characters", write {0,5}: older PCRE versions do not treat {,5} as a quantifier.
* means 0 or more.
+ means 1 or more.
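A small sketch of these quantifiers follows. The test strings are made up for illustration, and each pattern is anchored with ^ and $ so that the quantifier has to cover the whole string.

```php
<?php
// Pattern => sample inputs (hypothetical values chosen for this sketch).
$tests = [
    '/^\w{3,5}$/' => ['ab', 'abc', 'abcde', 'abcdef'],
    '/^\w*$/'     => ['', 'abc'],
    '/^\w+$/'     => ['', 'abc'],
];
$results = [];
foreach ($tests as $regex => $inputs) {
    foreach ($inputs as $input) {
        // preg_match() returns 1 on a match and 0 otherwise.
        $results[$regex][$input] = preg_match($regex, $input) === 1;
    }
}
var_dump($results);
```

Only the 3 to 5 character inputs pass the first pattern, and the empty string passes `*` but not `+`.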
Caret
^:
Placed at the start of a character class (such as [^\w]), it negates the class ("reverse selection"): the class matches any character that is not listed.
Placed before an expression, it means the match must start with that expression (/^n/i means starting with n).
Note: we often call "\" the escape character. It is used to escape special symbols such as "." and "/".
Delimiter: The form of regular expression is generally as follows:
/love/
The part between the "/" delimiters is the pattern that will be matched in the target object.
Metacharacters: characters that have a special meaning in regular expressions. They can specify how often their leading character (that is, the character in front of the metacharacter) appears in the target object.
The more commonly used metacharacters include "+", "*", and "?".
The "+" metacharacter requires its leading character to appear one or more times in a row in the target object.
The "*" metacharacter requires its leading character to appear zero or more times in a row in the target object.
The "?" metacharacter requires its leading character to appear zero times or once in the target object.
Next, let us take a look at the specific applications of regular expression metacharacters.
/fo+/
Because the above regular expression contains the "+" metacharacter (the "o" in front of it is the leading character), it matches strings in the target object, such as "fool" or "fo", in which the letter f is followed by one or more consecutive letters o.
In addition to metacharacters, users can specify exactly how often a pattern appears in a matched object. For example,
/jim{2,6}/
The above regular expression stipulates that the character m can appear 2-6 times continuously in the matching object. Therefore, the above regular expression can match strings such as jimmy or jimmmmmy.
How to use several other important metacharacters.
\s: matches a single whitespace character, including space, tab and newline;
\S: matches any character that is not a whitespace character;
\d: matches a digit from 0 to 9;
\w: matches a letter, digit or underscore;
\W: matches any character that \w does not match;
. : matches any character except a newline.
(Explanation: we can think of \s and \S, and \w and \W, as inverses of each other.)
Below, we will look at how to use the above metacharacters in regular expressions through examples.
/\s+/
The above regular expression matches one or more whitespace characters in the target object.
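As a rough illustration of these character shorthands, the sketch below applies them to a made-up string; `$text` and the other variable names are assumptions for this example only.

```php
<?php
$text = "order  42\tshipped\n2024";

// Split on runs of whitespace (spaces, tab, newline).
$words = preg_split('/\s+/', $text);

// Collect every run of digits.
preg_match_all('/\d+/', $text, $numbers);

// \W finds nothing in a string made only of letters, digits and underscore.
$hasNonWord = preg_match('/\W/', 'abc_123');

print_r($words);
print_r($numbers[0]);
var_dump($hasNonWord);
```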
In addition to the metacharacters introduced above, regular expressions have another kind of special character, the locator (more commonly called an anchor).
Locator: used to specify where the matching pattern appears in the target object.
The more commonly used locators include "^", "$", "\b" and "\B".
The "^" locator specifies that the matching pattern must appear at the beginning of the target string.
The "$" locator specifies that the matching pattern must appear at the end of the target string.
The "\b" locator specifies that the matching pattern must appear at a word boundary, that is, at the beginning or the end of a word in the target string.
The "\B" locator specifies that the matching pattern must appear at a position that is not a word boundary, that is, inside a word rather than at its beginning or end.
Likewise, "^" and "$" mark the two ends of the string, and "\b" and "\B" are inverses of each other. For example:
/^hell/
Because the above regular expression contains the "^" locator, it matches target strings that begin with "hell", such as "hell", "hello" or "hellhound".
/ar$/
Because the above regular expression contains the "$" locator, it matches target strings that end with "ar", such as "car", "bar" or "ar".
/\bbom/
Because the above regular expression pattern starts with the "\b" locator, it matches words in the target object that begin with "bom", such as "bomb" or "bom".
/man\b/
Because the above regular expression pattern ends with the "\b" locator, it matches words in the target object that end with "man", such as "human", "woman" or "man".
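The four patterns above can be checked with a short sketch. The subjects that should not match ("shell", "cart", "abomb", "manual") are hypothetical counter-examples added here for contrast.

```php
<?php
// preg_match() returns 1 for a match and 0 for no match.
$results = [
    'hello'  => preg_match('/^hell/', 'hello'),   // "hell" at the start
    'shell'  => preg_match('/^hell/', 'shell'),   // "hell" not at the start
    'car'    => preg_match('/ar$/', 'car'),       // "ar" at the end
    'cart'   => preg_match('/ar$/', 'cart'),      // "ar" not at the end
    'a bomb' => preg_match('/\bbom/', 'a bomb'),  // "bom" begins a word
    'abomb'  => preg_match('/\bbom/', 'abomb'),   // "bom" is inside a word
    'woman'  => preg_match('/man\b/', 'woman'),   // "man" ends a word
    'manual' => preg_match('/man\b/', 'manual'),  // "man" is followed by more letters
];
print_r($results);
```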
To let users set matching patterns more flexibly, regular expressions allow a range of characters to be specified in the pattern instead of one specific character. For example:
/[A-Z]/
The above regular expression will match any uppercase letter from A to Z.
/[a-z]/
The above regular expression will match any lowercase letter in the range from a to z.
/[0-9]/
The above regular expression will match any digit from 0 to 9.
/([a-z][A-Z][0-9])+/
The above regular expression matches one or more repetitions of a lowercase letter followed by an uppercase letter followed by a digit, such as "aB0". Note that you can use "()" in regular expressions to group parts of a pattern together.
"()" symbol: everything inside the parentheses must appear together, in order, in the target object. Therefore, the above regular expression will not match a string such as "abc", because its second character is not an uppercase letter and its last character is not a digit.
If we want to implement a regular expression similar to the "OR" operation in programming logic and select any one of multiple different patterns for matching, we can use the pipe character: "|". For example:
/to|too|2/
The above regular expression will match "to", "too", or "2" in the target object.
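One detail worth checking in practice is that alternatives are tried from left to right, so in this pattern "to" is found before "too" gets a chance. A minimal sketch, with made-up subject strings:

```php
<?php
// "to" is listed first, so it wins even though "too" would also fit.
preg_match('/to|too|2/', 'too', $first);

// Listing the longer alternative first changes the result.
preg_match('/too|to|2/', 'too', $second);

// All matches in a longer subject.
preg_match_all('/to|too|2/', 'to 2 too', $all);

var_dump($first[0], $second[0]);
print_r($all[0]);
```

If the longer word should be preferred, it may be safer to put it first in the list of alternatives.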
Negation: "[^]". Unlike the "^" locator introduced earlier, the negation "[^]" specifies that the characters listed in the brackets must not appear at that position in the target object. For example:
/[^A-C]/
The above pattern will match any character in the target object except A, B, and C. Generally speaking, when "^" appears at the start of "[]" it is a negation operator; when "^" is outside "[]", or there is no "[]", it is a locator.
Finally, when users need to match a metacharacter itself in the target object, they can use the escape character: "\". For example:
/Th\*/
The above regular expression will match "Th*" instead of "The" etc. in the target object.
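The effect of the escape can be sketched as follows. The call to `preg_quote()` is an addition for illustration: it is the standard PHP function that inserts the backslashes for you.

```php
<?php
$escaped   = preg_match('/Th\*/', 'Th*');  // the "*" is matched literally
$notThe    = preg_match('/Th\*/', 'The');  // no literal "*" in the subject
$unescaped = preg_match('/Th*/', 'The');   // "T" followed by zero or more "h"

// preg_quote() escapes metacharacters; the second argument is the delimiter.
$quoted = preg_quote('Th*', '/');

var_dump($escaped, $notThe, $unescaped, $quoted);
```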
Practical experience introduction
We still have to talk about ^ and $. They match the beginning and the end of a string respectively. Examples:
"^The": the string must begin with "The";
"of despair$": the string must end with "of despair";
Then,
"^abc$": the string must both begin and end with abc, so in fact only "abc" matches;
"notice": matches any string containing "notice";
As the last example shows, if you use neither of these two characters, the pattern (regular expression) may appear anywhere in the string being checked; it is not locked to either end.
Next, let's talk about '*', '+' and '?'.
They specify how many times a character may appear in a row. They mean, respectively:
'*': "zero or more", equivalent to {0,}
'+': "one or more", equivalent to {1,}
'?': "zero or one", equivalent to {0,1}
Here are some examples:
"ab*": same as ab{0,}; matches an a followed by zero or more b's ("a", "ab", "abbb", etc.);
"ab+": same as ab{1,}; as above, but there must be at least one b ("ab", "abbb", etc.);
"ab?": same as ab{0,1}; there may be no b or exactly one b;
"a?b+$": matches a string ending with zero or one a followed by one or more b's.
Key point: '*', '+' and '?' apply only to the character immediately before them.
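The key point can be tried out with a short sketch. The subject strings are invented for illustration, and most patterns are anchored with ^ and $ so that the quantifier has to account for the whole subject.

```php
<?php
$star     = preg_match('/^ab*$/', 'a');    // zero b's is enough for "*"
$plus     = preg_match('/^ab+$/', 'a');    // "+" needs at least one b
$optional = preg_match('/^ab?$/', 'abb');  // "?" allows at most one b

// The quantifier applies only to the character before it ("c" here)...
$charOnly = preg_match('/^abc*$/', 'abcbc');
// ...unless parentheses group several characters into one unit.
$grouped  = preg_match('/^a(bc)*$/', 'abcbc');

// Zero or one a, then one or more b's, at the end of the subject.
preg_match('/a?b+$/', 'caabbb', $tail);

var_dump($star, $plus, $optional, $charOnly, $grouped, $tail[0]);
```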
You can also limit the number of repetitions with curly brackets, for example:
"ab{2}": a must be followed by exactly two b's ("abb");
"ab{2,}": a must be followed by two or more b's ("abb", "abbbb", etc.);
"ab{3,5}": a must be followed by 3 to 5 b's ("abbb", "abbbb", or "abbbbb").
Now we put certain characters into parentheses, for example:
"a(bc)*": matches a followed by zero or more "bc";
"a(bc){1,5}": a followed by one to five "bc";
There is also the character '|', which is equivalent to the OR operation:
"hi|hello": matches strings containing "hi" or "hello";
"(b|cd)ef": matches strings containing "bef" or "cdef";
"(a|b)*c": matches a string containing zero or more a's or b's followed by a c;
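A dot ('.') can represent any single character except "\n".
What if you want to match any single character including "\n"?
Use the "s" modifier, which makes '.' match "\n" as well, or a pattern such as '[\s\S]'. (A class like '[\n.]' does not work: inside square brackets the dot is a literal dot.)
"a.[0-9]": an a followed by any character followed by a digit from 0 to 9;
"^.{3}$": a string of exactly three arbitrary characters.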
Content enclosed in square brackets matches only a single character:
"[ab]": matches a single a or b (same as "a|b");
"[a-d]": matches a single character from 'a' to 'd' (same effect as "a|b|c|d" and "[abcd]");
Generally we use [a-zA-Z] to specify an uppercase or lowercase English letter:
"^[a-zA-Z]": matches strings starting with a letter;
"[0-9]%": matches strings containing a digit followed by a percent sign;
",[a-zA-Z0-9]$": matches a string ending with a comma followed by a digit or letter;
You can also list the characters you do not want inside the square brackets; just put '^' at the beginning of the brackets. "%[^a-zA-Z]%" matches a string containing two percent signs with a single non-letter character between them.
Important: when ^ is used at the beginning of a square bracket, it means that the characters in the brackets are excluded.
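A minimal sketch of these bracket patterns, using made-up subject strings:

```php
<?php
// A non-letter between two percent signs.
$digit  = preg_match('/%[^a-zA-Z]%/', '%5%');
$letter = preg_match('/%[^a-zA-Z]%/', '%a%');

// Must start with a letter.
$start  = preg_match('/^[a-zA-Z]/', '9lives');

// Must end with a comma followed by one letter or digit.
$end    = preg_match('/,[a-zA-Z0-9]$/', 'x,1');

// A digit followed by a percent sign anywhere in the subject.
$pct    = preg_match('/[0-9]%/', 'up 5% today');

var_dump($digit, $letter, $start, $end, $pct);
```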
To match special characters such as "^.[$()|*+?{\" literally, you must put "\" in front of them, and some characters (such as the backslash itself) must additionally be escaped in PHP strings.
Don't forget that characters inside square brackets are an exception to this rule: inside brackets, the special characters lose their special meaning. "[*+?{}.]" matches a string containing any of these characters. (In POSIX patterns, as used by ereg(), this includes the backslash "\"; in PCRE patterns, as used by the preg_ functions, the backslash still escapes inside brackets.)
Also, as the regex manual tells us: "If the list contains ']', it is best to make it the first character in the list (possibly following '^'). If it contains '-', it is best to put it first or last, or as the second endpoint of a range": the middle '-' in [a-d-0-9] is valid.
After reading the examples above, you should understand {n,m}. Note that neither n nor m can be a negative integer, and n must not be greater than m. The preceding item is then matched at least n times and at most m times. For example, "p{1,5}" matched against "pvpppppp" first matches the single leading "p"; applied to the run of six p's, it matches the first five.
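How "p{1,5}" walks through "pvpppppp" can be seen with `preg_match_all()`, which collects every match in turn. This is only a sketch to make the greedy behavior visible.

```php
<?php
// Each match takes as many p's as it can, up to five.
preg_match_all('/p{1,5}/', 'pvpppppp', $m);
print_r($m[0]);
```

The result has three entries: the lone leading "p", then five p's from the run of six, then the one p that is left over.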
Let's talk about the escapes that start with "\".
\b: the book says it matches a word boundary. For example, 've\b' matches the ve in love but not the ve in very.
\B is just the opposite of \b above.
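The love/very example can be checked directly; the variable names are illustrative.

```php
<?php
$loveB  = preg_match('/ve\b/', 'love'); // "ve" ends the word
$veryB  = preg_match('/ve\b/', 'very'); // "ve" is followed by "r"
$veryNB = preg_match('/ve\B/', 'very'); // \B: not at a word boundary
$loveNB = preg_match('/ve\B/', 'love');

var_dump($loveB, $veryB, $veryNB, $loveNB);
```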
Other uses of regular expressions
Extract string
One feature of ereg() and eregi() allows users to extract part of a string through regular expressions (see the manual for specific usage). Note that the ereg functions were deprecated in PHP 5.3 and removed in PHP 7, where the preg_ functions take their place. For example, if we want to extract the file name from a path/URL, the following code is what you need:
ereg("([^/]*)$", $pathOrUrl, $regs);
echo $regs[1];
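On PHP 7 and later, where ereg() is not available, the same extraction might be written with preg_match(). The URL below is a hypothetical input chosen for this sketch.

```php
<?php
// Hypothetical input; any path or URL would do.
$pathOrUrl = 'http://www.example.com/docs/manual.html';

// "#" is used as the delimiter so that "/" needs no escaping.
// The group captures everything after the last "/".
if (preg_match('#([^/]*)$#', $pathOrUrl, $regs)) {
    echo $regs[1], "\n";
}
```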
Advanced substitution
ereg_replace() and eregi_replace() are also very useful. Suppose we want to replace all separating whitespace with commas:
ereg_replace("[ \t]+", ",", trim($str));
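A possible preg_replace() equivalent for current PHP versions is sketched below; the sample string is an assumption made for this example.

```php
<?php
$str = "  apple  banana\tcherry ";

// trim() removes the outer whitespace, then each run of
// spaces or tabs is replaced by a single comma.
$csv = preg_replace('/[ \t]+/', ',', trim($str));
echo $csv, "\n";
```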
PHP's PCRE functions include:
preg_match() and preg_match_all()
preg_quote()
preg_split()
preg_grep()
preg_replace()
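A brief sketch of three of these functions follows. The input values are invented for illustration, and the manual remains the reference for the full parameter lists.

```php
<?php
// preg_split(): split on runs of whitespace and/or commas.
$parts = preg_split('/[\s,]+/', 'a, b  c,d');

// preg_grep(): keep the array entries that match (original keys are preserved).
$files    = ['a.php', 'b.txt', 'c.PHP'];
$phpFiles = preg_grep('/\.php$/i', $files);

// preg_match_all(): returns the number of matches found.
$count = preg_match_all('/\d/', 'a1b22', $digits);

print_r($parts);
print_r($phpFiles);
var_dump($count);
```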
We can find the specific use of functions through the PHP manual. Here are some regular expressions we have accumulated:
Match action attribute
$str = '';
preg_match_all('/\s+action="(?!http:)(.*?)"\s/', $str, $match);
print_r($match);
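Since the sample markup is not shown, here is a sketch with hypothetical HTML. It illustrates how the negative lookahead `(?!http:)` skips action attributes that already hold an absolute URL.

```php
<?php
// Hypothetical markup: one relative action and one absolute action.
$str = '<form method="post" action="/login.php" id="f">'
     . '<form method="post" action="http://example.com/x" id="g">';

// Group 1 captures the attribute value only when it does not start with "http:".
preg_match_all('/\s+action="(?!http:)(.*?)"\s/', $str, $match);
print_r($match[1]);
```

Only the relative value is captured; the second form is skipped by the lookahead.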
Use callback functions in regular expressions
/**
 * Replace some string using a callback function.
 * Note: the original used the /e modifier, which was removed in PHP 7;
 * preg_replace_callback() is used here instead.
 */
function callback_replace() {
    $url = 'http://esfang.house.sina.com.cn';
    $str = '';
    $str = preg_replace_callback(
        '/(?<=\saction=")(?!http:)(.*?)(?="\s)/',
        function ($m) use ($url) { return search($url, $m[1]); },
        $str
    );
    echo $str;
}
function search($url, $match) {
    return $url . '/' . $match;
}
Regular matching with assertions
$match = '';
Replace the address in the HTML source code
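// The original used the /e modifier, which was removed in PHP 7; preg_replace_callback() replaces it.
$form_html = preg_replace_callback(
    '/(?<=\saction="|\ssrc="|\shref=")(?!http:|javascript)(.*?)(?="\s)/',
    function ($m) use ($url) { return add_url($url, $m[1]); },
    $form_html
);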