PHP has two families of regular expression functions. One is provided by the PCRE (Perl Compatible Regular Expressions) library, which implements pattern matching with the same syntax rules as Perl; its functions use the "preg_" prefix. The other is provided by the POSIX (Portable Operating System Interface) extension. POSIX extended regular expressions are defined by POSIX 1003.2, and the corresponding functions are ereg(), eregi(), ereg_replace(), eregi_replace(), split() and spliti(). The POSIX functions were deprecated in PHP 5.3 and removed in PHP 7.0, so the ereg examples in this section run only on older PHP versions.
The two families offer similar features but differ in speed. Generally speaking, for the same task the PCRE functions are somewhat faster. The use of both families is described in detail below.
6.3.1 Regular expression matching
1. preg_match()
Function prototype: int preg_match (string $pattern, string $content [, array $matches])
The preg_match() function searches the $content string for text that matches the regular expression given by $pattern. If $matches is provided, the results are stored in it: $matches[0] contains the text that matched the entire pattern, $matches[1] contains the text captured by the first parenthesized sub-pattern, and so on. The function stops after the first match, so it returns 0 (no match) or 1 (match), or false if an error occurs. Code 6.1 shows an example of preg_match().
Code 6.1 Date and time matching
<?php
// The string to search; date() supplies the current date and time
$content = "Current date and time is " . date("Y-m-d h:i a") . ", we are learning PHP together.";

// Match the full timestamp with an explicit pattern
if (preg_match("/\d{4}-\d{2}-\d{2} \d{2}:\d{2} [ap]m/", $content, $m)) {
    echo "The matching time is: " . $m[0] . "\n";
}

// The format is distinctive, so a looser pattern with two groups also works
if (preg_match("/([\d-]{10}) ([\d:]{5} [ap]m)/", $content, $m)) {
    echo "The current date is: " . $m[1] . "\n";
    echo "The current time is: " . $m[2] . "\n";
}
?>
This is a simple dynamic text string matching example. Assuming that the current system time is "13:25 on August 17, 2006", the following content will be output.
The matching time is: 2006-08-17 01:25 pm
The current date is: 2006-08-17
The current time is: 01:25 pm
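The return values and the layout of $matches described above can be checked with a small sketch. It is not part of the original listing: the fixed timestamp string and the group names "date" and "time" are assumptions chosen so that the result is predictable.

```php
<?php
// Hypothetical sample string with a fixed timestamp (the original uses date()).
$content = "Current date and time is 2006-08-17 01:25 pm, we are learning PHP together.";

// Named groups (?<name>...) appear in $m under both their name and their index.
$pattern = '/(?<date>\d{4}-\d{2}-\d{2}) (?<time>\d{2}:\d{2} [ap]m)/';

$found  = preg_match($pattern, $content, $m);                // 1: the pattern matched
$missed = preg_match($pattern, "no timestamp here", $none);  // 0: nothing matched

if ($found === 1) {
    echo "Date: " . $m['date'] . "\n";   // same value as $m[1]
    echo "Time: " . $m['time'] . "\n";   // same value as $m[2]
}
```

Comparing the result with === 1 rather than testing it loosely keeps a false error return from being mistaken for "no match".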
2. ereg() and eregi()
ereg() is the matching function for regular expressions in the POSIX extension library, and eregi() is its case-insensitive version. Both work much like preg_match(), and their return value is normally used as a truth value: it is false when no match is found. Note that the first parameter of the POSIX functions is a bare regular expression string, so no delimiters are needed. For example, Code 6.2 shows a way to check file names for safety.
Code 6.2 Security check of file names
<?php
$username = $_SERVER['REMOTE_USER'];
$filename = $_GET['file'];

// Filter the file name to keep the system safe
if (!ereg('^[^./][^/]*$', $filename)) {
    die('This is not a valid file name!');
}

// Filter the user name the same way
if (!ereg('^[^./][^/]*$', $username)) {
    die('This is not a valid user name!');
}

// Both checks passed; build the file path
$thefile = "/home/$username/$filename";
?>
Normally, the Perl-compatible matching function preg_match() is faster than ereg() or eregi(). If you only need to know whether a string contains a certain substring, strstr() or strpos() is recommended instead.
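As a rough sketch of both recommendations, the file-name rule from Code 6.2 can be written for preg_match(), and a plain substring test can be done with strpos(). The helper name is_safe_name() and the sample inputs are assumptions made for illustration; they do not come from the original text.

```php
<?php
// Hypothetical helper: the rule from Code 6.2, written for preg_match().
// '#' is used as the delimiter so that '/' needs no escaping.
function is_safe_name($name)
{
    // First character: neither '.' nor '/'. Remaining characters: anything but '/'.
    return preg_match('#^[^./][^/]*$#', $name) === 1;
}

var_dump(is_safe_name('report.txt'));     // bool(true)
var_dump(is_safe_name('../etc/passwd'));  // bool(false)

// A plain substring test needs no regular expression. Compare with !== false,
// because strpos() returns 0 when the substring sits at the very start.
$hasPhp = strpos('PHP regular expressions', 'PHP') !== false;
var_dump($hasPhp);                        // bool(true)
```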
3. preg_grep()
Function prototype: array preg_grep (string $pattern, array $input)
The preg_grep() function returns an array containing the elements of the $input array that match the given $pattern. Like preg_match(), it tests each element of $input only once. Code 6.3 gives a simple example of preg_grep().
Code 6.3 Array query matching
<?php
$subjects = array(
    "Mechanical Engineering", "Medicine", "Social Science",
    "Agriculture", "Commercial Science", "Politics"
);

// Match every subject name that consists of a single word
$alonewords = preg_grep("/^[a-z]*$/i", $subjects);
?>
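The listing does not print its result, so here is a small sketch of what preg_grep() hands back for the same data. It also uses the optional PREG_GREP_INVERT flag, which the text above does not mention, to collect the elements that do not match.

```php
<?php
$subjects = array(
    "Mechanical Engineering", "Medicine", "Social Science",
    "Agriculture", "Commercial Science", "Politics"
);

// Matching elements keep their original keys: 1, 3 and 5.
$single = preg_grep('/^[a-z]+$/i', $subjects);

// PREG_GREP_INVERT returns the elements that do NOT match (keys 0, 2 and 4).
$multi = preg_grep('/^[a-z]+$/i', $subjects, PREG_GREP_INVERT);

// array_values() re-indexes the result from 0 when the keys are not needed.
print_r(array_values($single));
```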
6.3.2 Global regular expression matching
1. preg_match_all()
This function is similar to preg_match(), but it keeps searching after the first match. If the third parameter is supplied, all matches are stored in it. The function returns the number of times the full pattern matched (possibly 0), or false if an error occurs. Code 6.4 uses preg_match_all() to convert the URLs in a text into HTML links.
Code 6.4 Convert the link address in the text into HTML
<?php
// Purpose: convert the link addresses in a text into HTML
// Input:   string
// Output:  string
function url2html($text)
{
    // Match a URL up to the next whitespace character
    preg_match_all("/http:\/\/?[^\s]+/i", $text, $links);

    // Maximum length of the URL text shown on the page
    $max_size = 40;

    foreach ($links[0] as $link_url) {
        // Measure the URL and shorten the visible text if it exceeds $max_size
        $len = strlen($link_url);
        if ($len > $max_size) {
            $link_text = substr($link_url, 0, $max_size) . "...";
        } else {
            $link_text = $link_url;
        }
        // Generate the HTML
        $text = str_replace($link_url, "<a href='$link_url'>$link_text</a>", $text);
    }
    return $text;
}

// Example run
$str = "This is multi-line text containing several URL links. Visit http://www.php.cn";
print url2html($str);

/* Output:
This is multi-line text containing several URL links. Visit <a href='http://www.php.cn'>http://www.php.cn</a>
*/
?>
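To make the return value and the shape of the third parameter concrete, the following sketch runs preg_match_all() twice over the same text. The sample text, the second URL and the extra capture groups are assumptions added for illustration; PREG_SET_ORDER is an optional flag not covered in the text above.

```php
<?php
// Hypothetical sample text with two URLs.
$text = "See http://www.php.cn and http://example.com/docs for details.";
$pattern = '#(https?)://([^\s/]+)[^\s]*#i';   // group 1: scheme, group 2: host

// Default order (PREG_PATTERN_ORDER): one sub-array per group.
$count = preg_match_all($pattern, $text, $byGroup);
// $byGroup[0] holds every full match, $byGroup[2] holds every host.

// PREG_SET_ORDER: one sub-array per match instead.
preg_match_all($pattern, $text, $byMatch, PREG_SET_ORDER);
// $byMatch[1] holds the full match and the groups of the second URL.

echo $count . " URLs found\n";
echo implode(", ", $byGroup[2]) . "\n";
echo $byMatch[1][0] . "\n";
```

The default order is handy for collecting all values of one group; PREG_SET_ORDER is easier to loop over when each match is handled as a unit.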
2. Multi-line matching
Complex matching is hard to do with the POSIX regular expression functions alone, for example searching an entire file (especially multi-line text). One way to handle this with ereg() is to process the text line by line. Code 6.5 demonstrates how eregi() can load the settings of an INI file into an array.
Code 6.5 Multi-line matching of file content
<?php
$rows = file('php.ini');   // Read php.ini into an array of lines
$options = array();

// Loop over the lines
foreach ($rows as $line) {
    if (trim($line)) {
        // Store each successfully matched setting in the array
        if (eregi("^([a-z0-9_.]*) *=(.*)", $line, $matches)) {
            $options[$matches[1]] = trim($matches[2]);
        }
        unset($matches);
    }
}

// Print the settings
print_r($options);
?>
Tips
This is just for the convenience of illustrating the problem. To parse an *.ini file, the best way is to use the function parse_ini_file(). This function directly parses the *.ini file into a large array.
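A minimal sketch of that approach follows. It uses parse_ini_string(), the in-memory sibling of parse_ini_file(), so that no file on disk is needed; the INI content is made up for illustration. INI_SCANNER_RAW is passed so that values come back as the literal strings written in the input.

```php
<?php
// Hypothetical INI content, inlined so the sketch is self-contained.
$ini = "memory_limit = 128M\n"
     . "display_errors = Off\n"
     . "; a comment line\n"
     . "max_execution_time = 30\n";

// Second argument: do not group by [section]. Third: keep values unparsed.
$options = parse_ini_string($ini, false, INI_SCANNER_RAW);

print_r($options);
```

With the default scanner mode, words such as Off are converted instead of being kept verbatim, which is why the raw mode is used here.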
6.3.3 Regular expression replacement
1. ereg_replace() and eregi_replace()
Function prototype: string ereg_replace (string $pattern, string $replacement, string $string)
string eregi_replace (string $pattern, string $replacement, string $string)
ereg_replace() searches $string for the pattern $pattern and replaces each match with $replacement. When $pattern contains parenthesized sub-patterns, sequences such as "\1" in $replacement are replaced by the text matched by the corresponding sub-pattern, and "\0" refers to the entire matched string (the "$1" form belongs to preg_replace()). Because the backslash is an escape character inside double quotes, write these as "\\0" or "\\1" there.
eregi_replace() works the same way as ereg_replace(), except that it ignores case. Code 6.6 is an example of this function. It demonstrates a simple cleanup of program source code.
Code 6.6 Source code cleaning
<?php
$lines = file('source.php');   // Read the file into an array of lines

for ($i = 0; $i < count($lines); $i++) {
    // Remove trailing comments that start with "//" or "#"
    $lines[$i] = eregi_replace("(\/\/|#).*$", "", $lines[$i]);
    // Replace trailing whitespace with a single line ending
    $lines[$i] = eregi_replace("[ \n\r\t\v\f]*$", "\r\n", $lines[$i]);
}

// Print the cleaned source to the page
echo htmlspecialchars(join("", $lines));
?>
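On PHP versions where the ereg functions are no longer available, the same cleanup could be sketched with preg_replace() and rtrim(). This is an adaptation rather than the original method, and the three input lines are hypothetical stand-ins for the contents of source.php.

```php
<?php
// Hypothetical input; in Code 6.6 these lines come from file('source.php').
$lines = array(
    "\$a = 1; // set a  \n",
    "# whole-line comment\n",
    "echo \$a;   \t\n",
);

foreach ($lines as $i => $line) {
    // Remove a trailing comment that starts with "//" or "#".
    $line = preg_replace('~(//|#).*$~', '', $line);
    // rtrim() strips trailing whitespace; then one CRLF line ending is appended.
    $lines[$i] = rtrim($line) . "\r\n";
}

echo htmlspecialchars(join("", $lines));
```

rtrim() is used for the second step because a pattern such as \s*$ can also match the empty string at the very end of the line, which would append the line ending twice. Like the original, this sketch does not recognize string literals, so a "//" inside a quoted URL would be cut off as well.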
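2. preg_replace()
Function prototype: mixed preg_replace (mixed $pattern, mixed $replacement, mixed $subject [, int $limit])
preg_replace() is more powerful than ereg_replace(). Its first three parameters can all be arrays, and the fourth parameter, $limit, sets the maximum number of replacements; by default every match is replaced. Code 6.7 is an example of array replacement.
Code 6.7 Array replacement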
<?php
// The subject string
$string = "Name: {Name}<br>\nEmail: {Email}<br>\nAddress: {Address}<br>\n";

// The patterns
$patterns = array(
    "/{Address}/",
    "/{Name}/",
    "/{Email}/"
);

// The replacement strings, in the same order as the patterns
$replacements = array(
    "No.5, Wilson St., New York, U.S.A",
    "Thomas Ching",
    "tom@emailaddress.com",
);

// Print the result of the replacement
print preg_replace($patterns, $replacements, $string);
?>
The output, as rendered in a browser, is as follows.
Name: Thomas Ching
Email: tom@emailaddress.com
Address: No.5, Wilson St., New York, U.S.A
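The $limit parameter and sub-pattern references in the replacement can be illustrated with a short sketch. The sample text is hypothetical, and the fifth parameter, $count, is an optional extra of preg_replace() that the prototype above does not list.

```php
<?php
// Hypothetical sample text containing two dates.
$text = "Start: 2006-08-30, end: 2007-01-15";
$pattern = '/(\d{4})-(\d{2})-(\d{2})/';

// $1, $2 and $3 refer to the three parenthesized groups.
$all = preg_replace($pattern, '$2/$3/$1', $text);

// A $limit of 1 stops after the first replacement;
// $count receives the number of replacements actually made.
$first = preg_replace($pattern, '$2/$3/$1', $text, 1, $count);

echo $all . "\n";     // Start: 08/30/2006, end: 01/15/2007
echo $first . "\n";   // Start: 08/30/2006, end: 2007-01-15
```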
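The pattern modifier "e" can be used in a preg_replace() regular expression. It makes preg_replace() treat the replacement string, after the sub-pattern references have been substituted, as a PHP expression and use the result of evaluating it. This modifier was deprecated in PHP 5.5 and removed in PHP 7.0, where preg_replace_callback() serves the same purpose. For example: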
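<?php
$html_body = '<HTML><Body><H1>TEST</H1>My Picture<Img src="my.gif"></Body></HTML>';

// Every HTML tag name in the output is converted to lowercase.
// Requires a PHP version earlier than 7.0, where the "e" modifier still exists.
echo preg_replace(
    "/(<\/?)(\w+)([^>]*>)/e",
    "'\\1'.strtolower('\\2').'\\3'",   // \\2, the tag name, is passed through strtolower()
    $html_body
);
?>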
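On PHP versions without the "e" modifier, the same transformation can be sketched with preg_replace_callback(), which passes each match to a function instead of evaluating a string. The callback name lowercase_tag() is an assumption chosen for this illustration.

```php
<?php
$html_body = '<HTML><Body><H1>TEST</H1>My Picture<Img src="my.gif"></Body></HTML>';

// Hypothetical callback: $m[1] is "<" or "</", $m[2] the tag name,
// $m[3] the rest of the tag up to and including ">".
function lowercase_tag(array $m)
{
    return $m[1] . strtolower($m[2]) . $m[3];
}

$result = preg_replace_callback('/(<\/?)(\w+)([^>]*>)/', 'lowercase_tag', $html_body);

echo $result;
// <html><body><h1>TEST</h1>My Picture<img src="my.gif"></body></html>
```

Because the matched text reaches the callback as ordinary data and is never evaluated as code, this form also avoids the quoting pitfalls of the "e" modifier.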
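Tips
preg_replace() uses Perl-compatible regular expression syntax and is usually a faster alternative to ereg_replace(). For simple string replacement, str_replace() is sufficient.
6.3.4 Regular expression splitting
1. split() and spliti()
Function prototype: array split (string $pattern, string $string [, int $limit])
This function returns an array of strings, each element being a substring of $string delimited by matches of the regular expression $pattern. If $limit is set, the returned array holds at most $limit elements, and the last element contains the entire remainder of $string. spliti() is the case-insensitive version of split(). Code 6.8 is a frequently used date example.
Code 6.8 Splitting a date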
<?php
$date = "08/30/2006";

// The separator may be a slash, a dot, or a hyphen
list($month, $day, $year) = split('[/.-]', $date);

// Print the date in another format
echo "Month: $month; Day: $day; Year: $year<br />\n";
?>
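2. preg_split()
This function does the same job as split(), but takes a Perl-compatible pattern. Code 6.9 counts the words in a piece of text.
Code 6.9 Counting the words in a text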
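<?php
$seek = array();
$text = "I have a dream that one day I can make it. So just do it, nothing is impossible!";

// Split on whitespace and punctuation (a punctuation mark may be followed by spaces).
// PREG_SPLIT_NO_EMPTY drops the empty piece left after the final "!".
$words = preg_split("/[.,;!\s']\s*/", $text, -1, PREG_SPLIT_NO_EMPTY);

foreach ($words as $val) {
    $key = strtolower($val);
    // Start the counter at 1 on first sight to avoid an undefined-index notice
    $seek[$key] = isset($seek[$key]) ? $seek[$key] + 1 : 1;
}

echo "There are about " . count($words) . " words in total. ";
echo "The word \"I\" appears " . $seek['i'] . " times.";
?>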
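Tips
preg_split() uses Perl-compatible regular expression syntax and is usually a faster alternative to split(). Splitting with a regular expression allows a wider range of separators, as in the date and word examples above. If the string is split on one specific character only, explode() is recommended: it does not invoke the regular expression engine and is therefore the fastest option.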
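The two recommendations can be compared in a brief sketch based on the date from Code 6.8. The second date string, with mixed separators, is a made-up input for illustration.

```php
<?php
// One fixed separator: explode() is enough and involves no regex engine.
$date = "08/30/2006";
list($month, $day, $year) = explode('/', $date);

// Several possible separators: preg_split() with a character class.
// '#' is the delimiter so that '/' needs no escaping; '-' is last in the
// class so that it is taken literally. The input string is hypothetical.
$parts = preg_split('#[/.-]#', "08.30-2006");

echo "Month: $month; Day: $day; Year: $year\n";
print_r($parts);
```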