I want to extract the URL of every search result on http://www.so.com/s?q=csdn&pn=7&j=0, but the regex below returns an empty result. What is wrong with it?
$c1 = "/<h3 class=\"res-title (?:mark\-nowrap)?\">\s*<a target=\"_blank\" data-m=\"(?:.*)\" data-pos\"(?:\d+)\" data-e=\"(?:\d+)\" data-st=\"(?:\d+)\" href=\"(.*)\">(?:.*)<\/a>\s*<\/h3>/Uis";
$content = get_content('http://www.so.com/s?q=csdn&pn=7&j=0');
preg_match_all($c1, $content, $arr1);
print_r($arr1);
Replies and discussion (solutions)
You should rethink how you ask questions. Asked like this, few people will be willing to answer.
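The page source does not contain a single <a target="_blank" anywhere, let alone a match for the whole pattern.
I am used to DOM parsing, so my regex is rusty. The href value is very long, so I ran preg_match_all twice and just about managed to pull the URLs out.

$urls = 'http://www.so.com/s?q=csdn&pn=7&j=0';

// Fetch the page body as a string.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $urls);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$content = curl_exec($ch);
curl_close($ch);

// Pass 1: isolate each <li class="res-list"> block that contains a link.
$c1 = "/<li class=\"res-list\">(.*?)<a href=\"(.*?)\">(.*?)<\/a>(.*?)<\/li>/is";
preg_match_all($c1, $content, $arr1);

foreach ($arr1[0] as $part) {
    // Pass 2: take the href value inside the block.
    // (?1) re-runs group 1's pattern, so either quote character can close the value.
    $c2 = "/href=('|\")(.*?)(?1)\s+/is";
    preg_match_all($c2, $part, $arr2);
    if (isset($arr2[2][0])) {
        echo $arr2[2][0] . '<br />';
    }
}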
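Since the reply above mentions DOM parsing, here is a minimal sketch of that route. It is not the responder's code. It assumes PHP's DOM extension is available and that each result title is a link inside an h3 whose class list includes res-title, which is what the pattern in the question implies; the live page may be structured differently. The function name extract_result_urls and the sample markup are made up for illustration. Note also that the pattern in the question writes data-pos\" without an = sign and hard-codes the attribute order, so it can only match markup laid out exactly that way. A DOM query does not depend on attribute order.

```php
<?php
// Sketch: collect result links with DOMDocument/DOMXPath instead of one long regex.
// Assumption: result titles look like <h3 class="res-title ..."><a href="...">.

function extract_result_urls(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // h3 elements whose class list contains the token "res-title", then any link below them.
    $query = '//h3[contains(concat(" ", normalize-space(@class), " "), " res-title ")]//a[@href]';

    $urls = [];
    foreach ($xpath->query($query) as $a) {
        $urls[] = $a->getAttribute('href');
    }
    return $urls;
}

// Hypothetical sample markup, invented for this example (not copied from so.com).
$sample = '<ul>'
    . '<li class="res-list"><h3 class="res-title"><a href="http://www.csdn.net/" data-pos="1" target="_blank">CSDN</a></h3></li>'
    . '<li class="res-list"><h3 class="res-title mark-nowrap"><a target="_blank" href="http://blog.csdn.net/">Blog</a></h3></li>'
    . '<li><a href="http://example.com/ad">not a result</a></li>'
    . '</ul>';

print_r(extract_result_urls($sample));
```

On the sample, the function returns the two links that sit inside res-title headings and skips the third link, even though the attributes appear in a different order in each tag. To run it against the real page, pass in the string returned by curl_exec.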