Note: This program should suit anyone who does marketing on Baidu Tieba.
When I browse Baidu Tieba, I often see a poster sharing some resources and asking people to leave an email address in the replies so the poster can send them.
A popular post collects a lot of email addresses. The poster has to copy them out of the replies one by one, then paste them and send the mail, which is either torture or exhausting. I was bored, so I wrote a program that captures the email addresses from a Baidu Tieba post so I can take whatever I need.
The program implements two functions: one-click capture of the email addresses on every page of a post, and capture of the email addresses on a single page. I was too lazy to build a proper interface, so it is just two bare forms.
As usual, here is the source code:
<?php
// Defaults shown in the forms when no query parameters are supplied.
$url2 = empty($_GET['url2']) ? "http://tieba.baidu.com/p/2314539885?pn=1" : $_GET['url2'];
$page = empty($_GET['page']) ? "1" : $_GET['page'];
?>
<form action="" method="get">
  <input type="hidden" value="getAll" name="type" />
  <table>
    <tr>
      <td>Post link:</td><td><input type="text" name="url" value="http://tieba.baidu.com/p/2314539885" /></td>
    </tr>
    <!-- This row was garbled in the listing; the label text is assumed. -->
    <tr>
      <td>Pages:</td><td><input type="text" name="page" value="<?php echo htmlspecialchars($page); ?>" /></td>
    </tr>
    <tr>
      <td colspan="2"><input type="submit" value="Capture all email addresses" /></td>
    </tr>
  </table>
</form>
<form action="" method="get">
  <input type="hidden" value="getNow" name="type" />
  <table>
    <tr>
      <td>Post link:</td><td><input type="text" name="url2" value="<?php echo htmlspecialchars($url2); ?>" /></td>
    </tr>
    <!-- This row was lost in the listing; the button text is assumed. -->
    <tr>
      <td colspan="2"><input type="submit" value="Capture email addresses on this page" /></td>
    </tr>
  </table>
</form>
<?php
$type = isset($_GET['type']) ? $_GET['type'] : "";
if ($type != "") {
    $counts = 0;
    if ($type == "getAll") {
        $pages = (int) $page;
        // Strip any existing query string so that pn can be appended cleanly.
        $base = strtok(isset($_GET['url']) ? $_GET['url'] : "", '?');
        for ($i = 1; $i <= $pages; $i++) {
            // Request page $i through the pn parameter (as in the default URL above);
            // without it every pass would fetch the same first page.
            foreach (getEmail(fetchPage($base . "?pn=" . $i)) as $email) {
                echo $email . "<br />";
                $counts++;
            }
        }
    } else if ($type == "getNow") {
        foreach (getEmail(fetchPage($url2)) as $email) {
            echo $email . "<br />";
            $counts++;
        }
    }
    echo '<h2>Total records collected: ' . $counts . '</h2>';
}

// Download one page with cURL and return its body ("" on failure).
function fetchPage($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Certificate checks are switched off here, which is only acceptable for a quick local test.
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $text = curl_exec($ch);
    curl_close($ch);
    return $text === false ? "" : $text;
}

// Return every email-like string found in $str (case-insensitive).
function getEmail($str) {
    $pattern = "/([a-z0-9\-_\.]+@[a-z0-9]+\.[a-z0-9\-_\.]+)/i";
    preg_match_all($pattern, $str, $emailArr);
    return $emailArr[0];
}
?>
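The two ideas the script rests on, walking a post page by page and pulling addresses out with a regular expression, can also be tried in isolation without any network access. The sketch below is only an illustration: the helper names `buildPageUrls` and `extractUniqueEmails` are hypothetical and not part of the source above, removing duplicates with `array_unique` is my own addition, and it assumes Tieba selects a page through the `pn` query parameter, as the default URL suggests.

```php
<?php
// Hypothetical helper: build one URL per page of a post.
// Assumption: the page is selected with ?pn=N, as in the default URL.
function buildPageUrls(string $postUrl, int $pages): array
{
    // Drop any existing query string so pn is not appended twice.
    $base = explode('?', $postUrl, 2)[0];
    $urls = [];
    for ($i = 1; $i <= $pages; $i++) {
        $urls[] = $base . '?pn=' . $i;
    }
    return $urls;
}

// Hypothetical helper: the same pattern as getEmail(), matched
// case-insensitively, with repeated addresses removed.
function extractUniqueEmails(string $html): array
{
    $pattern = '/[a-z0-9\-_\.]+@[a-z0-9]+\.[a-z0-9\-_\.]+/i';
    preg_match_all($pattern, $html, $matches);
    // Lower-case first so "A@b.com" and "a@b.com" count as one address.
    $emails = array_map('strtolower', $matches[0]);
    // array_values() re-indexes the result after array_unique().
    return array_values(array_unique($emails));
}

// Made-up sample reply text, used only to exercise the helpers.
$sample = 'Please send to Foo_1@qq.com, thanks! <br> foo_1@qq.com and bar@163.com';
print_r(buildPageUrls('http://tieba.baidu.com/p/2314539885?pn=1', 2));
print_r(extractUniqueEmails($sample));
```

On the sample string this yields two addresses, `foo_1@qq.com` and `bar@163.com`, because the repeated address is collapsed. Deduplicating is probably worth it in practice, since people often reply to the same post more than once.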