
Questions about implementing multi-threaded scraping with streams; expert analysis requested

WBOY
Posted: 2016-06-13 13:09:17

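Scraping content is slow, and it has been a headache for me for a long time. I have recently been looking into multi-threaded scraping and am posting comparison code below. I have two questions: first, the lengths of the fetched results differ slightly between the two approaches; second, is the efficiency still not high enough? Please help analyze and test!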

PHP code
<?php
// Approach 1: sequential, blocking fetches with file_get_contents()
$timeStart = microtimeFloat();
function microtimeFloat() {
    // microtime(true) returns the current Unix timestamp as a float
    return microtime(true);
}
$data = '';
$urls = array(
    'http://www.tzksgs.com/news/2012-09/article-217.html',
    'http://www.tzksgs.com/news/2012-09/article-219.html',
    'http://www.tzksgs.com/news/2012-09/article-222.html',
);
foreach ($urls as $url) {
    echo strlen(file_get_contents($url)), '<br>';
}
$timeEnd = microtimeFloat();
echo sprintf("Spend time: %s second(s)\n", $timeEnd - $timeStart),'<br>';
$timeStart = microtimeFloat();
$timeout = 30;
$status = array();
$retdata = array();
$sockets = array();
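// $_SERVER['HTTP_USER_AGENT'] is not set on the CLI, so fall back to a fixed string (placeholder value)
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : 'PHP';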
foreach($urls as $id => $url) {
    $tmp = parse_url($url);
    $host = $tmp['host'];
    $path = isset($tmp['path'])?$tmp['path']:'/';
    empty($tmp['query']) or $path .= '?' . $tmp['query'];
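    if (empty($tmp['port'])) {
        $port = $tmp['scheme'] == 'https' ? 443 : 80;
    } else {
        $port = $tmp['port'];
    }
    // https needs the ssl:// transport; plain http uses tcp://
    $transport = $tmp['scheme'] == 'https' ? 'ssl' : 'tcp';
    $fp = stream_socket_client("$transport://$host:$port", $errno, $errstr, $timeout);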
    if (!$fp) {
        $status[$id] = "failed, $errno $errstr";
    } else {
        $status[$id] = "in progress";
        $retdata[$id] = '';
        $sockets[$id] = $fp;
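        // HTTP/1.0 on purpose: this client does not decode chunked transfer encoding,
        // which a server may use when answering an HTTP/1.1 request
        fwrite($fp, "GET $path HTTP/1.0\r\nHost: $host\r\nUser-Agent: $userAgent\r\nConnection: Close\r\n\r\n");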
    }
}
// Now, wait for the results to come back in

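while (count($sockets)) {
    $read = $sockets;
    $write = null;
    $except = null;
    // stream_select() takes its arrays by reference, so pass variables, not expressions.
    // It blocks until at least one socket is readable or $timeout seconds have passed.
    $ready = stream_select($read, $write, $except, $timeout);
    if (!$ready) {
        // false = error, 0 = timeout: stop waiting instead of looping forever
        break;
    } else {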
        //readable sockets either have data for us, or are failed connection attempts
        foreach ($read as $r) {
            $id = array_search($r, $sockets);
            $data = fread($r, 8192);
            if (strlen($data) == 0) {
                // EOF: the server closed the connection; no data at all means the request failed
                $status[$id] = $retdata[$id] === '' ? "failed to connect" : "done";
                fclose($r);
                unset($sockets[$id]);
            } else {
                $retdata[$id] .= $data;
            }
        }
    }
}
foreach($retdata as $data){
    // Strip the response headers; no trim(), so the length is comparable with file_get_contents()
    $data = substr($data, strpos($data, "\r\n\r\n") + 4);
    echo strlen($data),'<br>';
}
$timeEnd = microtimeFloat();
echo sprintf("Spend time: %s second(s)\n", $timeEnd - $timeStart);
?>
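
One possible reason for the length mismatch (an assumption; nobody in the thread confirms it) is that a server may answer an HTTP/1.1 request with "Transfer-Encoding: chunked". The raw body then contains chunk-size lines that file_get_contents() would not show. The sketch below splits a raw response into headers and body and decodes a chunked body. decodeChunked() and extractBody() are hypothetical helper names, not part of the original post.

```php
<?php
// Hypothetical helper: decode a body sent with "Transfer-Encoding: chunked".
function decodeChunked($body) {
    $decoded = '';
    $pos = 0;
    $len = strlen($body);
    while ($pos < $len) {
        $lineEnd = strpos($body, "\r\n", $pos);
        if ($lineEnd === false) {
            break; // malformed or truncated input
        }
        // The size line is hexadecimal, optionally followed by ";extension"
        $parts = explode(';', substr($body, $pos, $lineEnd - $pos), 2);
        $size = (int) hexdec(trim($parts[0]));
        if ($size === 0) {
            break; // a zero-length chunk marks the end of the body
        }
        $decoded .= substr($body, $lineEnd + 2, $size);
        $pos = $lineEnd + 2 + $size + 2; // skip chunk data and its trailing CRLF
    }
    return $decoded;
}

// Hypothetical helper: return the body of a raw HTTP response, decoded if chunked.
function extractBody($response) {
    $split = strpos($response, "\r\n\r\n");
    if ($split === false) {
        return '';
    }
    $headers = substr($response, 0, $split);
    $body = substr($response, $split + 4);
    if (preg_match('/^Transfer-Encoding:\s*chunked/mi', $headers)) {
        return decodeChunked($body);
    }
    return $body;
}
```

With helpers like these, the final loop could call extractBody($data) on each raw response before measuring its length.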

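------Solution--------------------
You could try the curl_multi_* functions to run the requests concurrently.
That keeps the number of PHP-level instructions as low as possible. As for the problems the two posters above mentioned, those are definitely not something PHP can solve.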
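
A minimal sketch of the curl_multi_* approach follows, assuming the cURL extension is loaded. fetchAll() is a hypothetical helper name and the default timeout is an arbitrary choice. Nothing in the thread shows how it performs against the target site.

```php
<?php
// Hypothetical helper: fetch several URLs concurrently and return their bodies keyed like $urls.
function fetchAll(array $urls, $timeout = 30) {
    $mh = curl_multi_init();
    $handles = array();
    foreach ($urls as $id => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$id] = $ch;
    }

    // Drive all transfers until none is still running
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0 && curl_multi_select($mh, 1.0) === -1) {
            usleep(10000); // select failed; pause briefly to avoid a busy loop
        }
    } while ($running > 0 && $status === CURLM_OK);

    $results = array();
    foreach ($handles as $id => $ch) {
        $results[$id] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return $results;
}

// Usage with the $urls array from the question:
// foreach (fetchAll($urls) as $body) { echo strlen($body), '<br>'; }
```

cURL decodes chunked responses itself, so the lengths should be directly comparable with those from file_get_contents().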

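------Solution--------------------
Of course: file_get_contents() is blocking, so running several fetch jobs one after another is bound to be slow.
socket_*(), fsockopen() and stream_*(), by contrast, can be used without blocking (for example via stream_set_blocking() or stream_select()), so several transfers can make progress at the same time.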
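
The difference can be tried locally without a remote server. The sketch below uses a connected socket pair as a stand-in for a network connection (assumption: a Unix-like system, since STREAM_PF_UNIX is not available on Windows). The variable names are illustrative.

```php
<?php
// A connected pair of local sockets stands in for a client/server connection
list($a, $b) = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);

// In non-blocking mode fread() returns at once, even when nothing has arrived yet
stream_set_blocking($a, false);
$empty = fread($a, 8192);

// Once the peer writes, stream_select() reports the socket as readable
fwrite($b, "hello");
$read = array($a);
$write = null;
$except = null;
$ready = stream_select($read, $write, $except, 1);
$got = fread($a, 8192);

fclose($a);
fclose($b);
```

In blocking mode, which is the default, the first fread() would instead wait until data arrived or the peer closed the connection.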
------Solution--------------------
How slow is it, exactly?

Try adding this:

$context = stream_context_create(array('http' => array('header'=>'Connection: close')));
file_get_contents(".....",false,$context);
Source: php.cn