
PHP Multi-Process Programming (3): A Demo of Fetching Web Pages with Multiple Processes

Jun 20, 2016 pm 12:32 PM

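To follow the code in this part, please read these first:

PHP Multi-Process Programming (1)

PHP Multi-Process Programming (2): Pipe Communication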

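As we know, passing data from a parent process to a child process is relatively easy, but passing data from a child back to the parent is harder.

There are many ways for processes to talk to each other. In PHP, the most convenient is pipe communication. A socket pair (for example, socket_create_pair) also works.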
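As a rough illustration of the socket-pair alternative, here is a minimal sketch. It is not the approach used in the rest of this article, which relies on a pipe. The helper name run_in_child is made up for this example, and it assumes the pcntl extension is available (CLI only).

```php
<?php
// Hypothetical helper (not from the original article): run $task in a child
// process and return whatever value it sent back to the parent.
function run_in_child(callable $task)
{
    // A connected pair of Unix-domain stream sockets, one end per process.
    $pair = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
    if ($pair === false) {
        return false;
    }
    $pid = pcntl_fork();
    if ($pid === -1) {
        fclose($pair[0]);
        fclose($pair[1]);
        return false;
    }
    if ($pid === 0) {
        // Child: keep only the write end, send the result, and exit.
        fclose($pair[0]);
        fwrite($pair[1], serialize($task()));
        fclose($pair[1]);
        exit(0);
    }
    // Parent: close its copy of the write end so the read stops at EOF.
    fclose($pair[1]);
    $raw = stream_get_contents($pair[0]);
    fclose($pair[0]);
    pcntl_waitpid($pid, $status); // reap the child to avoid a zombie
    return unserialize($raw);
}
```

Calling run_in_child(function () { return array('n' => 6 * 7); }) hands the array computed in the child back to the parent. A single pair is enough here because the data only flows one way.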

First, here is what the server does to handle each request. The client sends a list of URLs separated by tabs (\t) and terminated by a newline (\n).

function clientHandle($msgsock, $obj)
{
    $nbuf = '';
    socket_set_block($msgsock);
    do {
        // PHP_NORMAL_READ makes socket_read() stop at "\n" or "\r".
        if (false === ($buf = @socket_read($msgsock, 2048, PHP_NORMAL_READ))) {
            $obj->error("socket_read() failed: reason: " . socket_strerror(socket_last_error($msgsock)));
            break;
        }
        $nbuf .= $buf;
        // Keep reading until a complete line has arrived.
        if (substr($nbuf, -1) != "\n") {
            continue;
        }
        $nbuf = trim($nbuf);
        if ($nbuf == 'quit') {
            break;
        }
        if ($nbuf == 'shutdown') {
            break;
        }
        // One request is a tab-separated list of URLs.
        $url = explode("\t", $nbuf);
        $nbuf = '';
        // Reply with the serialized array of titles, then finish.
        $talkback = serialize(read_ntitle($url));
        socket_write($msgsock, $talkback, strlen($talkback));
        debug("write to the client\n");
        break;
    } while (true);
}
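The request format the handler expects can be sketched from the client's side as follows. The helper names build_request and parse_request are hypothetical and only illustrate the framing (tab between URLs, newline at the end); they are not part of the original server.

```php
<?php
// Hypothetical client-side helper: build one request line from a list of URLs.
function build_request(array $urls)
{
    return implode("\t", $urls) . "\n";
}

// Mirror of what the handler does once a full line has arrived.
function parse_request($line)
{
    $line = trim($line);
    if ($line === '') {
        return array();
    }
    return explode("\t", $line);
}

$line = build_request(array('http://example.com/', 'http://example.org/a?b=1'));
$urls = parse_request($line);
```

The reply travels the other way as a serialize()d array, so a client would call unserialize() on everything it reads before the server closes the connection.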

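The key piece of the code above is read_ntitle, which reads the titles using multiple processes.

The code is below. It forks one child process per URL. Each child opens the pipe and writes the title it fetched into it. The parent process keeps reading from the pipe until all the data has been read, and finally deletes the pipe.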

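function read_ntitle($arr)
{
    $pipe = new Pipe("multi-read");
    $pids = array();
    foreach ($arr as $k => $item) {
        $pids[$k] = pcntl_fork();
        if ($pids[$k] === -1) {
            debug("$k: fork failed!\n");
            continue;
        }
        if ($pids[$k] === 0) {
            // Child: fetch one title and write "index,base64(title)\n" to the pipe.
            $pipe->open_write();
            $content = base64_encode(read_title($item));
            $pipe->write("$k,$content\n");
            $pipe->close_write();
            debug("$k: write success!\n");
            exit;
        }
    }
    // Parent: collect everything the children wrote.
    debug("read begin!\n");
    $data = $pipe->read_all();
    debug("read end!\n");
    $pipe->rm_pipe();
    // Reap the children so they do not linger as zombies.
    foreach ($pids as $pid) {
        if ($pid > 0) {
            pcntl_waitpid($pid, $status);
        }
    }
    return parse_data($data);
}

The parse_data code is below. It is simple enough that it needs no explanation.

function parse_data($data)
{
    $data = explode("\n", $data);
    $new = array();
    foreach ($data as $value) {
        // Each record is "index,base64(title)".
        $value = explode(",", $value);
        if (count($value) == 2) {
            $value[1] = base64_decode($value[1]);
            $new[intval($value[0])] = $value[1];
        }
    }
    // Restore the original URL order.
    ksort($new, SORT_NUMERIC);
    return $new;
}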

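The code above uses one more function, read_title, which takes a little care. For compatibility, I did not use curl and talk over a socket directly instead.

As soon as the title tag has been downloaded, it stops reading to save time. The code is as follows: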

function read_title($url)
{
    $url_info = parse_url($url);
    if (!isset($url_info['host']) || !isset($url_info['scheme'])) {
        return false;
    }
    $host = $url_info['host'];
    $path = isset($url_info['path']) ? $url_info['path'] : "/";
    if (isset($url_info['query'])) {
        $path .= "?" . $url_info['query'];
    }
    // Use the port from the URL if there is one, otherwise the scheme default.
    $is_https = ($url_info['scheme'] == 'https');
    $port = isset($url_info['port']) ? $url_info['port'] : ($is_https ? 443 : 80);

    $out = "GET $path HTTP/1.1\r\n";
    $out .= "Host: $host\r\n";
    $out .= "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.1.7)\r\n";
    $out .= "Connection: Close\r\n\r\n";

    // HTTPS needs the ssl:// transport, which requires the openssl extension.
    $fp = fsockopen(($is_https ? "ssl://" : "") . $host, $port, $errno, $errstr, 5);
    if ($fp === false) {
        error("get title from $url, error. $errno: $errstr \n");
        return false;
    }
    fwrite($fp, $out);
    $content = '';
    while (!feof($fp)) {
        $line = fgets($fp, 1024);
        if ($line === false) {
            break;
        }
        $content .= $line;
        // Stop as soon as a complete <title> element has arrived.
        if (preg_match("/<title>(.*?)<\/title>/is", $content, $matches)) {
            fclose($fp);
            return encode_to_utf8($matches[1]);
        }
    }
    fclose($fp);
    return false;
}

function encode_to_utf8($string)
{
    $from = mb_detect_encoding($string, "UTF-8, GB2312, ISO-8859-1", true);
    // If detection fails, return the string unchanged.
    if ($from === false) {
        return $string;
    }
    return mb_convert_encoding($string, "UTF-8", $from);
}
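The early-exit idea can be tried without a network connection. The sketch below feeds the same regular expression from an array of chunks in place of a socket. The helper find_title_in_chunks and the sample chunks are made up for illustration.

```php
<?php
// Hypothetical helper: consume chunks until a complete <title> element is seen.
// Returns array(title or false, number of chunks consumed).
function find_title_in_chunks($chunks)
{
    $content = '';
    $used = 0;
    foreach ($chunks as $chunk) {
        $content .= $chunk;
        $used++;
        // Match against everything read so far, because the tag may be
        // split across chunk boundaries.
        if (preg_match('/<title>(.*?)<\/title>/is', $content, $m)) {
            return array(trim($m[1]), $used);
        }
    }
    return array(false, $used);
}

$chunks = array(
    "<html><head><ti",
    "tle>Hello ",
    "World</title></head>",
    "<body>never read</body>",
);
list($title, $used) = find_title_in_chunks($chunks);
```

Here the title is found after three chunks and the fourth is never touched, which is where the time saving comes from. Note that this pattern only matches a bare <title> tag, so a tag that carries attributes would be missed.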

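Here I only detect the three most common encodings. The rest of the code is simple. All of it is test code, so if you want to build a server like this, be sure to optimize it. In particular, you need extra handling to avoid starting too many processes at once.

We often complain that PHP does not support multiple processes, but in fact it does. Admittedly it offers fewer options for inter-process communication, and communication and synchronization between processes are the core of multi-process programming. In web development this kind of multi-process code is almost never used, because it causes serious performance problems. To get multi-process, high-load handling in a simpler way, you have to rely on PHP extensions.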
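One possible way to cap the number of simultaneous processes is sketched below. It is only an illustration, not part of the original server: run_limited and reap_one are hypothetical names, the pcntl extension is assumed, and each child reports back only through its exit status (0 to 255) to keep the example short.

```php
<?php
// Hypothetical helper: run $worker($job) in child processes, never more than
// $max at a time. Returns each job's exit status, keyed like $jobs.
function run_limited(array $jobs, callable $worker, $max)
{
    $running = array();   // pid => job key
    $status_of = array(); // job key => exit status
    foreach ($jobs as $key => $job) {
        // At the limit: block until one child finishes before forking again.
        if (count($running) >= $max) {
            reap_one($running, $status_of);
        }
        $pid = pcntl_fork();
        if ($pid === -1) {
            $status_of[$key] = -1; // fork failed; record it and move on
            continue;
        }
        if ($pid === 0) {
            exit((int) $worker($job)); // child: do the work and leave
        }
        $running[$pid] = $key;
    }
    // Wait for the children that are still running.
    while ($running) {
        reap_one($running, $status_of);
    }
    ksort($status_of);
    return $status_of;
}

// Wait for any one child and record its exit status.
function reap_one(array &$running, array &$status_of)
{
    $pid = pcntl_wait($status);
    if ($pid <= 0) {
        $running = array(); // no children left to wait for
        return;
    }
    if (isset($running[$pid])) {
        $status_of[$running[$pid]] = pcntl_wexitstatus($status);
        unset($running[$pid]);
    }
}
```

For title fetching, the worker would write its result to the pipe as read_ntitle does, and the exit status would only signal success or failure.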
