Performance comparison: collecting images with file_get_contents versus the curl functions

Jun 13, 2016, 09:10 AM


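  Most of the car content in the back end of one of our company's car websites comes from Autohome (汽车之家), so our editors had to add cars by hand every day while reading the Autohome pages, which was painful. As a developer, my task was to change that: build a feature where pasting the matching Autohome URL automatically fills the data into our back-end form. The basic form filling now works, but the car photo albums still could not be collected.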

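  I had built image collection before, but most cars on Autohome have quite a lot of pictures. At first I planned to reuse my old approach: call file_get_contents to fetch the page behind the URL, match the image addresses in it, call file_get_contents again to fetch the content of each image URL, and save the result locally. The code is as follows: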

<?php
header('Content-type:text/html;charset=utf-8');
set_time_limit(0);

// Simple stopwatch used to time the whole script.
class runtime
{
    public $StartTime = 0;
    public $StopTime = 0;

    function get_microtime()
    {
        list($usec, $sec) = explode(' ', microtime());
        return ((float)$usec + (float)$sec);
    }

    function start()
    {
        $this->StartTime = $this->get_microtime();
    }

    function stop()
    {
        $this->StopTime = $this->get_microtime();
    }

    // Elapsed time in milliseconds, rounded to one decimal place.
    function spent()
    {
        return round(($this->StopTime - $this->StartTime) * 1000, 1);
    }
}

$runtime = new runtime();
$runtime->start();

$url = 'http://car.autohome.com.cn/pic/series-s15306/289.html#pvareaid=102177';
$rs = file_get_contents($url);
// echo $rs;exit;
// Collect the links to the individual picture pages of this series.
preg_match_all('/(\/pic\/series-s15306\/289-\d+\.html)/', $rs, $urlArr);

$avalie = array_unique($urlArr[0]);
$count = array();
foreach ($avalie as $key => $ul) {
   // Match the full-size image address on each picture page.
   $pattern = '/<img src="(http:\/\/car1\.autoimg\.cn\/upload\/\d+\/\d+\/\d+\/.*?\.jpg)"/';
   preg_match_all($pattern, file_get_contents('http://car.autohome.com.cn'.$ul), $imgSrc);
   $count = array_merge($count, $imgSrc[1]);
}

// Download the images one after another; each request blocks until it finishes.
$data = array();
foreach ($count as $k => $v) {
  $data[$k] = file_get_contents($v);
}

foreach ($data as $k => $v) {
  file_put_contents('./pic2/'.time().'_'.rand(1, 10000).'.jpg', $v);
}

$runtime->stop();
echo "Page execution time: ".$runtime->spent()." ms";
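The matching step in the script above can be pulled out into a small function, which makes it easy to try against a saved page without touching the network. This is only a sketch: the function name extract_image_urls and the looser pattern (any absolute .jpg address in an img tag) are assumptions for illustration, not the pattern the script uses.

```php
<?php
// Hypothetical helper: return the unique absolute .jpg URLs found in <img src="..."> tags.
function extract_image_urls(string $html): array
{
    // '#' is used as the delimiter so the slashes in the URL need no escaping.
    $pattern = '#<img[^>]*\ssrc="(https?://[^"]+\.jpg)"#i';
    preg_match_all($pattern, $html, $matches);

    // $matches[1] holds the captured URLs; drop duplicates and reindex from 0.
    return array_values(array_unique($matches[1]));
}
```

Passing the HTML of one picture page returns a plain list of image addresses that can then be handed to whichever download method is being tested.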

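  It turned out that the file_get_contents approach is fine for a few images, but with many images it becomes painfully slow. It was hard to get through a run even in local testing, let alone in production. After some searching on Baidu, I switched to curl for downloading the images. Testing showed a real improvement, but it still felt a bit slow. If only PHP had multithreading...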
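The article does not show the intermediate version that downloads each image with a single curl handle. A minimal sketch of what such a download could look like follows; the function name fetch_with_curl and the choice to return null on failure are assumptions, not the author's code.

```php
<?php
// Hypothetical helper: download one URL with curl, returning the body or null on failure.
function fetch_with_curl(string $url, int $timeoutSeconds = 30): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);    // return the body instead of printing it
    curl_setopt($ch, CURLOPT_HEADER, false);           // do not include response headers in the body
    curl_setopt($ch, CURLOPT_FAILONERROR, true);       // treat HTTP status >= 400 as a failure
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);

    $body = curl_exec($ch);                            // false when the transfer failed

    return $body === false ? null : $body;
}
```

Calling this in a loop still downloads the images one at a time, which matches the observation that a single curl handle helps but remains slow when there are many pictures.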

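  After more tinkering and reading, I found that PHP's curl library can in fact emulate multithreading through the curl_multi_* family of functions, which run several transfers concurrently. After a rewrite, the code became this: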

  

<?php
header('Content-type:text/html;charset=utf-8');
set_time_limit(0);

// Simple stopwatch used to time the whole script.
class runtime
{
    public $StartTime = 0;
    public $StopTime = 0;

    function get_microtime()
    {
        list($usec, $sec) = explode(' ', microtime());
        return ((float)$usec + (float)$sec);
    }

    function start()
    {
        $this->StartTime = $this->get_microtime();
    }

    function stop()
    {
        $this->StopTime = $this->get_microtime();
    }

    // Elapsed time in milliseconds, rounded to one decimal place.
    function spent()
    {
        return round(($this->StopTime - $this->StartTime) * 1000, 1);
    }
}

$runtime = new runtime();
$runtime->start();


$url = 'http://car.autohome.com.cn/pic/series-s15306/289.html#pvareaid=102177';
$rs = file_get_contents($url);
// Collect the links to the individual picture pages of this series.
preg_match_all('/(\/pic\/series-s15306\/289-\d+\.html)/', $rs, $urlArr);

$avalie = array_unique($urlArr[0]);
$count = array();
foreach ($avalie as $key => $ul) {
   // Match the full-size image address on each picture page.
   $pattern = '/<img src="(http:\/\/car1\.autoimg\.cn\/upload\/\d+\/\d+\/\d+\/.*?\.jpg)"/';
   preg_match_all($pattern, file_get_contents('http://car.autohome.com.cn'.$ul), $imgSrc);
   $count = array_merge($count, $imgSrc[1]);
}

$handle = curl_multi_init();

// Create one curl handle per image and register it with the multi handle.
$curl = array();
foreach ($count as $k => $v) {
  $curl[$k] = curl_init($v);
  curl_setopt($curl[$k], CURLOPT_RETURNTRANSFER, 1);
  curl_setopt($curl[$k], CURLOPT_HEADER, 0);
  curl_setopt($curl[$k], CURLOPT_TIMEOUT, 30);
  curl_multi_add_handle($handle, $curl[$k]);
}

$active = null;

// Start all transfers.
do {
    $mrc = curl_multi_exec($handle, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

while ($active && $mrc == CURLM_OK) {
    // This line is crucial on PHP 5.3 and later: without it, curl_multi_select
    // may keep returning -1 and the script stays stuck in this loop forever.
    while (curl_multi_exec($handle, $active) === CURLM_CALL_MULTI_PERFORM);

    if (curl_multi_select($handle) != -1) {
        do {
            $mrc = curl_multi_exec($handle, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}

// Collect the downloaded bodies and release the handles.
$data = array();
foreach ($curl as $k => $v) {
    if (curl_error($curl[$k]) == "") {
        $data[$k] = curl_multi_getcontent($curl[$k]);
    }
    curl_multi_remove_handle($handle, $curl[$k]);
    curl_close($curl[$k]);
}

foreach ($data as $k => $v) {
    $file = time().'_'.rand(1000, 9999).'.jpg';
    file_put_contents('./pic3/'.$file, $v);
}

curl_multi_close($handle);

$runtime->stop();
echo "Page execution time: ".$runtime->spent()." ms";
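The download part of the script above can also be written as a reusable function. The sketch below is one possible arrangement, not the author's code: the name fetch_all, the null-on-failure convention, and the short sleep when curl_multi_select returns -1 are all assumptions. It reads each transfer's result code through curl_multi_info_read rather than relying on curl_error.

```php
<?php
// Hypothetical helper: fetch several URLs concurrently.
// Returns an array with the same keys as $urls; a failed transfer maps to null.
function fetch_all(array $urls, int $timeoutSeconds = 30): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HEADER, false);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none is still active.
    do {
        $status = curl_multi_exec($mh, $active);
        if ($active && curl_multi_select($mh, 1.0) === -1) {
            usleep(1000); // nothing to wait on yet; pause briefly instead of spinning
        }
    } while ($active && ($status === CURLM_OK || $status === CURLM_CALL_MULTI_PERFORM));

    // Read the per-transfer result codes and remember which transfers failed.
    $failed = [];
    while (($info = curl_multi_info_read($mh)) !== false) {
        if ($info['result'] !== CURLE_OK) {
            $key = array_search($info['handle'], $handles, true);
            if ($key !== false) {
                $failed[$key] = true;
            }
        }
    }

    $bodies = [];
    foreach ($handles as $key => $ch) {
        $bodies[$key] = isset($failed[$key]) ? null : curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);

    return $bodies;
}
```

With the image list from the script, the call would be fetch_all($count), and each non-null entry can then be written to disk with file_put_contents.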

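  Concurrent collection really is satisfying. I then ran a series of tests to compare the two: out of 5 runs, the curl multi version was faster than file_get_contents in 4, by a factor of roughly 3 to 5. In short, I will use this approach for collection whenever possible from now on; the efficiency gain is clear.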
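For repeating such a comparison, the timing can be kept separate from the code being measured. The following is a minimal sketch using microtime(true), which returns the current time as a float; the names time_ms and benchmark are assumptions for illustration.

```php
<?php
// Hypothetical helper: run $task once and return the elapsed wall-clock time in milliseconds.
function time_ms(callable $task): float
{
    $start = microtime(true);
    $task();
    return round((microtime(true) - $start) * 1000, 1);
}

// Hypothetical helper: time $task over several runs and return each run's duration.
function benchmark(callable $task, int $runs = 5): array
{
    $timings = [];
    for ($i = 0; $i < $runs; $i++) {
        $timings[] = time_ms($task);
    }
    return $timings;
}
```

Each download strategy can be wrapped in a closure and passed to benchmark, so that both are measured over the same number of runs and the per-run timings can be compared side by side.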
