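A PHP program for scraping images from a website
This program fetches a web page's source code, extracts and analyzes the image links, merges duplicate links, and downloads the images.
It uses PHP's network content handling functions to download every image found on a specified page and save it in the current directory. The code is as follows: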
<?php
/* Fetch the page source and extract the image links.
   Opening a URL with fopen() requires allow_url_fopen to be enabled. */
function get_img_url($site_name){
    $site_fd = fopen($site_name, "r");
    if ($site_fd === false) {
        return array(); // the page could not be opened
    }
    $site_content = "";
    while (!feof($site_fd)) {
        $site_content .= fread($site_fd, 1024);
    }
    fclose($site_fd);
    /* Use a regular expression to extract the image links */
    $reg_tag = '/<img .*?\"([^\"]*(jpg|bmp|jpeg|gif)).*?>/';
    $ret = preg_match_all($reg_tag, $site_content, $match_result);
    return $match_result[1];
}

/* Turn relative image links into absolute ones */
function revise_site($site_list, $base_site){
    $return_list = array(); // stays an array even when no links were found
    foreach($site_list as $site_item) {
        if (preg_match('/^http/', $site_item)) {
            $return_list[] = $site_item;
        }else{
            $return_list[] = $base_site."/".$site_item;
        }
    }
    return $return_list;
}

/* Derive each image's file name and save the image under $pos */
function get_pic_file($pic_url_array, $pos){
    $reg_tag = '/.*\/(.*?)$/';
    $count = 0;
    foreach($pic_url_array as $pic_item){
        $ret = preg_match_all($reg_tag,$pic_item,$t_pic_name);
        $pic_name = $pos.$t_pic_name[1][0];
        $pic_url = $pic_item;
        print("Downloading ".$pic_url." ");
        $img_read_fd = fopen($pic_url,"r");
        if ($img_read_fd === false) {
            continue; // skip images that cannot be opened
        }
        $img_write_fd = fopen($pic_name,"w");
        if ($img_write_fd === false) {
            fclose($img_read_fd);
            continue; // skip images that cannot be written
        }
        $img_content = "";
        while(!feof($img_read_fd)){
            $img_content .= fread($img_read_fd,1024);
        }
        fwrite($img_write_fd,$img_content);
        fclose($img_read_fd);
        fclose($img_write_fd);
        print("[OK] ");
    }
    return 0;
}

function main(){
    /* Address of the page to scrape images from */
    $site_name = "http://image.cn.yahoo.com";
    $img_url = get_img_url($site_name);
    $img_url_revised = revise_site($img_url, $site_name);
    $img_url_unique = array_unique($img_url_revised); // merge duplicate links
    get_pic_file($img_url_unique,"./");
}

main();
?>
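One thing this program still needs to improve: images stored in different directories on the server can share the same file name while being different images, and when they are saved, the one downloaded later overwrites the one saved earlier. For example, if http://example.com/ links to both http://example.com/pic/test1.jpg and http://example.com/pic/new/test1.jpg, only one of the two images survives the download and the other is overwritten. The fix is to check, before each save, whether a file with that name already exists in the current directory and, if it does, to rename the image that is about to be saved.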
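The renaming fix described above could look roughly like the following. This is a minimal sketch, not part of the original program: the helper name `unique_pic_name` and the `_1`, `_2`, ... suffix scheme are assumptions chosen for illustration.

```php
<?php
/**
 * Hypothetical helper (not in the original program).
 * Returns a path under $pos that does not clash with an existing file.
 * If $file_name is already taken, _1, _2, ... is appended before the
 * extension until a free name is found.
 */
function unique_pic_name($pos, $file_name){
    $candidate = $pos.$file_name;
    if (!file_exists($candidate)) {
        return $candidate; // name is free, keep it unchanged
    }
    $info = pathinfo($file_name);
    $base = $info['filename'];
    $ext  = isset($info['extension']) ? ".".$info['extension'] : "";
    $n = 1;
    do {
        $candidate = $pos.$base."_".$n.$ext;
        $n++;
    } while (file_exists($candidate));
    return $candidate;
}
```

Under these assumptions, `get_pic_file` would build the target path with `$pic_name = unique_pic_name($pos, $t_pic_name[1][0]);` instead of concatenating `$pos` and the file name directly, so a second `test1.jpg` would be saved as `test1_1.jpg` rather than overwriting the first.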
