
Web Crawler Script

Jun 06, 2016, 08:13 PM

I recently needed a script to scrape some data from the web, so I wrote an ordinary PHP command-line script. The test code is as follows:

#!/usr/local/bin/php -q
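<?php /**
 * Created by PhpStorm.
 * User: jackqqxu
 * Date: 14-9-12
 * Time: 12:34 AM
 * Scans the files in a directory, finds every static resource they reference, and downloads them.
 */
//echo "Enter the path of the files to extract:\n";
//$path = fread(STDIN, 100);
//echo "The program will read the files under $path\n";
//echo "Enter the file type to extract:\n";
//$type = fread(STDIN, 100);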
// Open a known directory, and proceed to read its contents
//$path = '/Users/jackqqxu/Desktop/task/game/a_grain_of_truth_files/css/';
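$destPath = '/Users/jackqqxu/task/aliyunsvn/health/grain/views/locations/'; // destination directory for the downloaded HTML files
$sourcePath = '/Users/jackqqxu/task/aliyunsvn/health/grain/js/'; // local directory of files to scan for references
//$baseUrl = 'http://www.zamolski.com/agot/resources/stylesheets/';
$netSourceUrl = 'http://www.zamolski.com/agot/views/locations/'; // remote base URL; this run fetches the location pages
//$type = '.css';
$type = '.js';  // many of the .js files reference location pages that need fetching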
$typeLen = strlen($type);
//echo 'r=' . realpath('/Users/jackqqxu/Desktop/task/game/a_grain_of_truth_files/css/../images/ui/frame_h.png') . "\n\n";
//echo "the program will read the $type from the $path\n";
//if (!is_dir($destPath)) {
//    exec('mkdir -p ' . $destPath);
//}
    if ($dh = opendir($sourcePath)) {
        while (($file = readdir($dh)) !== false) {
            $fileType = filetype($sourcePath . $file);
            if ($fileType != 'file') {
                continue;
            }
//            echo 'f=' . $file . substr($file, strlen($file)-$typeLen) . "\n";
            if (substr($file, -$typeLen) === $type) {   // extension matches
//                echo "filename: $file : filetype: " . filetype($path . $file) . "\n";
                echo '$sourcePath . $file=' . $sourcePath . $file . "\n";
                $fileContentArr = file($sourcePath . $file);
                foreach($fileContentArr as $fileLine) {
//                    if ($fileLine =~ /url\((.*?)\)/){
//                    if (preg_match_all("/url\((.*?)\)/", $fileLine, $matches))  {   // in CSS, other images are referenced through url(...)
                    if (preg_match_all("/gotoLocation\(\"(.*?)\"\)/", $fileLine, $matches))  {   // find the other files through the gotoLocation("...") keyword
//                        print_r($matches);exit;
//                        foreach($matches[1] as $matchImgUrl) {
                        foreach($matches[1] as $matchUrl) {
                            $sourceUrl = $netSourceUrl . $matchUrl . '.html';
                            echo 'n='.$sourceUrl."\n";//exit;
                            $descFile = $destPath . $matchUrl . '.html';
//                            echo 'fs=' . function_exists('realpath');
//                            echo 'ni=' . $newImgFile."\n";//exit;
//                            echo 'mkdir -p=' . dirname($newImgFile);
//                            exec('mkdir -p ' . dirname($newImgFile));
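                            $content = file_get_contents($sourceUrl);
                            if ($content === false) {   // skip failed downloads instead of writing an empty file
                                echo "Failed to fetch $sourceUrl\n";
                                continue;
                            }
                            $ret = file_put_contents($descFile, $content);
                            if ($ret) {
                                echo "File $descFile written successfully\n";
//                                exit;
                            }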
//                            exit;
                        }
                    }
                }
            }
        }
        closedir($dh);
    }
?>
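
The extraction step in the script above can also be tried on its own, without touching the disk or the network. The following is a minimal sketch, not the author's code: `extractLocations()` and `buildDownloadPlan()` are hypothetical helper names, the sample JavaScript string and the `/tmp/locations/` directory are made up for illustration, and removing duplicate names is an addition the original script does not make. It reuses the same `gotoLocation("...")` pattern and the same URL and file-name construction.

```php
<?php
// Sketch only: the function names and sample data below are hypothetical
// and do not appear in the original script.

/**
 * Return every name passed to gotoLocation("...") in the given source text.
 * Duplicates are dropped (an addition; the original downloads each match).
 */
function extractLocations(string $source): array
{
    // Same pattern as the script above; the lazy (.*?) stops at the first `")`.
    preg_match_all('/gotoLocation\("(.*?)"\)/', $source, $matches);
    return array_values(array_unique($matches[1]));
}

/**
 * Map each location name to the remote URL to fetch and the local file to write,
 * built the same way as $sourceUrl and $descFile in the script above.
 */
function buildDownloadPlan(array $names, string $baseUrl, string $destDir): array
{
    $plan = [];
    foreach ($names as $name) {
        $plan[$name] = [
            'url'  => $baseUrl . $name . '.html',
            'file' => rtrim($destDir, '/') . '/' . $name . '.html',
        ];
    }
    return $plan;
}

// Made-up input resembling one of the scanned .js files.
$js = 'gotoLocation("intro"); if (x) { gotoLocation("cave/entrance"); } gotoLocation("intro");';

$names = extractLocations($js);
$plan  = buildDownloadPlan($names, 'http://www.zamolski.com/agot/views/locations/', '/tmp/locations/');

print_r($plan);
```

If a location name contains a slash, as in the made-up `cave/entrance` above, the destination subdirectory has to exist before `file_put_contents()` is called. One option is `mkdir(dirname($file), 0777, true)`, which would replace the commented-out `exec('mkdir -p ...')` calls in the original.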
Copyright © web代码网 [Web Crawler Script], All Rights Reserved. 2014.