
PHP scraping tutorial for beginners: how to write a scraper

Jun 08, 2016 05:29 PM



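Our first step is to collect all the links. This is not a simple one-article scrape: we are going to scrape an entire book and save it to a single text file, since MP3 players are everywhere now and can display e-books.
How should a book be saved? Under its title, of course, so it is easy to find. So let's scrape the book's title first.
First, look at the original markup:

The pattern is:

Now let's write the regular expression. Don't tell me you don't know how; if you don't, come to Hunan, heh, there are plenty of experts here.
Regular expression: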
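
As a minimal sketch of this step, assume the book title sits in the page's `<title>` element. That is an assumption: the markup sample and the pattern themselves are not shown here. The helper name `extract_title` and the sample HTML are hypothetical.

```php
<?php
// Hypothetical helper: returns the text inside <title>...</title>, or null.
function extract_title(string $html): ?string
{
    // i = case-insensitive, s = the dot also matches newlines.
    // (.*?) is a non-greedy capture group; its text lands in $m[1].
    if (preg_match('/<title>(.*?)<\/title>/is', $html, $m) === 1) {
        return trim($m[1]);
    }
    return null;
}

// Made-up sample markup, for illustration only.
$sample = "<html><head><title> Example Book </title></head><body></body></html>";
echo extract_title($sample), "\n";
```

Returning null when nothing matches lets the caller stop early instead of reading an array index that does not exist.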

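Let's get to work! First we need to fetch the page, which takes one function:
file_get_contents()
About it:
Main purpose: read an entire file into a string
  Prototype: file_get_contents(string $filename, bool $use_include_path = false, ?resource $context = null, int $offset = 0, ?int $length = null): string|false


What does that mean in practice? It reads the whole content of a resource (a local file, or a URL when allow_url_fopen is enabled) and returns it as a string that you assign to a variable; on failure it returns false.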
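
A minimal sketch of calling `file_get_contents()` defensively. The wrapper name `read_source` is hypothetical, and the demo reads a temporary local file so that it runs without network access; a URL would be passed the same way, assuming `allow_url_fopen` is enabled.

```php
<?php
// Hypothetical wrapper around file_get_contents() that fails loudly.
function read_source(string $source): string
{
    // file_get_contents() returns false on failure, so compare with ===.
    // The @ suppresses the PHP warning; the exception reports the failure.
    $data = @file_get_contents($source);
    if ($data === false) {
        throw new RuntimeException("Could not read: " . $source);
    }
    return $data;
}

// Demo on a temporary file so the sketch runs offline.
$tmp = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($tmp, "hello scraper");
echo read_source($tmp), "\n";
unlink($tmp);
```

Checking with `===` matters because an empty page is a valid result and must not be mistaken for a failure.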
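  That is all we need to get started. Writing a little as we learn a little makes things sink in and stick. Here is my thinking on how to structure the program:
The address we scrape will not always be the same book, so it changes. What do we use for something that changes? Right then a huge piece of chalk came flying at me: "Didn't I tell you? A variable!" The stern teacher Wang Jianjun put all his strength into that chalk and hurled it at me without mercy. I wanted to cry... The teacher is hitting people! Everybody come and look!
Fine, a variable it is. We store the address like this:
$url = "http://book.sina.com.cn/nzt/lit/zhuxian2/index.shtml"; // book URL
With what we have covered so far you should be able to write the whole thing. Here is the code: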


<?php
//****************************************************************

$url = "http://book.sina.com.cn/nzt/lit/zhuxian2/index.shtml"; // book URL

$ver = "old"; // page layout version: "new" or "old"

// The book pages come in two layouts, so we distinguish them here.

//****************************************************************


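// Fetch the page source. file_get_contents() reads the whole page into a string, which is used below.

$r = file_get_contents($url);
if ($r === false) {
    die("Could not fetch ".$url);
}

// Search that string for the title and store the match in $booktitle, which is an array.
// The trailing /is are pattern modifiers: i is case-insensitive, s lets the dot match newlines.
// NOTE: the pattern between the delimiters is missing here; it needs a capture group around the title.

preg_match("//is",$r,$booktitle);

// Assign the first captured group (the title) to $bookname.

$bookname = $booktitle[1]; // book title

//print_r ($booktitle);die(); // if this is unclear, uncomment this line and look at the output, heh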


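/*************************************************************************************
 * Original markup (one chapter entry in the list):
 *   • 第四十五章  伤痛(1)   (Chapter 45: Pain (1))
 *
 * Pattern:
 *   • (the variable part)
 *
 * isU are pattern modifiers. U makes the quantifiers non-greedy, so a match ends at the first place it can.
 *************************************************************************************/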


    $preg = '/

  • /isU';


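/********************************************************************************
 * preg_match_all() performs a global regular-expression match.
 *
 * Prototype:
 *   preg_match_all(string $pattern, string $subject, array &$matches = null,
 *                  int $flags = PREG_PATTERN_ORDER, int $offset = 0): int|false
 *
 * Meaning: find every match of the pattern $preg in $r and store the results in $zj, which is therefore an array.
 * Read the results by index; if that is unfamiliar, read up on arrays!
 * Teacher Wang says: anyone who doesn't know arrays can go outside and hit the books, and come back in once they do.
 **********************************************************************************/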


    preg_match_all($preg, $r, $zj);


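//print_r ($zj);die(); // if this is unclear, uncomment this line and look at the output, heh

// Count the chapters, so that at the end we can report how many there are and how many were scraped.

$bookzj = count($zj[1]);

// Choose which layout you are scraping, because the content markers differ. This could be detected automatically; I wrote that too, but I'm not publishing it because it is simple.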


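// NOTE: the start and end marker strings (the HTML fragments around the chapter text) are empty here; set them for the layout you scrape.
if ($ver=="new"){
    $content_start = "";
    $content_end = "";
}

if ($ver=="old"){
    $content_start = "";
    $content_end = "";
}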


// Set the output encoding before processing the scraped pages. Why gb2312? Look at the site's source, heh!

header("Content-Type:text/html;charset=gb2312");

/*****************************************************************************************
 * Put a credit line at the top in case someone infringes, heh, although it seems I'm the one infringing!
 * Someone surely wants to kill me. In short: this call writes the credit header and creates the file.
 *****************************************************************************************/

writer($bookname." (".$bookzj." sections)\r\nCompiled by Liu Bing on ".date("D M j G:i:s T Y")." as a graduation project\r\n", "./ljy/".$bookname.".txt","w+");

/*****************************************************************************************
 * Merge the content of pages 1 to 136 in one go. This is the most satisfying part.
 * Each pass fetches one chapter page, cleans it and appends it to the book file.
 *****************************************************************************************/

for ($i=0;$i<$bookzj;$i++){

//echo "http://book.sina.com.cn".$zj[1][$i].".shtml";die();


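$str = file_get_contents("http://book.sina.com.cn".$zj[1][$i].".shtml");

// Capture the chapter page title. ASSUMPTION: the tag names are missing here; <title> is inferred from the site-name suffix removed below.
preg_match("/(<title>)(.*?)(<\/title>)/is",$str,$title);

// The pattern of the inner preg_replace() is missing here; as written it changes nothing.
$title = str_replace("_读书频道_新浪网","",preg_replace("//s","",$title[2]));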


/***************************************************************************
 * preg_replace() performs a regular-expression search and replace.
 * str_replace() is a plain string replacement; examples explain it best:
 *
 * $s = str_replace("a", "d", "abcabc");        // result: dbcdbc
 * $s = preg_replace("/a/", "d", "abcabc", 1);  // result: dbcabc (at most 1 replacement)
 ***************************************************************************/


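// preg_quote() makes the markers safe to embed in the pattern (for example a "/" inside a closing tag).
preg_match("/(".preg_quote($content_start,"/").")(.*?)(".preg_quote($content_end,"/").")/is",$str,$content);

// NOTE: the search strings of the next two str_replace() calls were HTML tags and are missing here, as is the first preg_replace() pattern.
$content = preg_replace("//s","",str_replace("

","\r\n",$content[2]));

$content = str_replace("
","",preg_replace('/^[\s]*\n/is',"",$content));

$content = str_replace("  ? ","  ",preg_replace('/^[\s]*\n/is',"",$content));

$result = " \r\nSection ".($i+1)." -------- ".$title." _Teacher Wang is handsome --------- \r\n".$content;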


    //var_dump ($result);die();


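// Append to the same file that the credit header was written to.
writer($result, "./ljy/".$bookname.".txt","a+");

// ASSUMPTION: the line-break tag at the end of this string is missing here; <br> fits the text/html output.
echo "Novel ".$bookname.": ".$bookzj." sections in total, now at section ".($i+1)." _".$title."<br>";

}
echo "Novel ".$bookname.": all ".$bookzj." sections have been collected!";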


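function writer($content,$url,$mode)
{
    // Open the file in the given mode ("w+" creates or truncates, "a+" appends).
    $fp = fopen($url, $mode);
    if ($fp === false) {
        die("Cannot open file: ".$url); // e.g. the target directory does not exist
    }
    fwrite($fp, $content);
    fclose($fp);
}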
    ?> 
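
The script's comments describe the `U` modifier as non-greedy mode. A minimal sketch of what that changes for `preg_match_all()` follows; the chapter-list markup and the pattern are made up for illustration and are not the ones the script uses.

```php
<?php
// Made-up chapter list; the real page's markup is not reproduced here.
$html = '<li><a href="/book/1.shtml">Chapter 1</a></li>'
      . '<li><a href="/book/2.shtml">Chapter 2</a></li>';

// Without U the quantifiers are greedy: the first (.*) runs on to the LAST
// ".shtml", so both links are swallowed by a single match.
preg_match_all('/<a href="(.*)\.shtml">(.*)<\/a>/is', $html, $greedy);

// With U the quantifiers are lazy by default: each link matches on its own.
preg_match_all('/<a href="(.*)\.shtml">(.*)<\/a>/isU', $html, $lazy);

echo count($greedy[1]), "\n"; // number of greedy matches
echo count($lazy[1]), "\n";   // number of lazy matches
print_r($lazy[1]);            // captured paths, without the .shtml suffix
```

With the default `PREG_PATTERN_ORDER` flag, `$lazy[1]` holds every first capture group and `$lazy[2]` every second one, which is why the script counts chapters with `count($zj[1])`.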

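
To illustrate how `str_replace()` and `preg_replace()` combine when cleaning a chapter body, here is a minimal sketch. The function name `html_to_text`, the choice of `<br>` as the line-break tag and the sample input are assumptions for illustration; they are not the script's own search strings.

```php
<?php
// Hypothetical clean-up helper: HTML fragment in, plain text out.
function html_to_text(string $html): string
{
    // Literal replacement: non-breaking-space entities become plain spaces.
    $text = str_replace("&nbsp;", " ", $html);
    // Regex replacement: <br>, <br/> or <br /> become Windows line breaks.
    $text = preg_replace('/<br\s*\/?>/i', "\r\n", $text);
    // Regex replacement: strip every remaining tag.
    $text = preg_replace('/<[^>]*>/', '', $text);
    return trim($text);
}

echo html_to_text("<p>line&nbsp;one<br>line two</p>"), "\n";
```

Literal strings go through `str_replace()` because it needs no escaping; `preg_replace()` is kept for the cases where the text to remove varies.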
     

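
As an alternative to the `writer()` helper, a sketch built on `file_put_contents()` is shown here. The function name `save_text` and the temporary demo path are hypothetical; the idea is to create the target directory when it is missing and to report write failures.

```php
<?php
// Hypothetical replacement for writer(): creates the directory if needed
// and throws instead of failing silently.
function save_text(string $content, string $path, bool $append = false): void
{
    $dir = dirname($path);
    if (!is_dir($dir) && !mkdir($dir, 0777, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create directory: " . $dir);
    }
    // FILE_APPEND adds to the end of the file; 0 creates or truncates it.
    $flags = $append ? FILE_APPEND : 0;
    if (file_put_contents($path, $content, $flags) === false) {
        throw new RuntimeException("Cannot write file: " . $path);
    }
}

// Demo in the system temp directory.
$path = sys_get_temp_dir() . "/scraper_demo_" . uniqid() . "/book.txt";
save_text("header\r\n", $path);          // like mode "w+": create or truncate
save_text("chapter 1\r\n", $path, true); // like mode "a+": append
echo file_get_contents($path);
```

A single path variable for both the header write and the per-chapter appends also avoids writing them to different directories by mistake.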