PHP scraping tutorial for beginners: how to write a scraper
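Our first step is to collect all the links. This is not just scraping a single article: we want to scrape an entire book and save it to one text file, since MP3 players are everywhere now and can display e-books.
How should a book be saved? Under its title, of course, so it is easy to find. So let's start by scraping the book's title.
First, look at the original markup:
The pattern is:
Now let's write the regular expression. Don't tell me you can't; if you can't, come to Hunan, heh, there are plenty of experts here.
Regular expression: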
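The pattern itself is not reproduced above. As a rough sketch only, and not the author's original expression, the idea can be tried offline like this. The sample HTML and the `<title>` based pattern are assumptions made for illustration; the `_读书频道_新浪网` suffix is taken from the `str_replace` call that appears later in the script.

```php
<?php
// Hypothetical sample of an index page; the real markup is not shown in this tutorial.
$html = "<html><head><TITLE>诛仙2_读书频道_新浪网</TITLE></head><body>...</body></html>";

// Assumed pattern: capture everything between <title> and the site suffix.
// i = ignore case, s = "." also matches newlines, (.*?) = non-greedy capture group.
$pattern = '/<title>(.*?)_读书频道_新浪网<\/title>/is';

$bookname = null;
if (preg_match($pattern, $html, $booktitle) === 1) {
    // $booktitle[0] holds the whole match, $booktitle[1] the first capture group.
    $bookname = trim($booktitle[1]);
}

echo $bookname, "\n"; // prints 诛仙2
```

Checking the return value of `preg_match` before reading `$booktitle[1]` avoids an undefined-index notice when the page layout changes and nothing matches.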
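Let's get to work! First we need to fetch the resource, which takes one function:
file_get_contents()
Introduction:
Main purpose: reads an entire file into a string
Prototype: string file_get_contents ( string filename [, bool use_include_path [, resource context [, int offset [, int maxlen]]]] )
What does that mean in practice? It reads the resource named by filename (a local path, or a URL when URL wrappers are enabled) and returns its entire contents as a string, which you then assign to a variable. On failure it returns false.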
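A minimal sketch of the call, using a temporary local file so that it runs without network access; the file contents are made up for illustration. With a URL the call has the same shape, provided `allow_url_fopen` is enabled in the PHP configuration.

```php
<?php
// Create a small temporary file so the example works offline.
$path = tempnam(sys_get_temp_dir(), 'book');
file_put_contents($path, "line one\nline two\n");

// file_get_contents() returns the whole file as one string, or false on failure.
$r = file_get_contents($path);
if ($r === false) {
    exit("could not read $path\n");
}
echo strlen($r), " bytes read\n"; // prints "18 bytes read"

// The optional offset and maxlen arguments read only part of the file:
// start at byte 5 and return at most 3 bytes.
$part = file_get_contents($path, false, null, 5, 3);
echo $part, "\n"; // prints "one"

unlink($path);
```

Comparing the result with `=== false` matters because an empty file yields an empty string, which is also falsy.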
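That is all we need to get started. Learning a little and then writing a little makes things sink in and stick, so let me walk through the thinking behind the program:
We are scraping one address, but surely not just one book ever, so the address we scrape changes. What do we use for something that changes? At this point a huge piece of chalk came flying at me. "Didn't I already tell you? A variable!" The stern teacher Wang Jianjun put all his strength into that chalk and hurled it at me without mercy. I wanted to cry... The teacher is hitting people!!! Everyone come and look.
Fine, a variable it is. We store the address like this:
$url = "http://book.sina.com.cn/nzt/lit/zhuxian2/index.shtml"; // book URL
With what has been covered so far, you should be able to write the whole thing now. Here is the code: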
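<?php
//****************************************************************
$url = "http://book.sina.com.cn/nzt/lit/zhuxian2/index.shtml"; // book URL
$ver = "old"; // page layout: "new" or "old"
// The book's pages come in two layouts, so we have to tell them apart here
//****************************************************************
// Fetch the page source. file_get_contents() reads the file into a string, which is used below
$r = file_get_contents($url);
// Search the fetched string for the title and store the result in $booktitle, which is an array.
// The trailing /is are pattern modifiers: i ignores case, s lets "." match newlines too.
preg_match("//is",$r,$booktitle);
// Assign the first captured group (the title) to $bookname.
$bookname = $booktitle[1]; // book title
//print_r ($booktitle);die(); // if this is unclear, uncomment this line and look at the output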
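/*************************************************************************************
*Original markup:
*The pattern is:
*isU are pattern modifiers: i ignores case, s lets "." match newlines, and U makes
*quantifiers non-greedy, meaning a match ends as soon as it can
*************************************************************************************/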
$preg = '/
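/********************************************************************************
*preg_match_all performs a global regular expression match
*Prototype:
*int preg_match_all ( string pattern, string subject, array matches [, int flags] )
*Meaning: search the whole of $r for every match of the pattern $preg and store the
*results in $zj, which is therefore an array.
*Use indexes to pull items out of it; if you don't know how, read up on arrays!
*Teacher Wang says: anyone who can't use arrays, go outside and hit the books, and come back in when you can
**********************************************************************************/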
preg_match_all($preg, $r, $zj);
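//print_r ($zj);die(); // if this is unclear, uncomment this line and look at the output
// Count the chapter titles, so that at the end we can report how many chapters there are and how many were scraped
$bookzj = count($zj[1]);
// Pick which layout to scrape, because the content start marker differs. This could be detected automatically; I have written that too, but I'm not publishing it because it is simple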
if ($ver=="new"){
$content_start = "";
$content_end = "";
}
if ($ver=="old"){
$content_start = "";
$content_end = "
";
}
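// Next, process the scraped pages. This sets the output encoding. Why gb2312? Take a look at the site's source, heh!!!
header("Content-Type:text/html;charset=gb2312");
/*****************************************************************************************
*Merge the contents of pages 1 to 136 in one go. This is the best part... Add a copyright line so nobody infringes, heh, although it looks like I'm the one infringing!!!
*Somebody must want to kill me. This statement writes the copyright header and creates the file.
*****************************************************************************************/
// Header text: "<title>, <n> sections in total", then a credit line with the date
writer($bookname." 共".$bookzj."节\r\n帅哥刘并于".date("D M j G:i:s T Y")."为了毕业而设计小说整理收集\r\n", "./ljy/".$bookname.".txt","w+");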
for ($i = 0; $i < $bookzj; $i++) {
//echo "http://book.sina.com.cn".$zj[1][$i].".shtml";die();
$str = file_get_contents("http://book.sina.com.cn".$zj[1][$i].".shtml");
preg_match("/(
$title = str_replace("_读书频道_新浪网","",preg_replace("//s","",$title[2]));
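/***************************************************************************
*preg_replace performs a regular expression search and replace
*str_replace is hard to describe, so just look at the examples; it is simply a plain string replacement
* $str = str_replace("a", "d", "abcabc"); // result: dbcdbc
* $str = preg_replace("/a/", "d", "abcabc", 1); // result: dbcabc (limit of 1 replacement)
***************************************************************************/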
preg_match("/(".$content_start.")(.*?)(".$content_end.")/is",$str,$content);
$content = preg_replace("//s","",str_replace("
$content = str_replace("
","",preg_replace("/^[\s]*\n/is","",$content));
$content = str_replace(" ? "," ",preg_replace("/^[\s]*\n/is","",$content));
$result = " \r\n第".($i+1)."节--------".$title."_汪老师就是帅 --------- \r\n".$content;
//var_dump ($result);die();
writer($result, "./ljy/".$bookname.".txt","a+"); // append to the same file the header was written to
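// Progress message: "Novel <name> has <n> sections, now up to section <i> _<title>"
echo "小说".$bookname."共".$bookzj."节,现在整理到第".$i."节 _".$title."
";
}
// Final message: "Novel <name>, <n> sections in total, all done!"
echo "小说".$bookname."共".$bookzj."节 已全部整理完成!";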
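// Write $content to the file at $url using the given fopen() mode ("w+" creates or truncates, "a+" appends)
function writer($content,$url,$mode)
{
$fp = fopen($url, $mode);
if ($fp === false) {
die("Cannot open ".$url);
}
fwrite($fp, $content);
fclose($fp);
}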
?>
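Several patterns in the script above are incomplete, so here is a small offline sketch of the same three steps: collecting chapter links with `preg_match_all`, cleaning a chapter body, and appending it to a text file. The sample markup, both patterns, and the temporary output file are assumptions for illustration, not the original expressions. The blank-line cleanup also uses the `m` modifier, which differs from the `/is` modifiers used in the script.

```php
<?php
// Hypothetical index markup; the real chapter-link markup is not shown in this tutorial.
$index = '<ul>
<li><a href="/nzt/lit/zhuxian2/1.shtml">Chapter one</a></li>
<li><a href="/nzt/lit/zhuxian2/2.shtml">Chapter two</a></li>
</ul>';

// Assumed pattern: capture the path without ".shtml" and the link text.
// U makes every quantifier non-greedy, so each (.*) stops at the first possible match.
$preg = '/<a href="(.*)\.shtml">(.*)<\/a>/isU';

// preg_match_all() returns the number of full matches and fills $zj:
// $zj[0] = full matches, $zj[1] = first group, $zj[2] = second group.
$bookzj = preg_match_all($preg, $index, $zj);

$urls = [];
for ($i = 0; $i < $bookzj; $i++) {
    $urls[] = "http://book.sina.com.cn" . $zj[1][$i] . ".shtml";
}

// Hypothetical chapter body: strip the tags, then drop blank lines.
$raw  = "<p>First paragraph.</p>\r\n\r\n   \r\n<p>Second paragraph.</p>";
$text = preg_replace('/<[^>]*>/', '', $raw);   // remove HTML tags
$text = preg_replace('/^\s*\n/m', '', $text);  // m: ^ matches at every line start

// Create the output file, then append to it (like fopen modes "w+" and "a+").
$out = tempnam(sys_get_temp_dir(), 'book');
file_put_contents($out, $zj[2][0] . "\r\n");
file_put_contents($out, $text . "\r\n", FILE_APPEND);
$saved = file_get_contents($out);
unlink($out);

print_r($urls);
echo $saved;
```

Running the steps on fixed sample strings first makes it easier to check each pattern with `print_r` before pointing the script at live pages.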
