Scraping the Qiushibaike Front Page with PHP
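I suddenly felt like grabbing some data from the web to play with. I have a MySQL database on SAE that was just sitting there doing nothing useful, so I started writing a small PHP program that scrapes the posts on the Qiushibaike front page and stores them in MySQL. Sounds like fun!
Let's get to it! First, the plan:
Fetch the HTML source ---> parse the HTML ---> save to the database
Nothing hard about it.
1. Create a PHP file named "getDataToDB.php".
2. Fetch the HTML source of a given URL
I use the curl functions here; see the PHP manual for details.
The code: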
<span new="" style="font-family:Times">// Fetch the HTML source of the given URL
function GetHtmlCode($url) {
    $ch = curl_init ();                                 // initialize a curl handle
    curl_setopt ( $ch, CURLOPT_URL, $url );             // the page to fetch
    curl_setopt ( $ch, CURLOPT_RETURNTRANSFER, 1 );     // return the response as a string instead of printing it
    curl_setopt ( $ch, CURLOPT_CONNECTTIMEOUT, 1000 );  // connection timeout, in seconds
    $HtmlCode = curl_exec ( $ch );                      // run the request; false on failure
    curl_close ( $ch );                                 // release the handle
    return $HtmlCode;
}</span>
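3. Parse the HTML
My regular-expression skills are not up to this, so I searched all over the web and finally found a parser. The code below calls str_get_html, which comes from PHP Simple HTML DOM Parser. It works much like Jsoup in Java (see "Using Jsoup to parse the Chuzhou University website and fetch the news list"); see the BLOG for details.
The code: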
<span new="" style="font-family:Times">function getFmlDataToDB() {
    $link = mysql_connect ( SAE_MYSQL_HOST_M . ':' . SAE_MYSQL_PORT, SAE_MYSQL_USER, SAE_MYSQL_PASS );
    // Fetch and parse the page source (the URL must be a quoted string)
    $html = str_get_html ( GetHtmlCode ( 'http://www.qiushibaike.com/' ) );
    if ($link) {
        mysql_select_db ( SAE_MYSQL_DB, $link );
        mysql_query ( 'set names utf8' );
        // Each post is a div with class="article block untagged mb15"
        foreach ( $html->find ( 'div[class=article block untagged mb15]' ) as $per ) {
            $z = null; $t = null; $w = null; $d = null; $p = null; $ds = null; $ps = null;
            // Author name
            $author = $per->find ( 'div[class=author]' );
            if ($author != null) {
                $a = $author [0]->find ( 'a' );
                $z = $a [1]->innertext;
            } else {
                $z = 'no author';
            }
            // Avatar URL (assumes the avatar <img> sits inside the first link)
            if ($author != null) {
                $icon = $author [0]->find ( 'a' );
                $img = $icon [0]->find ( 'img', 0 );
                $t = $img ? $img->src : '';
            } else {
                $t = '...............';
            }
            // Post content
            $content = $per->find ( 'div[class=content]' );
            $w = $content [0]->innertext;
            // Vote count
            $vote1 = $per->find ( 'div[class=stats]' );
            $vote2 = $vote1 [0]->find ( 'span[class=stats-vote]' );
            $vote3 = $vote2 [0]->find ( 'i[class=number]' );
            $d = $vote3 [0]->innertext;
            // Comment count
            $comments1 = $vote1 [0]->find ( 'span[class=stats-comments]' );
            $comments2 = $comments1 [0]->find ( 'a[class=qiushi_comments]' );
            $comments3 = $comments2 [0]->find ( 'i[class=number]' );
            $p = $comments3 [0]->innertext;
            // Up-votes
            $up_down = $per->find ( 'div[class=stats-buttons bar clearfix]' );
            $up_down1 = $up_down [0]->find ( 'ul' );
            $li = $up_down1 [0]->find ( 'li' );
            $up = $li [0]->find ( 'span[class=number hidden]' );
            $ds = $up [0]->innertext;
            // Down-votes
            $down = $li [1]->find ( 'span[class=number hidden]' );
            $ps = $down [0]->innertext;
        }
    } else {
        echo 'Database connection failed';
    }
}</span>
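For comparison, the same "find the post blocks, then pull fields out of each" idea can be sketched with PHP's built-in DOM extension instead of a third-party parser. This is only a sketch under assumptions: the sample markup, the function name parsePosts, and the choice of fields are made up for illustration, and the real page structure may differ.

```php
<?php
// Hypothetical sample markup that mimics the class names used above.
$sampleHtml = <<<HTML
<div class="article block untagged mb15">
  <div class="author"><a href="/u/1"><img src="http://example.com/a.jpg"></a><a href="/u/1">alice</a></div>
  <div class="content">First joke</div>
  <div class="stats"><span class="stats-vote"><i class="number">12</i></span></div>
</div>
<div class="article block untagged mb15">
  <div class="content">Second joke</div>
  <div class="stats"><span class="stats-vote"><i class="number">3</i></span></div>
</div>
HTML;

// parsePosts is a hypothetical helper: it returns one array per post block.
function parsePosts(string $html): array
{
    libxml_use_internal_errors(true);          // tolerate sloppy real-world HTML
    $doc = new DOMDocument();
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);  // hint that the input is UTF-8
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $posts = [];
    // Match every div whose class list contains the token "article".
    $blocks = $xpath->query('//div[contains(concat(" ", normalize-space(@class), " "), " article ")]');
    foreach ($blocks as $node) {
        // Text of the first node matching $expr inside this post, or null.
        $first = function (string $expr) use ($xpath, $node): ?string {
            $hit = $xpath->query($expr, $node)->item(0);
            return $hit ? trim($hit->textContent) : null;
        };
        $avatar = $xpath->query('.//div[@class="author"]//img/@src', $node)->item(0);
        $posts[] = [
            'author'  => $first('.//div[@class="author"]/a[last()]') ?? 'no author',
            'avatar'  => $avatar ? $avatar->nodeValue : null,
            'content' => $first('.//div[@class="content"]'),
            'votes'   => (int) $first('.//span[@class="stats-vote"]/i[@class="number"]'),
        ];
    }
    return $posts;
}

foreach (parsePosts($sampleHtml) as $post) {
    echo $post['author'], ': ', $post['content'], "\n";
}
```

Checking for a missing node before reading it (as the $first closure does) keeps one malformed post from aborting the whole run.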
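4. Create the database and insert the data
I use the MySQL service on SAE here; for how to connect, see "Connecting to the MySQL database on SAE with PHP".
The one thing to watch out for is the character encoding: add this line before running any statement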
<span style="font-family:Microsoft">mysql_query ( 'set names utf8' );</span>
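<span style="font-family:Microsoft">// Inside the foreach loop, once the fields of a post have been extracted.
// Escape the scraped values so quotes in the text cannot break the statement
list ( $z, $t, $w, $d, $p, $ds, $ps ) = array_map ( 'mysql_real_escape_string', array ( $z, $t, $w, $d, $p, $ds, $ps ) );
$sql = "INSERT INTO `app_bmhjqs`.`db_fml` (`id`, `author`, `icon_url`, `content`, `vote`, `comments`, `up`, `down`) VALUES (NULL, '$z', '$t', '$w', '$d', '$p', '$ds', '$ps')";
// Avoid garbled characters
mysql_query ( 'set names utf8' );
$result = mysql_query ( $sql );</span>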
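Scraped text often contains quotes, and a statement built by string interpolation will choke on them. One way around that is a prepared statement. The sketch below uses PDO with an in-memory SQLite database purely so that it runs without a server; that substitution, the column types, and the savePost helper are all assumptions, not what the post actually used. For the SAE MySQL database only the DSN and credentials passed to PDO would change.

```php
<?php
// In-memory SQLite stands in for the MySQL database (an assumption for this sketch).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Same column names as the INSERT above; the column types are guesses.
$pdo->exec('CREATE TABLE db_fml (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    author TEXT, icon_url TEXT, content TEXT,
    vote INTEGER, comments INTEGER, up INTEGER, down INTEGER
)');

// savePost is a hypothetical helper: values are bound as parameters,
// so they are never spliced into the SQL text.
function savePost(PDO $pdo, array $post): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO db_fml (author, icon_url, content, vote, comments, up, down)
         VALUES (:author, :icon_url, :content, :vote, :comments, :up, :down)'
    );
    $stmt->execute($post);
}

savePost($pdo, [
    'author'   => "O'Brien",
    'icon_url' => 'http://example.com/a.jpg',
    'content'  => 'It\'s a "quoted" joke',
    'vote'     => 5,
    'comments' => 1,
    'up'       => 7,
    'down'     => -2,
]);

$row = $pdo->query('SELECT author, content FROM db_fml')->fetch(PDO::FETCH_ASSOC);
echo $row['author'], ': ', $row['content'], "\n";
```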
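That completes fetch ---> parse ---> insert. Each run of the PHP file adds the posts on the Qiushibaike front page to the database. Then I wondered whether I could write a timer that runs the code at a fixed interval. I know how to do that in Java but not in PHP; I'm still a fledgling, after all. Off to Baidu... where I found this: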
<span new="" style="font-family:Times">// Timer
// ignore_user_abort ();  // run script in background
// set_time_limit ( 0 );  // run script forever
// $interval = 30;        // do every 15 minutes..
// do {
//     echo date ( 'Y-m-d H:i:s', time () );
//     echo 'Writing to the database';
//     //getFmlDataToDB ();
// } while ( true );</span>
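Note that the snippet above sets $interval but never calls sleep(), so the loop body runs back to back with no pause. A minimal sketch of a loop that actually waits between runs and stops after a fixed number of them might look like this; runPeriodically and its parameters are hypothetical names, and the demo uses an interval of 0 seconds only so that it finishes immediately.

```php
<?php
/**
 * Hypothetical helper: call $task every $intervalSeconds seconds,
 * at most $maxRuns times, and return how many times it ran.
 */
function runPeriodically(callable $task, int $intervalSeconds, int $maxRuns): int
{
    $runs = 0;
    while ($runs < $maxRuns) {
        $task();
        $runs++;
        if ($runs < $maxRuns) {
            sleep($intervalSeconds);  // pause before the next run
        }
    }
    return $runs;
}

// Demo: record three runs with no delay between them.
$log = [];
$count = runPeriodically(function () use (&$log) {
    $log[] = 'run ' . (count($log) + 1);
}, 0, 3);

echo $count, " runs\n";
```

In real use the task would be the scraping function and the interval something like 15 * 60. Capping the number of runs, or handing the schedule to an external scheduler such as cron, keeps a forgotten script from writing to the database all night.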
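This morning I couldn't wait to turn on my computer and open the SAE database. This is what I found:
Good grief! I couldn't take it, so I switched the timer off right away and wrote a button-triggered version instead. At this rate the database would have filled up!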
