In the previous article, we collected the list data from the news listing page. The next step is to read the URLs to be collected from the database and crawl each page.
First, create a new table named content, with title and content columns, to store the crawled pages.
One thing to note, however: you can no longer collect URLs by simply incrementing the id, because the ids in the data table may be discontinuous, for example id=9 followed by id=11. When the crawler reaches id=10, the URL is blank, which may result in empty fields being collected.
The trick used here is a database query. After collecting one row, we check whether the database contains an id greater than the current one. If it does, we read that row and repeat the steps above.
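The "next id greater than the current one" lookup can be tried in isolation. The sketch below is only an illustration: it assumes the pdo_sqlite extension and uses an in-memory SQLite table in place of the article's MySQL list table, and the helper name nextId and the sample rows are made up.

```php
<?php
// Sketch only: an in-memory SQLite table stands in for the MySQL `list` table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE list (id INTEGER PRIMARY KEY, title TEXT, url TEXT)');

// Sample rows with a gap: there is no id=10.
$insert = $pdo->prepare('INSERT INTO list (id, title, url) VALUES (?, ?, ?)');
foreach ([9, 11, 12] as $sampleId) {
    $insert->execute([$sampleId, "Title $sampleId", "http://example.com/news/$sampleId"]);
}

// Hypothetical helper: smallest id greater than $id, or null when none is left.
function nextId(PDO $pdo, int $id): ?int
{
    $stmt = $pdo->prepare('SELECT id FROM list WHERE id > ? ORDER BY id ASC LIMIT 1');
    $stmt->execute([$id]);
    $next = $stmt->fetchColumn(); // false when no row matches
    return $next === false ? null : (int)$next;
}

// Walk every existing row without ever requesting the missing id=10.
$visited = [];
for ($id = nextId($pdo, 0); $id !== null; $id = nextId($pdo, $id)) {
    $visited[] = $id;
}
echo implode(',', $visited), "\n"; // prints "9,11,12"
```

Because the query always asks for the next id that actually exists, gaps in the sequence never produce an empty fetch.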
The specific code is as follows:
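<?php
// Note: the mysql_* functions were removed in PHP 7.0; they are kept here to match conn.php.
include_once("conn.php");

$id = (int)$_GET['id'];

// Fetch the list row that holds the URL to crawl
$sql = "select * from list where id=$id";
$result = mysql_query($sql);
$row = mysql_fetch_array($result);

if ($row) {
    $html = file_get_contents($row['url']);
    $pattern = "/<dd class=\"dataWrap\">(.*)<\/dd>/iUs";
    // $info[0] holds the whole matched <dd> block
    if ($html !== false && preg_match($pattern, $html, $info)) {
        $title = $row[1];
        $content = $info[0];
        echo $title . "<br/>";
        echo $content . "<hr/>";

        // Insert into the database; escape both values and keep the display markup out of them
        $add = "insert into content(title,content) values('"
            . mysql_real_escape_string($title) . "','"
            . mysql_real_escape_string($content) . "')";
        mysql_query($add);
    }
}

// Find the next existing id and redirect to it, so gaps in the ids are skipped
$sql2 = "select * from list where id>$id order by id asc limit 1";
$result2 = mysql_query($sql2);
$row2 = mysql_fetch_array($result2);
if ($row2 && $row2['id']) {
    echo "<script>window.location='content.php?id=$row2[0]'</script>";
}
?>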
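To see what the pattern actually captures, it can be run against a fixed string instead of a live page. The HTML below is invented for illustration; only the pattern comes from the script. The variable names $withWrapper and $innerOnly are hypothetical.

```php
<?php
// Made-up sample page, used instead of a live file_get_contents() call.
$html = <<<HTML
<dl>
  <dd class="dataWrap">
    <p>First paragraph.</p>
  </dd>
  <dd class="other">Unrelated block.</dd>
</dl>
HTML;

// i: case-insensitive, U: quantifiers are lazy by default, s: "." also matches newlines.
$pattern = '/<dd class="dataWrap">(.*)<\/dd>/iUs';

if (preg_match($pattern, $html, $info)) {
    $withWrapper = $info[0];       // whole match, including the <dd> tags
    $innerOnly   = trim($info[1]); // captured group: only what is inside the tags
    echo $innerOnly, "\n";         // prints "<p>First paragraph.</p>"
} else {
    // No match means there is nothing worth storing.
    $withWrapper = $innerOnly = null;
}
```

The U modifier is what stops the match at the first closing dd tag. Without it, the greedy (.*) would run on to the last closing dd tag on the page. Whether to store $info[0] or $info[1] depends on whether the wrapper tag is wanted in the database.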
In this way, the news content we want is collected and stored in the database. All that remains is to tidy up the presentation of the data.
PHP has the implode() function for this: $nr = implode('#', $arr);
Note that this produces "Content 1#Content 2", without a trailing #. If you need one, use:
$nr = implode('#', $arr) . '#';
The clumsy way is to use a loop:
$nr = '';
foreach ($arr as $vl) {
    $nr .= $vl . "#";
}
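A quick side-by-side check of the two approaches, using made-up sample data, might look like this:

```php
<?php
$arr = ['Content 1', 'Content 2']; // sample data, not from the crawler

$joined   = implode('#', $arr);       // "Content 1#Content 2"
$trailing = implode('#', $arr) . '#'; // "Content 1#Content 2#"

// The loop version always appends the separator after each element.
$looped = '';
foreach ($arr as $vl) {
    $looped .= $vl . '#';
}

var_dump($trailing === $looped); // bool(true)

// explode() reverses implode() as long as no element contains '#'.
$parts = explode('#', $joined);

// Edge case: for an empty array the two approaches differ.
$emptyTrailing = implode('#', []) . '#'; // "#"
$emptyLooped = '';
foreach ([] as $vl) {
    $emptyLooped .= $vl . '#';
}                                        // stays ""
```

The two forms agree for any non-empty array. For an empty array, the implode() form still yields a lone '#', while the loop yields an empty string.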
Reference:
mysql_connect() // Connect to your database first
mysql_select_db() // Select your database
mysql_query("insert into your_table (address, title) values ('{$tmp[1][$i]}', '{$tmp[2][$i]}')"); // OK, done!
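Since scraped text may contain quotes, a prepared statement is a safer way to do the same insert. The following is a rough sketch under several assumptions: $tmp is taken to be the result of preg_match_all with two capture groups (address and title), the table name news and the sample markup are invented, and an in-memory SQLite database (pdo_sqlite) stands in for MySQL.

```php
<?php
// Made-up list markup; in the crawler this would come from the fetched page.
$html = '<li><a href="/news/9.html">First story</a></li>'
      . '<li><a href="/news/11.html">Second story</a></li>';

// $tmp[1] collects the addresses, $tmp[2] the titles (assumed layout of $tmp).
preg_match_all('/<a href="([^"]+)">([^<]+)<\/a>/i', $html, $tmp);

// In-memory SQLite stands in for the real database in this sketch.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE news (address TEXT, title TEXT)');

// One prepared statement, executed once per matched pair; values are bound, not interpolated.
$stmt = $pdo->prepare('INSERT INTO news (address, title) VALUES (?, ?)');
foreach ($tmp[1] as $i => $address) {
    $stmt->execute([$address, $tmp[2][$i]]);
}

$count = (int)$pdo->query('SELECT COUNT(*) FROM news')->fetchColumn();
echo $count, "\n"; // prints "2"
```

Binding the values avoids both the quoting problems and the array-interpolation pitfalls of building the SQL string by hand.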