Sina Tech Article Scraping Code
Code
One-click scraper for Sina Tech (新浪科技) articles, written for ThinkPHP
/* Sina Tech article scraper.
   Patterns marked "TAGS LOST" are missing their HTML tag literals in this
   listing; restore them from the target page's markup before use. */
public function sina_tech() {
    /* Number of list pages to crawl (POST parameter); defaults to 1 */
    $page_num = intval($_POST['get_post_page_num']);
    if (empty($page_num)) $page_num = 1;
    /* Article count before crawling */
    $post_count_a = M('post')->count();
    /* Crawl each list page */
    for ($page = 1; $page <= $page_num; $page++) {
        /* Note: everything after '#' is a URL fragment and is not sent in the
           HTTP request, so the page parameter may have no effect as written */
        $fullpage = CurlGetPage('http://roll.tech.sina.com.cn/s/channel.php?ch=05#col=30&spec=&type=&ch=05&k=&offset_page=0&offset_num=0&num=5&asc=&page='.$page);
        /* Isolate the list container (TAGS LOST) */
        preg_match_all('/\s+(.*)\s+/Us', $fullpage, $match);
        if (empty($match[1][0])) continue;
        /* Convert the list markup from GB2312 to UTF-8 */
        $fullpage = iconv("GB2312", "UTF-8", $match[1][0]);
        /* One match per list item; <li> is inferred from the variable name */
        preg_match_all('/<li>(.*)<\/li>/isU', $fullpage, $in_li_tags);
        foreach (array_unique($in_li_tags[1]) as $row) {
            /* Title (TAGS LOST) */
            preg_match_all('/(.*)/', $row, $title);
            $title = $title[1][0];
            /* Link */
            preg_match_all('/href="([^"]*)"/', $row, $link);
            if (empty($link[1][0])) continue;
            $link = $link[1][0];
            /* Date (TAGS LOST); the current year is prepended and seconds appended */
            preg_match_all('/(.*)/i', $row, $date);
            $date = date("Y-", time()) . $date[1][0] . ':00';
            /* Fetch the article page */
            $fullpage_post = CurlGetPage($link);
            /* Clean up tags: unwrap one element, drop another (both TAGS LOST) */
            $fullpage_post = preg_replace('/(.*)/isU', '${1}', $fullpage_post);
            $fullpage_post = preg_replace('/(.*)/Us', '', $fullpage_post);
            /* Extract the article body (TAGS LOST) */
            preg_match_all('/\s+(.*)\s+/Us', $fullpage_post, $post_content);
            if (empty($post_content[1][0])) continue;
            /* Strip <a> tags, keeping their inner text */
            $post_content = preg_replace("/<a[^>]*>(.*)<\/a>/isU", '${1}', $post_content[1][0]);
            /* Save to the database unless an article with this title already exists.
               The array condition lets ThinkPHP escape the value instead of
               interpolating $title into the SQL string. */
            $post_title_count = M('post')->where(array('title' => $title))->count();
            if ($post_title_count == 0) {
                $dataMySql = array();
                $dataMySql["title"] = $title;
                $dataMySql["content"] = $post_content;
                $dataMySql["datetime"] = $date;
                M('post')->add($dataMySql);
            }
        }
    }
    /* Article count after crawling */
    $post_count_b = M('post')->count();
    $post_add_num = $post_count_b - $post_count_a;
    /* Report the result as JSON */
    if ($post_count_a == $post_count_b) {
        /* Message: "article count unchanged" */
        echo '{"success":1,"msg":"文章数无变化"}';
    } else {
        /* Message: "successfully collected N articles" */
        echo '{"success":1,"msg":"成功采集 ' . $post_add_num . ' 篇文章"}';
    }
}
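The listing calls CurlGetPage() but never defines it. A minimal sketch of what such a helper might look like follows, assuming it simply returns the response body as a string; the timeout parameter and the empty-string fallback are assumptions for illustration, not part of the original code.

```php
<?php
// Hypothetical helper: the original listing uses CurlGetPage() without
// showing its definition. This version returns the response body, or an
// empty string when the request fails.
function CurlGetPage($url, $timeout = 15)
{
    $ch = curl_init($url);
    // Return the body from curl_exec() instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Follow HTTP redirects.
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Bound both the connection phase and the whole transfer.
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    $body = curl_exec($ch);
    return $body === false ? '' : $body;
}
```

Returning an empty string on failure lets the caller's empty() checks skip a page that could not be fetched rather than running regular expressions against false.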
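Because the title and date patterns in the listing have lost their HTML tags, the per-item parsing step can be hard to picture. The sketch below shows one way it might work on a made-up list item; the sample markup, the <a> and <span> tags, and the parse_list_item() name are all assumptions, since the real Sina page structure is not shown in the source.

```php
<?php
// Hypothetical parser for one list item. The tag names are assumptions;
// adjust the patterns to the real markup of the list page.
function parse_list_item($row, $year)
{
    // Capture the href value and the anchor text in one pass.
    if (!preg_match('/<a[^>]*href="([^"]*)"[^>]*>(.*)<\/a>/isU', $row, $a)) {
        return null;
    }
    // Capture the "MM-DD HH:MM" text assumed to sit inside a <span>.
    if (!preg_match('/<span[^>]*>(.*)<\/span>/isU', $row, $s)) {
        return null;
    }
    return array(
        'title'    => trim(strip_tags($a[2])),
        'link'     => $a[1],
        // Same shape as the listing: year prefix plus ":00" for seconds.
        'datetime' => $year . '-' . trim($s[1]) . ':00',
    );
}

// Made-up sample markup for illustration only.
$row = '<a href="http://tech.sina.com.cn/example.shtml" target="_blank">Example headline</a>'
     . '<span class="c_time">03-15 09:30</span>';
$item = parse_list_item($row, '2015');
```

Returning null for items that do not match keeps malformed rows out of the database instead of saving empty titles or dates.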