
Scraping Practice (1): Getting All Primary Schools in China with PHP (Data from Tencent Pengyou, 朋友网)

Jun 23, 2016, 02:37 PM

  

    Note: Tencent Pengyou has since been redesigned, so some of the parameters below must be looked up again and updated by hand!

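Before the New Year I needed primary-school data for one province. After analyzing the primary-school pages on Pengyou, I found that the data could be obtained there.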

Take, for example, all primary schools in Yizhang County (宜章县), Chenzhou City, Hunan Province.

The page requests this URL:

http://api.pengyou.com/json.php?cb=__i_3&mod=school&act=selector&schooltype=6&country=0&province=43&district=431022&g_tk=1964222334

The response is JSON wrapped in a callback (the cb parameter names the function, here __i_3):

```
document.domain = "pengyou.com"; __i_3({"code":0,"subcode":0,"......});
```

Parsing it shows that it contains every primary school in Yizhang County.
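Because of the callback wrapper, the body cannot be passed to json_decode() as-is. A minimal sketch, assuming the response keeps the shape shown above (an optional document.domain statement followed by callbackName({...});); decodeJsonp is a hypothetical helper name:

```php
<?php
/**
 * Strip a JSONP wrapper such as  __i_3({...});  and decode the JSON inside.
 * Assumes the first "identifier(" in the body is the callback call, which
 * holds for the sample response above. Returns null if parsing fails.
 */
function decodeJsonp(string $body): ?array
{
    // Capture everything between the callback's "(" and the final ")".
    if (!preg_match('/[A-Za-z_$][\w$]*\s*\((.*)\)\s*;?\s*$/s', $body, $m)) {
        return null;
    }
    $data = json_decode($m[1], true);
    return is_array($data) ? $data : null;
}
```

Usage would be decodeJsonp(file_get_contents($url)) when the cb parameter is kept; the full script at the end of the article omits cb and calls json_decode() directly.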

Analyzing the parameters:

schooltype=6: primary school

country=0: China

province=43: Hunan Province

district=431022: Yizhang County

g_tk=1964222334: unclear; probably a random number (its value differs between the captured requests)
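The IDs appear to nest: province 43, city 4310 (the cityid in the county-list request further down), county 431022, which matches the layout of China's six-digit administrative division codes. A small sketch, assuming that nesting holds for every county (Pengyou does not document it), that recovers the province and city IDs from a county ID; splitDistrictId is a hypothetical name:

```php
<?php
/**
 * Split a 6-digit county ID such as "431022" into its prefixes.
 * Assumption: IDs nest as 2/4/6 digits (43 -> 4310 -> 431022),
 * as the URLs in this article suggest.
 */
function splitDistrictId(string $districtId): array
{
    if (!preg_match('/^\d{6}$/', $districtId)) {
        throw new InvalidArgumentException("Expected a 6-digit ID, got '$districtId'");
    }
    return [
        'province' => substr($districtId, 0, 2), // e.g. "43"   (Hunan)
        'city'     => substr($districtId, 0, 4), // e.g. "4310" (Chenzhou)
        'district' => $districtId,               // e.g. "431022" (Yizhang)
    ];
}
```

For example, splitDistrictId('431022')['province'] gives '43', the province value the school request needs.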

With these parameters you can fetch the primary schools of any county yourself.
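The same request can be assembled with http_build_query() instead of string concatenation. A sketch assuming the parameters above are still accepted; schoolSelectorUrl is a hypothetical helper, and cb is left out as in the full script at the end of the article:

```php
<?php
/**
 * Build the school-selector URL from the parameters described above.
 * g_tk is passed through unchanged because its meaning is unknown.
 */
function schoolSelectorUrl(int $province, int $district, string $gTk, int $schoolType = 6): string
{
    $query = http_build_query([
        'mod'        => 'school',
        'act'        => 'selector',
        'schooltype' => $schoolType, // 6 = primary school
        'country'    => 0,           // 0 = China
        'province'   => $province,   // e.g. 43 = Hunan
        'district'   => $district,   // e.g. 431022 = Yizhang County
        'g_tk'       => $gTk,
    ]);
    return 'http://api.pengyou.com/json.php?' . $query;
}
```

schoolSelectorUrl(43, 431022, '1964222334') rebuilds the captured URL without its cb parameter.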

To get all counties of Chenzhou City, Hunan Province: http://api.pengyou.com/json.php?cb=__i_6&mod=getdistrict&cityid=4310&district_obj_name=_distinct&g_tk=271354436
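The county list can be requested and read the same way. A sketch with two hypothetical helpers: districtListUrl() builds the URL above (again without cb), and districtsFromResponse() pulls out result.district_arr, the key the full script reads, returning an empty array instead of raising warnings when the key is missing (for instance after the API changed):

```php
<?php
/** Build the county-list URL for one city ID (e.g. 4310 = Chenzhou). */
function districtListUrl(int $cityId, string $gTk): string
{
    return 'http://api.pengyou.com/json.php?' . http_build_query([
        'mod'               => 'getdistrict',
        'cityid'            => $cityId,
        'district_obj_name' => '_distinct',
        'g_tk'              => $gTk,
    ]);
}

/** Return county ID => county name from a decoded getdistrict response. */
function districtsFromResponse(?array $response): array
{
    // ?? behaves like isset(), so a null or incomplete response gives null here
    $list = $response['result']['district_arr'] ?? null;
    return is_array($list) ? $list : [];
}
```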

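To fetch schools you need both the province and district values, but I could not find a network request that returns them, so I searched the page itself and found that the province values come from: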

http://cn.qzonestyle.gtimg.cn/campus/js/locations.js
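Once locations.js is saved locally, the province and city IDs can be pulled out with regular expressions. A minimal sketch adapted from the two regexes in the full script at the end of the article; the layout it expects (;location_array[43]='...' and ;sublocation_array[43][4310]='...') is inferred from those regexes, not checked against the current file, and parseLocations is a hypothetical name:

```php
<?php
/**
 * Return [provinceId => ['name' => ..., 'sub' => [cityId => cityName]]]
 * parsed from the text of locations.js.
 */
function parseLocations(string $js): array
{
    $result = [];
    preg_match_all("/;location_array\[([0-9]{2})\]='([^']*)'/", $js, $provinces, PREG_SET_ORDER);
    foreach ($provinces as [, $pid, $pname]) {
        $result[$pid] = ['name' => $pname, 'sub' => []];
        // Cities of this province: sublocation_array[<province id>][<city id>]
        $cityPattern = "/;sublocation_array\[" . preg_quote($pid, '/') . "\]\[([0-9]{4,})\]='([^']*)'/";
        preg_match_all($cityPattern, $js, $cities, PREG_SET_ORDER);
        foreach ($cities as [, $cid, $cname]) {
            $result[$pid]['sub'][$cid] = $cname;
        }
    }
    return $result;
}
```

The result has the same shape as $datas in the full script, so it can drive the same province/city/county loop.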

Problems to solve:

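1. Extracting the province and city IDs from locations.js requires regular expressions.

2. Getting the county IDs from each city ID.

3. When fetching the schools with file_get_contents, the request must carry a suitable User-Agent and be configured accordingly; otherwise no data comes back.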
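For the third problem, file_get_contents() accepts a stream context that carries extra request headers. A sketch with hypothetical helper names; the header values are examples taken from the author's script, and which headers Pengyou actually required is not known. Accept-Encoding is deliberately omitted, because PHP's http:// wrapper does not decompress gzip bodies and json_decode() would then receive compressed bytes:

```php
<?php
/** Create an HTTP stream context that sends a browser-like User-Agent. */
function makeHttpContext(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'  => 'GET',
            'timeout' => 5, // seconds
            'header'  => "User-Agent: {$userAgent}\r\n"
                       . "Accept: text/javascript, application/javascript, */*\r\n"
                       . "Accept-Language: zh-cn,zh;q=0.5\r\n",
        ],
    ]);
}

/** Fetch $url with that context; returns null on failure. */
function fetchWithUserAgent(string $url, string $userAgent): ?string
{
    $body = @file_get_contents($url, false, makeHttpContext($userAgent));
    return $body === false ? null : $body;
}
```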

Here is the full code:

```php
<?php
header("Content-type: text/html; charset=utf-8");
set_time_limit(0); // crawling every county takes a long time

// locations.js has to be downloaded first from
// http://cn.qzonestyle.gtimg.cn/campus/js/locations.js
$js_data = @file_get_contents("locations.js");
if ($js_data === false) {
    exit("Cannot read locations.js\n");
}

// 1. Province IDs/names, then the city IDs/names of each province
preg_match_all("/;location_array\[([0-9]{2})?\]='([^']+)?'/", $js_data, $locations);
$datas = array();
if (array_filter($locations[1]) && array_filter($locations[2])) {
    foreach ($locations[1] as $key => $val) {
        preg_match_all("/;sublocation_array\[" . $val . "\]\[([0-9]{4,})\]='([^']+)?'/", $js_data, $matches);
        $datas[$val]['name'] = $locations[2][$key];
        foreach ($matches[1] as $k => $v) {
            $datas[$val]['sub'][$v] = $matches[2][$k];
        }
    }
}

/**
 * Fetch a URL and return the body, or false on failure.
 */
function getDatas($url)
{
    $getPageSetting = array(
        'http' => array(
            'timeout' => 5,
            'method' => 'GET',
            'protocol_version' => '1.1',
            'header' =>
                "User-Agent: Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_0 like Mac OS X; en-us) AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8A293 Safari/6531.22.7\r\n" .
                // "Referer: http://......php\r\n" . // full URL of the previous page; absent when the URL is typed into the address bar
                // "Host: isdspeed.qq.com\r\n" .     // optional; a wrong value fails with "failed to open stream: HTTP request failed!"
                "Accept-Language: zh-cn,zh;q=0.5\r\n" .
                // No Accept-Encoding: the http:// wrapper does not gunzip, so json_decode() would get compressed bytes
                "Accept-Charset: GBK,utf-8;q=0.7,*;q=0.3\r\n" .
                "Accept: text/javascript, application/javascript, */*\r\n" .
                "Connection: close\r\n" // the wrapper reads until EOF, so don't keep the connection open
        )
    );
    // Version that sends the headers above:
    // $getHtml = file_get_contents($url, false, stream_context_create($getPageSetting));
    // Pengyou has since been redesigned, so a plain file_get_contents() is enough
    $getHtml = @file_get_contents($url);
    return $getHtml;
}

/**
 * Create a directory, including any missing parent directories.
 * @param string $path directory path
 */
function createFolder($path)
{
    if (!is_dir($path)) {
        mkdir($path, 0777, true);
    }
}

$areas = array();
// Fetch the primary schools of every province / city / county
foreach ($datas as $pid => $rows) {
    if (empty($rows['sub'])) {
        continue; // province without cities in locations.js
    }
    foreach ($rows['sub'] as $cid => $city) {
        // 2. County IDs of this city
        $cityUrl = "http://api.pengyou.com/json.php?mod=getdistrict&cityid=" . $cid . "&district_obj_name=_distinct&g_tk=1523170442";
        $districtIds = json_decode((string) getDatas($cityUrl), true);
        $district_arr = isset($districtIds['result']['district_arr']) ? $districtIds['result']['district_arr'] : array();
        $areas[$pid][$cid] = $district_arr;
        foreach ($district_arr as $did => $district) {
            // 3. Primary schools (schooltype=6) of this county
            $url = "http://api.pengyou.com/json.php?mod=school&act=selector&schooltype=6&country=0&province=" . $pid . "&district=" . $did . "&g_tk=1523170442";
            $schools = json_decode((string) getDatas($url), true);
            if (!isset($schools['result'])) {
                continue; // request failed or the response format changed
            }
            // result is an HTML fragment; put one school per line
            $school_data = str_replace("&middot;", "\r\n", strip_tags($schools['result']));
            // Names are converted to GBK, presumably so they display correctly on a Chinese Windows file system
            $dirs = "school/" . iconv('utf-8', 'gbk', $rows['name']) . "/" . iconv('utf-8', 'gbk', $city);
            createFolder($dirs);
            @file_put_contents($dirs . '/' . iconv('utf-8', 'gbk', $district) . '.txt', $school_data);
        }
    }
}
echo '<pre>';
print_r($areas);
```
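The last step of the loop turns the HTML fragment in result into one school per line. If an array is more convenient than a text file, the same two operations can return a list instead. A sketch assuming, as the script's str_replace() call implies, that school names are separated by the &middot; entity; schoolNames is a hypothetical name:

```php
<?php
/**
 * Convert the HTML in a school response's "result" field into an array of
 * school names: strip tags, split on the literal "&middot;" entity, trim.
 */
function schoolNames(string $resultHtml): array
{
    $parts = explode('&middot;', strip_tags($resultHtml));
    // Trim whitespace and drop empty pieces left by leading/trailing separators.
    return array_values(array_filter(array_map('trim', $parts), 'strlen'));
}
```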