python - How do I scrape an AJAX website that has a date picker?
伊谢尔伦 2017-04-18 10:19:32

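I need to scrape real-time water regime data for the Three Gorges Reservoir. The web page lets you pick a date and then shows the water data for that day, but picking dates one at a time and copying the data by hand is very slow. I need to scrape the daily data for the Three Gorges water conservancy hub (三峡水利枢纽) shown in the figure below, for every day from 2014 through 2016.

The URL is: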
http://www.ctgpc.com.cn/sxjt/...

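Using the browser's built-in inspection tool (right-click, Inspect Element, then the Network tab), I looked up the AJAX API address the page calls. A first look shows that the page makes an AJAX call to the URL below and sends it a date via POST, for example today's date, 2017-02-15: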
http://www.ctgpc.com.cn/eport...

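The headers are as follows:

The response is as follows:

I found a similar question earlier: https://segmentfault.com/q/10...
However, following the answers there did not resolve my confusion, so I am asking for help here. Thank you in advance.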


All replies (4)
伊谢尔伦

You can use the requests library to simulate the POST submission. The browser's inspection tool shows that the parameter sent is time:2017-02-07, so define data = {"time": "2017-02-07"}, substituting the date you want. Then write a loop that steps through the dates, adding one day on each pass, and call r = requests.post(url, data=data, headers=headers), where headers holds the request headers copied from the browser. Extract the data from each response and save it to a database. If fetching one day at a time is too slow, you can add gevent, a coroutine library, to speed things up. Two years of data takes 365*2 iterations; the 2014-2016 range in the question covers 1096 days, because 2016 is a leap year.
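The loop described above could look roughly like the sketch below. The form field name time comes from the request seen in the browser; everything else is an assumption made for illustration: the helper names iter_dates, fetch_day and crawl, the half-second delay between requests, and the placeholder URL, which has to be replaced with the full AJAX address from the Network tab. The session argument is expected to be a requests.Session.

```python
import time
from datetime import date, timedelta


def iter_dates(start, end):
    """Yield every day from start to end (inclusive) as 'YYYY-MM-DD'."""
    day = start
    while day <= end:
        yield day.isoformat()
        day += timedelta(days=1)


def fetch_day(session, url, day, headers=None):
    """POST one date as the 'time' form field and return the raw response body."""
    resp = session.post(url, data={"time": day}, headers=headers, timeout=10)
    resp.raise_for_status()  # stop on HTTP errors instead of storing bad data
    return resp.text


def crawl(session, url, start, end, headers=None, delay=0.5):
    """Fetch every day in the range and return {date string: response body}."""
    results = {}
    for day in iter_dates(start, end):
        results[day] = fetch_day(session, url, day, headers)
        time.sleep(delay)  # assumption: a short pause to go easy on the server
    return results


# Usage (needs `pip install requests`; the URL below is a placeholder):
#   import requests
#   with requests.Session() as s:
#       bodies = crawl(s, "PASTE-THE-AJAX-URL-HERE",
#                      date(2014, 1, 1), date(2016, 12, 31))
```

Passing the session in as an argument keeps the date logic independent of the network, so the loop can be checked with a stand-in object before pointing it at the real site.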

伊谢尔伦

You have already found the request and the data it sends, so what exactly is your question?

迷茫

Capture the request with a packet-capture tool, then replay it as a simulated POST or GET.
See the articles below.
Python crawler for associated (suggested) keywords: video and code
https://zhuanlan.zhihu.com/p/...

Learn Python crawling from Huang Ge: scraping and validating proxy IPs
https://zhuanlan.zhihu.com/p/...
Learn Python crawling from Huang Ge: scraping proxy IPs
https://zhuanlan.zhihu.com/p/...

洪涛

You already have the JSON string, so extracting the data from it is the easy part.
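As a rough illustration of that last step, the sketch below decodes a JSON body and writes it out as CSV using only the standard library. The real response structure is not shown in this thread, so the code assumes the body is either one flat JSON object or a list of them; the sample string and its field names are made up, and parse_records and write_csv are hypothetical helper names.

```python
import csv
import io
import json


def parse_records(body):
    """Decode a JSON response body into a list of dict records."""
    payload = json.loads(body)
    # Assumption: the payload is a single record or a list of records.
    if isinstance(payload, dict):
        return [payload]
    return list(payload)


def write_csv(records, out):
    """Write records to an open text file, one column per key seen."""
    columns = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    writer.writerows(records)  # keys missing from a record become empty cells


# Made-up sample body; read the real field names from the Response tab.
sample = '[{"time": "2017-02-07", "value": "1"}]'
buf = io.StringIO()
write_csv(parse_records(sample), buf)
print(buf.getvalue())
```

For the real crawl, the StringIO buffer would be replaced by a file opened with open(path, "w", newline="").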
