How do I scrape Xueqiu pages with Python?

Jun 06, 2016 pm 04:10 PM
beautifulsoup chrome python

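I want to use BeautifulSoup or another Python package to scrape some portfolios from Xueqiu, because Xueqiu does not notify you when a portfolio's holdings change. For example, I want to scrape xueqiu.com/P/ZH010389. The basic idea is to have a program track its holdings and alert me whenever they change.

##In short, what I need to do is: open this page, open its rebalancing history, record the current positions, and compare them with the previous ones.##

The problem: I don't know much HTML, so when I open Chrome's developer tools I can't figure out how to make my program open the rebalancing history...

This is probably a beginner question... Thanks in advance!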

Replies:

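//Many people point out that following a portfolio now gets you notifications... well, that feature clearly did not exist when the question was asked. I wrote this purely as practice while learning web scraping. I don't trade stocks and I don't use Xueqiu...

//Thanks for all the upvotes. Allow me to plug one of my own answers: 如何入门 Python 爬虫? (How to get started with Python web scraping?) - answer by 段晓晨

Building, debugging, and writing this up as I go~
#start coding
First, know what you are actually scraping. The asker talks about finding the right HTML, but that approach is misguided, because the content we want is not in the original HTML. It must, however, be somewhere in the traffic between the browser and the server, so we only need to find that piece of data.
#I'm using Firebug in Firefox
Select the network panel (it should be Network in Chrome) and click the rebalancing history, as in the screenshot: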

19f2857c102321adcf48f05b1fb1e394_b.jpg

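You can see that the browser and the server exchanged one request, and we have captured its URL. Let's open it.

It looks like a jumble, but look carefully and you will notice...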

3a16c3f9fee89c1baa2ed1466cdeda76_b.jpg
In other words, all the data we want is right here, so we just need to fetch this page and then extract the data~

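import urllib.request

# Rebalancing-history endpoint captured in the browser's network panel
url = 'http://xueqiu.com/cubes/rebalancing/history.json?cube_symbol=ZH010389&count=20&page=1'
# `headers` was left undefined in the original snippet; a browser-like User-Agent is assumed here
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0'}
req = urllib.request.Request(url, headers=headers)
html = urllib.request.urlopen(req).read().decode('utf-8')
print(html)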

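Nowadays, once you follow a portfolio you get notified when its holdings change. Still, I think this is an interesting exercise. For example, you could scrape the holdings of many portfolios and run some aggregate analysis: which stock is currently held by the most portfolios on the site, which stock was bought into the most on a given day, and so on.

So I decided to give it a try, and along the way describe how I usually go about automated scraping.

Step.1 Analyze the page

To scrape a page, you first have to "study" it. I usually do this in two ways:

One is Chrome's Developer Tools. Its Network panel shows every network request the page makes, and most data requests appear under the XHR tab. Click a request to see its details and the server's response. Many sites expose dedicated endpoints for certain data that return JSON or XML for the front end to process and display.


The other is to view the page source directly, which is usually available from the browser's right-click menu. Look for the data you want in the page's HTML source and analyze its format in preparation for scraping.

For a portfolio page on Xueqiu, a quick look at the requests it makes did not turn up a data endpoint as I had expected. Looking at the source, I found this fragment: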

SNB.cubeInfo = {"id":10289,"name":"誓把老刀挑下位","symbol":"ZH010389" ...about three thousand characters omitted here... "created_date":"2014.11.25"}
SNB.cubePieData = [{"name":"汽车","weight":100,"color":"#537299"}];
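A fragment like this can be read straight out of the page source. The following is a minimal sketch, assuming the data appears as a literal `SNB.<name> = <JSON>` assignment; the helper name `extract_snb_json` and the abridged `SAMPLE_SOURCE` string are hypothetical and only mirror the fields shown above. It uses `json.JSONDecoder.raw_decode`, which parses one JSON value and ignores whatever follows it, so the trailing `;` and the rest of the script do not matter.

```python
import json
import re


def extract_snb_json(page_source, variable):
    """Return the JSON value assigned to SNB.<variable> in the page source."""
    # Find the start of the assignment, e.g. "SNB.cubeInfo = "
    match = re.search(r'SNB\.' + re.escape(variable) + r'\s*=\s*', page_source)
    if match is None:
        raise ValueError('SNB.%s not found in page source' % variable)
    # raw_decode parses exactly one JSON value starting at the given index
    value, _end = json.JSONDecoder().raw_decode(page_source, match.end())
    return value


# Abridged, hypothetical stand-in for the real page source
SAMPLE_SOURCE = '''<script>
SNB.cubeInfo = {"id":10289,"name":"誓把老刀挑下位","symbol":"ZH010389","created_date":"2014.11.25"}
SNB.cubePieData = [{"name":"汽车","weight":100,"color":"#537299"}];
</script>'''

cube_info = extract_snb_json(SAMPLE_SOURCE, 'cubeInfo')
pie_data = extract_snb_json(SAMPLE_SOURCE, 'cubePieData')
print(cube_info['symbol'], pie_data[0]['weight'])
```

Whether the live page still embeds its data this way is not guaranteed, so the `ValueError` branch is worth keeping.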

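Xueqiu has changed many of its rules, so a lot of older code probably no longer works.
I just wrote a simulated login for Xueqiu: fuck-login/012 xueqiu.com at master · xchaoinfo/fuck-login · GitHub
With modifications on top of that, you can do what the asker wants, and more simply.
For handling cookies so that you don't have to log in every time, see the approach in fuck-login/001 zhihu at master · xchaoinfo/fuck-login · GitHub.

Two modules need to work together: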

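  • Crawler module: solely responsible for fetching and storing data

  • Data-processing module: processes the data stored by the crawler. For example, when it finds that one of someone's holdings has changed, it sends you a notification


The crawler's basic flow:

  1. Visit the target page on a schedule

  2. Scrape the current data from the target page and store it in a database

The data-processing module's basic flow:

  1. Poll the database on a schedule

  2. When the data in the database meets some condition, perform the action you have configured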
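The two-module split above can be sketched with the standard-library `sqlite3` module. This is only an illustration under stated assumptions: the table layout, the function names, and the idea of storing each crawl as a `{stock_name: weight}` snapshot are all hypothetical, the actual fetching is left out, and scheduling (cron, a sleep loop, and so on) is not shown.

```python
import sqlite3


def init_db(conn):
    # Hypothetical schema: one row per stock per snapshot
    conn.execute(
        "CREATE TABLE IF NOT EXISTS holdings ("
        " snapshot_id INTEGER, stock_name TEXT, weight REAL)"
    )


def store_snapshot(conn, holdings):
    """Crawler side: save one {stock_name: weight} snapshot, return its id."""
    row = conn.execute(
        "SELECT COALESCE(MAX(snapshot_id), 0) + 1 FROM holdings").fetchone()
    snapshot_id = row[0]
    conn.executemany(
        "INSERT INTO holdings VALUES (?, ?, ?)",
        [(snapshot_id, name, weight) for name, weight in holdings.items()])
    conn.commit()
    return snapshot_id


def load_snapshot(conn, snapshot_id):
    rows = conn.execute(
        "SELECT stock_name, weight FROM holdings WHERE snapshot_id = ?",
        (snapshot_id,))
    return dict(rows.fetchall())


def diff_latest(conn):
    """Processing side: map each changed stock to (old_weight, new_weight)."""
    latest = conn.execute("SELECT MAX(snapshot_id) FROM holdings").fetchone()[0]
    if latest is None or latest < 2:
        return {}  # nothing to compare against yet
    old = load_snapshot(conn, latest - 1)
    new = load_snapshot(conn, latest)
    changes = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name, 0.0), new.get(name, 0.0)
        if before != after:
            changes[name] = (before, after)
    return changes


conn = sqlite3.connect(":memory:")
init_db(conn)
store_snapshot(conn, {"Stock A": 60.0, "Stock B": 40.0})
store_snapshot(conn, {"Stock A": 50.0, "Stock C": 50.0})
for name, (before, after) in diff_latest(conn).items():
    # The "configured action" here is just a print; it could be an email or a push
    print(name, before, "-->", after)
```

Keeping the crawler and the comparison separate means the notification logic can be changed or rerun without fetching anything again. Note that this simple schema cannot record a snapshot with no holdings at all.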

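Scraping Xueqiu data? What a coincidence, I just came across an article devoted to exactly this and recommend it to everyone: 互联网金融爬虫怎么写 (How to write an internet-finance crawler)

Portfolios you follow will send you rebalancing notifications.

#Techies are so heavy-handed: can't see the rebalancing, so just scrape it......#

I made a few small improvements on top of @段晓晨's code. This is what it looks like now.

Fill in your username and password before testing

Updates:
Added automatic cookie retrieval
Tweaked the code that displays portfolio changes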

import urllib.request
import urllib.parse
import json
import http.cookiejar

# Set up cookie handling
CookieJar = http.cookiejar.CookieJar()
CookieProcessor = urllib.request.HTTPCookieProcessor(CookieJar)
opener = urllib.request.build_opener(CookieProcessor)
urllib.request.install_opener(opener)

# Log in to obtain the cookie
params = urllib.parse.urlencode({'username':'*****','password':'*****'}).encode(encoding='UTF8')
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0'}
request = urllib.request.Request('http://xueqiu.com/user/login',headers=headers)
httpf = opener.open(request, params)

# Fetch the portfolio's rebalancing history
url = 'http://xueqiu.com/cubes/rebalancing/history.json?cube_symbol=ZH340739&count=20&page=1'
req = urllib.request.Request(url,headers=headers)
html = urllib.request.urlopen(req).read().decode('utf-8')
data = json.loads(html)

# Print the weight change of each stock in the most recent rebalancing
stockdata = data['list'][0]['rebalancing_histories']
for stock in stockdata:
    print('Stock name',end=':')
    print(stock['stock_name'],end='   Position change ')
    print(stock['prev_weight'],end='-->')
    print(stock['target_weight'])

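First you need three libraries: urllib2, cookielib, json
Then open 誓把老刀挑下位 in Firefox and log in, then locate the cookie file.
Finally, the rebalancing history is at: xueqiu.com/cubes/rebala   Use urllib2 and cookielib to forge the headers and cookie when making the request, and you get the rebalancing history as JSON, which you can then process with json.

Doesn't the asker know that following a portfolio gets you push notifications? ......

Use shell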
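The urllib2 and cookielib modules named above are Python 2; in Python 3 the same roles are played by `urllib.request` and `http.cookiejar`. The following is a rough sketch of the "reuse the browser's cookie" idea, assuming the cookies have been exported to a Netscape-format `cookies.txt` file (an assumption; the answer does not say which file format it used). The helper name `build_opener_with_cookies` and the `USER_AGENT` constant are hypothetical.

```python
import http.cookiejar
import urllib.request

# Browser-like User-Agent, reused from the earlier answer's code
USER_AGENT = ('Mozilla/5.0 (Windows NT 6.2; WOW64; rv:38.0) '
              'Gecko/20100101 Firefox/38.0')


def build_opener_with_cookies(cookie_file=None):
    """Return (opener, jar); cookies are loaded from cookie_file if given."""
    jar = http.cookiejar.MozillaCookieJar()
    if cookie_file is not None:
        # Expects a Netscape-format cookies.txt export (assumption)
        jar.load(cookie_file, ignore_discard=True, ignore_expires=True)
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    # Every request made through this opener carries the forged header
    opener.addheaders = [('User-Agent', USER_AGENT)]
    return opener, jar


opener, jar = build_opener_with_cookies()
print(len(jar), 'cookies loaded')
# With a real export, a request would look like:
#   opener, jar = build_opener_with_cookies('cookies.txt')
#   body = opener.open(history_url).read().decode('utf-8')
# where history_url is the rebalancing-history address captured in the browser.
```

Loading cookies from a file avoids putting a username and password in the script, at the cost of re-exporting the file whenever the session expires.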
