python抓取新浪微博,求教!!?
python抓取新浪微博,被挡,用了代理,有10个帐号,10个代理,爬的很慢,大家有什么好的办法,谢谢!!!
回复内容:
http://github.com/zhu327/rss 既然你也用python就直接看代码吧爬这里 http://service.weibo.com/widget/widget_blog.php?uid={uid} 替换uid,无需登录,不会被挡 爬手机端
http://weibo.cn
可以参考下面的代码,来自极客学院,侵删
#-*-coding:utf8-*- import smtplib from email.mime.text import MIMEText import requests from lxml import etree import os import time import sys reload(sys) sys.setdefaultencoding('utf-8') class mailhelper(object): ''' 这个类实现发送邮件的功能 ''' def __init__(self): self.mail_host="smtp.xxxx.com" #设置服务器 self.mail_user="xxxx" #用户名 self.mail_pass="xxxx" #密码 self.mail_postfix="xxxx.com" #发件箱的后缀 def send_mail(self,to_list,sub,content): me="xxoohelper"+"<"+self.mail_user+"@"+self.mail_postfix+">" msg = MIMEText(content,_subtype='plain',_charset='utf-8') msg['Subject'] = sub msg['From'] = me msg['To'] = ";".join(to_list) try: server = smtplib.SMTP() server.connect(self.mail_host) server.login(self.mail_user,self.mail_pass) server.sendmail(me, to_list, msg.as_string()) server.close() return True except Exception, e: print str(e) return False class xxoohelper(object): ''' 这个类实现将爬取微博第一条内容 ''' def __init__(self): self.url = 'http://weibo.cn/u/xxxxxxx' #请输入准备抓取的微博地址 self.url_login = 'https://login.weibo.cn/login/' self.new_url = self.url_login def getSource(self): html = requests.get(self.url).content return html def getData(self,html): selector = etree.HTML(html) password = selector.xpath('//input[@type="password"]/@name')[0] vk = selector.xpath('//input[@name="vk"]/@value')[0] action = selector.xpath('//form[@method="post"]/@action')[0] self.new_url = self.url_login + action data = { 'mobile' : 'xxxxx@xxx.com', password : 'xxxxxx', 'remember' : 'on', 'backURL' : 'http://weibo.cn/u/xxxxxx', #此处请修改为微博地址 'backTitle' : u'微博', 'tryCount' : '', 'vk' : vk, 'submit' : u'登录' } return data def getContent(self,data): newhtml = requests.post(self.new_url,data=data).content new_selector = etree.HTML(newhtml) content = new_selector.xpath('//span[@class="ctt"]') newcontent = unicode(content[2].xpath('string(.)')).replace('http://','') sendtime = new_selector.xpath('//span[@class="ct"]/text()')[0] sendtext = newcontent + sendtime return sendtext def tosave(self,text): f= open('weibo.txt','a') f.write(text + '\n') f.close() def tocheck(self,data): if not os.path.exists('weibo.txt'): return True else: f = open('weibo.txt', 'r') existweibo = f.readlines() if data + '\n' in existweibo: return False else: return True if __name__ == '__main__': mailto_list=['xxxxx@qq.com'] #此处填写接收邮件的邮箱 helper = xxoohelper() while True: source = helper.getSource() data = helper.getData(source) content = helper.getContent(data) if helper.tocheck(content): if mailhelper().send_mail(mailto_list,u"女神更新啦",content): print u"发送成功" else: print u"发送失败" helper.tosave(content) print content else: print u'pass' time.sleep(30)
爬他的移动端页面,当时限制比网页端少。
爬虫程序部署在google app engine多个节点上跑 新浪有开发者平台,有专门的API接口,用爬虫会被屏蔽

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

The speed of mobile XML to PDF depends on the following factors: the complexity of XML structure. Mobile hardware configuration conversion method (library, algorithm) code quality optimization methods (select efficient libraries, optimize algorithms, cache data, and utilize multi-threading). Overall, there is no absolute answer and it needs to be optimized according to the specific situation.

An application that converts XML directly to PDF cannot be found because they are two fundamentally different formats. XML is used to store data, while PDF is used to display documents. To complete the transformation, you can use programming languages and libraries such as Python and ReportLab to parse XML data and generate PDF documents.

To generate images through XML, you need to use graph libraries (such as Pillow and JFreeChart) as bridges to generate images based on metadata (size, color) in XML. The key to controlling the size of the image is to adjust the values of the <width> and <height> tags in XML. However, in practical applications, the complexity of XML structure, the fineness of graph drawing, the speed of image generation and memory consumption, and the selection of image formats all have an impact on the generated image size. Therefore, it is necessary to have a deep understanding of XML structure, proficient in the graphics library, and consider factors such as optimization algorithms and image format selection.

It is impossible to complete XML to PDF conversion directly on your phone with a single application. It is necessary to use cloud services, which can be achieved through two steps: 1. Convert XML to PDF in the cloud, 2. Access or download the converted PDF file on the mobile phone.

Use most text editors to open XML files; if you need a more intuitive tree display, you can use an XML editor, such as Oxygen XML Editor or XMLSpy; if you process XML data in a program, you need to use a programming language (such as Python) and XML libraries (such as xml.etree.ElementTree) to parse.

To convert XML images, you need to determine the XML data structure first, then select a suitable graphical library (such as Python's matplotlib) and method, select a visualization strategy based on the data structure, consider the data volume and image format, perform batch processing or use efficient libraries, and finally save it as PNG, JPEG, or SVG according to the needs.

XML formatting tools can type code according to rules to improve readability and understanding. When selecting a tool, pay attention to customization capabilities, handling of special circumstances, performance and ease of use. Commonly used tool types include online tools, IDE plug-ins, and command-line tools.

There is no built-in sum function in C language, so it needs to be written by yourself. Sum can be achieved by traversing the array and accumulating elements: Loop version: Sum is calculated using for loop and array length. Pointer version: Use pointers to point to array elements, and efficient summing is achieved through self-increment pointers. Dynamically allocate array version: Dynamically allocate arrays and manage memory yourself, ensuring that allocated memory is freed to prevent memory leaks.
