Using PyQuery to Extract the Content of Specific HTML Tags

WBOY
Release: 2016-06-21 09:16:45

Installation

sudo pip install pyquery

Example

# Python 2 example (urllib2 and the print statement are Python 2 only)
from pyquery import PyQuery
import urllib2

# Download the page and decode the raw bytes into a unicode object
page = urllib2.urlopen("http://www.lzu.edu.cn")
text = unicode(page.read(), "utf-8")
doc = PyQuery(text)

# Iterate over every <li> inside an element with class "r"
for event in doc('.r li'):
    event = PyQuery(event)
    #loc = event.find('.h').text()
    time = event.text().encode('utf-8')
    #name = event.find('title').text()
    #print 'name: %s' % name
    print 'name : %s' % time
    #print 'location : %s' % loc
    print '----------------------'

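Note that the text held in event is a unicode object. Python operates on Unicode text in memory (the internal width depends on the interpreter build and is not always a fixed 2 bytes per character), whereas text that is stored or written out must be encoded to a variable-width encoding such as UTF-8.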
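To make the encoding point concrete, here is a small Python 3 sketch. The sample string is an arbitrary choice, not taken from the original. In Python 3, `str` is the Unicode text type and `bytes` holds the encoded form.

```python
# Arbitrary sample text: two CJK characters followed by two ASCII letters.
text = "名字ab"

# Encoding turns Unicode text into bytes suitable for storage or transmission.
encoded = text.encode("utf-8")

# UTF-8 is variable-width: each of these CJK characters takes 3 bytes,
# while each ASCII letter takes 1 byte.
print(len(text))      # number of characters
print(len(encoded))   # number of bytes

# Decoding restores the original text.
decoded = encoded.decode("utf-8")
print(decoded == text)
```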

Other modules can of course be used as well, for example HTMLParser from the standard library:

#!/usr/bin/env python
# -*- coding: utf8 -*-
# Python 2 example (HTMLParser, htmlentitydefs and urllib2 are Python 2 module names)
from HTMLParser import HTMLParser
from htmlentitydefs import name2codepoint
import urllib2


class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        # Remembers which tag of interest was opened most recently
        self._flag = ''

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == 'h3' and ('class', 'event-title') in attrs:
            self._flag = 'event-title'
        if tag == 'time':
            self._flag = 'time'
        if tag == 'span' and ('class', 'event-location') in attrs:
            self._flag = 'event-location'

    def handle_data(self, data):
        if self._flag == 'event-title':
            print 'Event name: %s' % data
            self._flag = ''
        #if self._flag == 'time':
        #   print 'Event time: %s' % data
        if self._flag == 'event-location':
            print 'Event location: %s' % data
            print '-------------------'
            self._flag = ''


page = urllib2.urlopen('https://www.python.org/events/python-events/').read()
parser = MyHTMLParser()
parser.feed(page)
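A rough Python 3 counterpart of the parser above is sketched below, using `html.parser` from the standard library. It feeds an inline sample instead of downloading the page, and it collects results in a list rather than printing them directly. The sample markup, the class name `EventParser`, and the `events` attribute are assumptions made for illustration; only the `event-title` and `event-location` class names come from the original.

```python
from html.parser import HTMLParser

# Hypothetical markup imitating the class names used in the example above.
SAMPLE_HTML = """
<ul>
  <li><h3 class="event-title">Example Conference</h3>
      <span class="event-location">Example City</span></li>
  <li><h3 class="event-title">Sprint Day</h3>
      <span class="event-location">Online</span></li>
</ul>
"""


class EventParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self._flag = ""      # which field the next text node belongs to
        self._title = None   # title waiting for its location
        self.events = []     # collected (title, location) pairs

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == "h3" and ("class", "event-title") in attrs:
            self._flag = "event-title"
        elif tag == "span" and ("class", "event-location") in attrs:
            self._flag = "event-location"

    def handle_data(self, data):
        if self._flag == "event-title":
            self._title = data.strip()
        elif self._flag == "event-location":
            self.events.append((self._title, data.strip()))
        # Any text node consumes the pending flag.
        self._flag = ""


if __name__ == "__main__":
    parser = EventParser()
    parser.feed(SAMPLE_HTML)
    for title, location in parser.events:
        print("Event name: %s" % title)
        print("Event location: %s" % location)
        print("-------------------")
```

Returning the pairs in `events` keeps the parsing separate from the output, which makes the parser easier to reuse or check.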

