html - Extracting the content inside a tag with Python

Question

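I scraped a web page, and part of its content is shown below. I'm using the following code: {code...} but it only prints 奥迪阿萨德, and nothing after the first part is printed. How can I fix this?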

黄舟 · Answer

In lxml, element.text only holds the text that comes before the element's first child node, so any text that follows a child tag is missed. You can use a getText helper that walks all of the element's text to solve this:

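# requires lxml
# written for Python 2, but nothing here is version-specific
# getText: join every text fragment under elem, each stripped of surrounding whitespace
def getText(elem):
    rc = []
    for node in elem.itertext():  # yields elem.text, then each descendant's text and tail
        rc.append(node.strip())
    return ''.join(rc)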
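To see why element.text stops early, here is a small sketch using the standard library's xml.etree.ElementTree, whose text/tail/itertext model lxml.etree follows. The markup is a hypothetical stand-in for the question's page, with the text split up by br tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one paragraph whose text is broken up by <br/> tags.
p = ET.fromstring('<p class="content">first<br/>second<br/>third</p>')

# .text only holds the text *before* the first child element.
print(p.text)  # first

# The remaining text is stored in each child's .tail.
print([br.tail for br in p])  # ['second', 'third']

# itertext() yields the text and the children's tails in document order.
print(''.join(p.itertext()))  # firstsecondthird
```

This is why the question's code printed only the first piece: it read .text and never looked at the tails.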

Applied to your code, only the last line needs to change:

#coding=utf-8
# Python 2
import codecs
from lxml import etree

def getText(elem):
    # Collect every text fragment under elem, including text after child tags
    rc = []
    for node in elem.itertext():
        rc.append(node.strip())
    return ''.join(rc)

with codecs.open("1.html", "r", "utf-8") as f:
    content = f.read()
tree = etree.HTML(content)
# xpath() returns lxml.etree._Element objects, which can be passed directly to getText.
node = tree.xpath("//p[@class='content']")[0]
print getText(node).encode('gbk')

The getText here is only a simple implementation: it strips each text fragment and joins them with no separator. For example, for the following XML it prints abdc, which should meet your needs.


    ab dc
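Because getText strips every fragment and joins with an empty string, whitespace between words in different nodes disappears. If that matters, a variant with a separator may help. The sketch below is a hypothetical get_text (not part of the answer above) using the standard library's ElementTree, with made-up input.

```python
import xml.etree.ElementTree as ET

def get_text(elem, sep=''):
    # Hypothetical variant of getText: strip each fragment,
    # drop fragments that were only whitespace, then join with `sep`.
    parts = (piece.strip() for piece in elem.itertext())
    return sep.join(part for part in parts if part)

# Hypothetical input with whitespace around a child tag.
p = ET.fromstring('<p>ab <b>dc</b>\n</p>')
print(get_text(p))       # abdc
print(get_text(p, ' '))  # ab dc
```

Dropping empty fragments is what keeps a non-empty separator from producing doubled spaces.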

巴扎黑 · Answer

#!/usr/bin/env python3
from bs4 import BeautifulSoup

with open("1.html", "r", encoding="utf-8") as f:
    html = BeautifulSoup(f.read(), "html.parser")
node = html.select(".content")[0]
print(node.prettify())
# node.get_text() returns only the text, without the tags

html.select(".content") may need a more specific selector to narrow down the match. Also, this is only a rough outline of how BeautifulSoup works; for specifics, see the manual: Beautiful Soup Documentation.
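If installing bs4 or lxml is not an option, a rough stdlib-only alternative is html.parser.HTMLParser, which, unlike an XML parser, tolerates unclosed tags such as a bare br. This is a swapped-in technique, not what either answer uses; the class name ContentTextParser and the sample markup are hypothetical. It does not infer implied end tags and only collects text from the first p with class "content", mirroring the [0] in the answers.

```python
from html.parser import HTMLParser

class ContentTextParser(HTMLParser):
    """Hypothetical helper: collect text inside the first <p class="content">."""

    def __init__(self):
        super().__init__()
        self.inside = False  # True while inside the target <p>
        self.done = False    # only the first match is used
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get('class') or '').split()
        if tag == 'p' and 'content' in classes and not self.done:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == 'p' and self.inside:
            self.inside = False
            self.done = True

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data.strip())

# Hypothetical page fragment; <br> is left unclosed as in real HTML.
page = '<div><p class="content">first<br>second<br>third</p><p>other</p></div>'
parser = ContentTextParser()
parser.feed(page)
print(''.join(parser.parts))  # firstsecondthird
```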
