xpath - python怎么用lxml处理
伊谢尔伦
伊谢尔伦 2017-04-18 10:19:05
0
3
1202

例如:

<p>
没
<em><!--red_beg-->aa<!--red_end--></em>
</p>
<p>
没
<em><!--red_beg-->aa<!--red_end--></em>
没
<em><!--red_beg-->aa<!--red_end--></em>
</p>
<p>
没
</p>

就是在p标签里可能会出现同样的em标签,而且数量不定,那我怎么获取p的内容,包括em里的内容。
例如第二个p获取输出是‘没aa没aa’

或者获取到p节点之后,怎么把里面的内容转换为字符串

伊谢尔伦
伊谢尔伦

小伙看你根骨奇佳,潜力无限,来学PHP伐。

reply all(3)
PHPzhong

I accidentally learned how to deal with this problem today, so I came up with this question to answer it. Questioner, you can look at the axis of xpath, for example, if you want to get the second <p>标签的“没aa没aa”,实际是取得它全部后代节点的文本内容,可以使用
element_dom.xpath("//p[2]//descendant::text()")来取得,拿到的结果是一个这样['没', 'aa', '没', 'aa']的list,然后自己手动拼接成字符串就可以了,比如"".join(list). Similarly, if you need to perform other operations, you can also use similar methods.

大家讲道理

Change to bs4, the similarities and differences between string and text are here.

洪涛

The

method of .text_content()lxml.html can get the text content of the current node and all child nodes.

Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template