scrapy - How to convert Unicode to HTML in Python
PHPz 2017-04-17 17:43:17

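The data my crawler returns is an HTML fragment in Unicode form. I want to convert it into HTML content so that it is easier to extract data from. How should I do that?
For example, the fragment I get looks like this: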

<p class="item"><p class="blk">
<a target="_blank" href="/topic/19564209">
<img src="https://pic3.zhimg.com/d3f7f95975ae3ff5cfeedad9a4febe56_xs.jpg" alt="游戏界面设计">
<strong>游戏  界面设计</strong>
</a>
<p></p>

<a id="t::-4657" href="javascript:;" class="follow meta-item zg-follow"><i class="z-icon-follow"></i>关注</a>

</p></p>

Its format is unicode.
How can I convert it to HTML format and then extract data from it?


All replies (1)
伊谢尔伦

You may have confused some concepts...(。・`ω´・)

  • HTML stands for How To Make Love, oh... no, it stands for HyperText Markup Language, a markup language that describes the structure of a document

  • Unicode is a character set, a standard for representing text. Encodings such as UTF-8, GBK, and GB2312 define how that text is stored as bytes

The two are different kinds of things, so one cannot be converted into the other. It is like asking "can I convert Python to Unicode?", which obviously makes no sense. At most you can say that you convert the encoding of a Python source file to a Unicode encoding such as UTF-8.

If you need to convert Python strings to Unicode, Python 2 provides the unicode type, which you get by decoding a byte string. In Python 3, str is already Unicode text, so no conversion is required; only bytes objects have to be decoded.
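As a rough illustration of that point in Python 3, here is a minimal sketch. The GBK source encoding is only an assumption for the example; the question does not say which encoding the crawler actually receives.

```python
# Python 3: str is Unicode text, bytes are encoded data.
# Assumption for illustration: the page bytes arrived GBK-encoded.
raw = "游戏界面设计".encode("gbk")   # simulate bytes received from the network

text = raw.decode("gbk")             # bytes -> str (Unicode text)
utf8_bytes = text.encode("utf-8")    # str -> bytes, e.g. before writing to a file

print(type(raw).__name__, type(text).__name__)
```

If the value you hold is already a str, there is nothing left to convert; decoding only applies to bytes.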

But your title asks how to convert it into HTML format, and the fragment you posted is already HTML...ヾ(o◕∀◕)ノ
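Since the fragment is already HTML, it can be handed to an HTML parser directly. Below is a minimal sketch using only the standard library's html.parser; the class name TopicParser and the choice of fields to collect are made up for this example. In a Scrapy project you would more likely use Scrapy's own selectors instead.

```python
from html.parser import HTMLParser

# The fragment from the question, held as an ordinary Python 3 str.
FRAGMENT = '''<p class="item"><p class="blk">
<a target="_blank" href="/topic/19564209">
<img src="https://pic3.zhimg.com/d3f7f95975ae3ff5cfeedad9a4febe56_xs.jpg" alt="游戏界面设计">
<strong>游戏  界面设计</strong>
</a>
<p></p>

<a id="t::-4657" href="javascript:;" class="follow meta-item zg-follow"><i class="z-icon-follow"></i>关注</a>

</p></p>'''


class TopicParser(HTMLParser):
    """Hypothetical helper: collects link targets, images, and <strong> text."""

    def __init__(self):
        super().__init__()
        self.links = []      # href of every <a>
        self.images = []     # (src, alt) of every <img>
        self.titles = []     # text found inside <strong>
        self._in_strong = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "img":
            self.images.append((attrs.get("src"), attrs.get("alt")))
        elif tag == "strong":
            self._in_strong = True

    def handle_endtag(self, tag):
        if tag == "strong":
            self._in_strong = False

    def handle_data(self, data):
        if self._in_strong and data.strip():
            self.titles.append(data.strip())


parser = TopicParser()
parser.feed(FRAGMENT)
parser.close()

print(parser.links)
print(parser.images)
print(parser.titles)
```

The parser works on the str as it is, with no Unicode-to-HTML step in between.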
