beautifulsoup - [Q&A] How do I convert unicode-encoded data to utf-8 in Python?
PHP中文网 2017-04-17 17:29:30

As the title says: I have data of type <class 'bs4.element.NavigableString'>.
When printed, it looks like this:
[u'3788.00', u'4788.00', u'6388.00', u'2398.00', u'5687.00', u'4088.00', u'4187.00', u'4087.00', u'2587.00', u'5188.00', u'4887.00', u'4287.00', u'4887.00', u'5787.00', u'4887.00', u'4888.00', u'\u8d27\u5230\u4ed8\u6b3e', u'6388.00', u'4987.00', u'5588.00', u'5588.00', u'5588.00', u'3288.00', u'3888.00', u'4788.00', u'4788.00', u'4788.00', u'4788.00', u'5588.00', u'4088.00', u'4788.00', u'4788.00', u'5588.00', u'5588.00', u'6388.00', u'6388.00', u'4788.00', u'5588.00', u'4988.00', u'4788.00', u'6388.00', u'6388.00', u'6388.00', u'5588.00', u'5588.00', u'5588.00', u'6388.00', u'5588.00', u'5588.00', u'4788.00', u'6388.00', u'6388.00', u'6388.00', u'5588.00', u'5588.00', u'6588.00', u'6588.00', u'5588.00', u'5588.00', u'5788.00']

When I try to convert the values with int(), I get:
ValueError: invalid literal for int() with base 10: '3788.00'

Then I saw someone online say that round(float(Price)) works  # Price is the <class 'bs4.element.NavigableString'> data

But that gives:
UnicodeEncodeError: 'decimal' codec can't encode characters in position 0-3: invalid decimal Unicode string

How can I solve this? BTW, the error appeared when I was using list.append to add the list above to another list (and yet it was still running fine last night T-T).


All replies (3)
小葫芦

Converting with float() works fine. The only problem is the single element u'\u8d27\u5230\u4ed8\u6b3e' (货到付款, "cash on delivery"), which cannot be converted to a floating-point number. Just delete this element or skip it when processing.
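
A minimal sketch of skipping the element that cannot be converted, assuming the values behave like ordinary unicode strings (NavigableString is a string subclass, so plain strings stand in for it here). The names prices, to_number and numbers are hypothetical, and the list is shortened for illustration.

```python
# -*- coding: utf-8 -*-
# Shortened stand-in for the scraped list; the real values are NavigableString objects.
prices = [u'3788.00', u'4788.00', u'\u8d27\u5230\u4ed8\u6b3e', u'6388.00']


def to_number(text):
    """Return text as a float, or None when it is not a decimal number."""
    try:
        return float(text)
    except ValueError:
        # In Python 2 the UnicodeEncodeError from the question is a
        # subclass of ValueError, so it is caught here as well.
        return None


# Convert every entry, drop the ones that failed, then round to whole numbers.
converted = (to_number(p) for p in prices)
numbers = [int(round(value)) for value in converted if value is not None]

print(numbers)  # prints [3788, 4788, 6388]
```

int() fails on '3788.00' because of the decimal point, which is why the value goes through float() first and is only rounded to an integer afterwards.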

巴扎黑

Call .encode('utf-8') on the data you want to output.
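
A short sketch of what that call does, using the non-numeric entry from the question as sample input. The variable names label and encoded are hypothetical. Note that this only changes how the text is represented as bytes; it does not make the value numeric.

```python
# -*- coding: utf-8 -*-
# The one non-numeric entry from the list in the question.
label = u'\u8d27\u5230\u4ed8\u6b3e'

# encode() turns the unicode text into UTF-8 bytes (a byte str in Python 2,
# a bytes object in Python 3). Each of these four characters takes 3 bytes.
encoded = label.encode('utf-8')

# Decoding the bytes with the same codec gives back the original text.
round_trip = encoded.decode('utf-8')
```

In Python 2, printing a whole list shows the repr of each element, which is why the escapes such as \u8d27 appear in the output. Printing a single element shows the characters themselves.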

大家讲道理

First of all, the data type you are dealing with is <class 'bs4.element.NavigableString'>.
This is the NavigableString type that BeautifulSoup returns for text read from HTML.

When reading the page with BS4, you need to decode the HTML as utf-8 first.

Example:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html.read().decode("utf-8"), "html.parser")  # html: an already opened response or file object

After that, the NavigableString data shown above as unicode escapes will display normally.
