这样算吗?121238asdf<img class="BDE_Image" src="https://imgsa.baidu.com/forum/w%3D580/sign=da0493cd90ef76c6d0d2fb23ad14fdf6/e483aa4bd11373f0bddb2e73a40f4bfbf9ed04b1.jpg" width="560" height="420">
The string is as above, its type is 'str', and I need to extract the Chinese characters with a regular expression. When I used [\u4e00-\u9fa5] before, I still got a list of English letters, digits, and symbols. Please show me the correct approach, and tell me where I went wrong...
pattern = re.compile(r'[\u4E00-\u9FA5]')
print pattern.findall(x[1])
This is what I wrote... but the result contains no Chinese characters, only the other, non-Chinese characters.
I assume the text you need to match is stored in s. The decode('utf8') call is needed because the value of s is a UTF-8 byte string, something like \x66\x77\x88, and it has to be converted to Unicode before matching. Also pay attention to the ur prefix on the pattern passed to compile(): the u marks it as a Unicode string. PS: I was inspired by this article.
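A minimal sketch of the decode-then-match idea described above. It is written for Python 3 so it can be run as-is; the variable names and the way s is built are assumptions for illustration, not the original poster's code. On Python 2 the same idea applies, with the pattern written as a Unicode literal (u'...' or ur'...').

```python
import re

# Hypothetical stand-in for the scraped value: a UTF-8 encoded byte string.
s = "这样算吗?121238asdf".encode("utf8")

# Decode to text first, so the pattern is compared against code points
# rather than against individual bytes.
text = s.decode("utf8")

# The range from the question. In a normal (non-raw) Python 3 string,
# \u4e00 and \u9fa5 are turned into the actual characters.
pattern = re.compile("[\u4e00-\u9fa5]+")

chinese = pattern.findall(text)
print(chinese)
```

The trailing + groups consecutive Chinese characters into one match; without it, findall returns one list entry per character.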
Update
I just read the answer below. It is true that with Python 3 the output is Unicode text. The following is excerpted from here:
You are using Python 2. \uxxxx is a Unicode character, but what you get after matching is a byte string, so printing it shows each byte value. Switch to Python 3 and this problem disappears.
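As for where the original pattern went wrong, one likely explanation (an assumption on my part, not something stated in the answers) is that Python 2's re module does not understand \u inside a byte-string pattern and treats it as a plain letter u. The character class then effectively reads [u4E00-u9FA5], which contains the ASCII range from 0 to u and therefore matches digits, most letters, and some symbols, but no Chinese characters. The sketch below reproduces that effect on Python 3 by writing the class without backslashes.

```python
import re

# The class as Python 2 would effectively see it: the literals u, 4, E, 0,
# the range 0-u (ASCII 0x30 to 0x75), and the literals 9, F, A, 5.
ascii_class = re.compile(r"[u4E00-u9FA5]")

sample = "这样算吗?121238asdf"

# Only characters inside the ASCII range match; the Chinese ones do not.
matched = ascii_class.findall(sample)
print(matched)
```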