python - How to regularize all Chinese characters in a string

Question

{Code...} The string is as above, the type is 'str', and the Chinese characters must be obtained by regularity. When I used [u4e00-u9fa5] before, I still got a list of symbols and numbers in English. Please teach me the correct posture. Also, tell me where I made a mistake... {code...} This is what I wrote... but there are no Chinese characters in the returned result...

習慣沉默 · Answer

I assume here that the text you need to match is s:

pattern = re.compile(ur"[\u4e00-\u9fa5]")
print pattern.findall(s.decode('utf8'))

The decode('utf8') here is because the value of s is a Unicode hash like x66x77x88. In addition, you need to pay attention to the ur modifier in compile(), and u is the Unicode modifier.

PS: I was inspired by this article.

Update

I just read what was said downstairs. It is true that with Python 3, the output is Unicode hash. The following is excerpted from here

Unicode string

In Python2, ordinary strings are stored as 8-bit ASCII codes, while Unicode strings are stored as 16-bit unicode strings, which can represent more character sets. The syntax used is to prefix the string with u.

In Python3, all strings are Unicode strings.

女神的闺蜜爱上我 · Answer

You are using python2, uxxxx is a unicode character, and what you get after matching is a bytestring, which prints out each byte value.

Change to python3 This problem will disappear