Regular expression - How to match Chinese Pinyin using Python?
ringa_lee
ringa_lee 2017-05-27 17:39:30
0
3
1761

For example, use regular expressions to match the pinyin of shá.
ps: What I said before may not be clear. I used the word "for example", which means that there is pinyin in the text to be processed, but I don't know what the specific pinyin is. You need to find out these pinyin, and the text to be processed will be in Chinese. , pinyin, symbols (,.: and the like), so please do not answer questions such as re.search(u'shá',text) It needs to be regular, not a simple fixed string. . .

ringa_lee
ringa_lee

ringa_lee

reply all(3)
巴扎黑
import re
regex = re.compile(r'\b[a-z]*[āáǎàōóǒòêēéěèīíǐìūúǔùǖǘǚǜüńňǹɑɡ]+[a-z]*\b')
text = "Thǐs ís à pìnyin abóut shá"
m = regex.findall(text)
print(m)

Matching results:
['ís', 'à', 'pìnyin', 'abóut', 'shá']
The first Thǐs is not matched because the default pinyin is all lowercase, excluding uppercase.

PHPzhong

Do you want to match all legal pinyin?

If so, you can find the pinyin index of a dictionary and put all the pinyin | together. It can only be like this, because Pinyin is not defined according to regular rules or some other mechanical rules. This is all you can do if you don’t miss anything and don’t have too many, and there aren’t many anyway.

伊谢尔伦
>>> import re
>>> d='shá'
>>> data='This is a pinyin about shá'
>>> re.search(d,data)
<_sre.SRE_Match at 0x404e308>
Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template