For example, use regular expressions to match the pinyin of shá.
ps: What I said before may not be clear. I used the word "for example", which means that there is pinyin in the text to be processed, but I don't know what the specific pinyin is. You need to find out these pinyin, and the text to be processed will be in Chinese. , pinyin, symbols (,.: and the like), so please do not answer questions such as re.search(u'shá',text)
It needs to be regular, not a simple fixed string. . .
Matching results:
['ís', 'à', 'pìnyin', 'abóut', 'shá']
The first Thǐs is not matched because the default pinyin is all lowercase, excluding uppercase.
Do you want to match all legal pinyin?
If so, you can find the pinyin index of a dictionary and put all the pinyin
|
together. It can only be like this, because Pinyin is not defined according to regular rules or some other mechanical rules. This is all you can do if you don’t miss anything and don’t have too many, and there aren’t many anyway.