So are you really using pandas just as a tool for reading the data?
I added an is_tobacco column as the marker you asked for.
filter_query returns a list of the queries containing these words, and it is more efficient.
Secondly, you can split the data and process the pieces with multiprocessing, which speeds things up considerably.
import pandas as pd

# One query per line; the single column is named 'query'.
word = pd.read_table('test.txt', encoding='utf-8', names=['query'])

# Keywords to look for (defined once and shared by both functions).
tobacco = [u'烟', u'白沙', u'黄金叶', u'利群', u'南京九五', u'黄鹤楼软', u'黄鹤楼硬', u'娇子', u'钻石荷花', u'玉溪', u'七匹狼尚品', u'七匹狼软灰']

def signquery(word):
    # Add a boolean column: True where the query exactly equals a keyword.
    word['is_tobacco'] = word['query'].isin(tobacco)
    return word

def filter_query(word):
    # Return the queries that exactly equal a keyword, as a list.
    return word[word['query'].isin(tobacco)]['query'].tolist()

result = filter_query(word)
print(result)
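The suggestion to split the data and use multiprocessing could look roughly like the sketch below. It is only an illustration: the helper names split_chunks, filter_chunk and parallel_filter are hypothetical, the keyword list is copied from the code above, and it assumes a query counts as a hit when it contains a keyword as a substring (the pandas code above tests for an exact match instead).

```python
from multiprocessing import Pool

# Keyword list copied from the answer above.
TOBACCO = [u'烟', u'白沙', u'黄金叶', u'利群', u'南京九五', u'黄鹤楼软',
           u'黄鹤楼硬', u'娇子', u'钻石荷花', u'玉溪', u'七匹狼尚品', u'七匹狼软灰']


def split_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous, roughly equal slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def filter_chunk(chunk):
    """Keep the queries in one chunk that contain any keyword (assumption: substring match)."""
    return [q for q in chunk if any(k in q for k in TOBACCO)]


def parallel_filter(queries, workers=4):
    """Filter each chunk in a separate process and merge the results in order."""
    chunks = split_chunks(queries, workers)
    with Pool(processes=workers) as pool:
        parts = pool.map(filter_chunk, chunks)
    return [q for part in parts for q in part]
```

Call parallel_filter(word['query'].tolist()) from under an `if __name__ == "__main__":` guard, which multiprocessing needs on platforms that start workers by spawning. Whether this is actually faster depends on the data size, since starting processes and copying chunks has its own cost.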
You can try using regular expressions:
import re

# One alternation of all keywords; search() matches a keyword anywhere in the query.
pattern = re.compile(u'烟|白沙|黄金叶|利群|南京九五|黄鹤楼软|黄鹤楼硬|娇子|钻石荷花|玉溪|七匹狼尚品|七匹狼软灰')
result = list(filter(pattern.search, word['query']))
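If the keywords live in a list, the pattern can be built from it instead of being typed by hand. This is a minimal sketch without pandas; KEYWORDS and filter_queries are illustrative names, and re.escape is used so that a keyword containing regex metacharacters would still be matched literally.

```python
import re

# Keyword list copied from the answer above.
KEYWORDS = [u'烟', u'白沙', u'黄金叶', u'利群', u'南京九五', u'黄鹤楼软',
            u'黄鹤楼硬', u'娇子', u'钻石荷花', u'玉溪', u'七匹狼尚品', u'七匹狼软灰']

# Escape each keyword, then join them into one alternation.
pattern = re.compile(u'|'.join(re.escape(k) for k in KEYWORDS))


def filter_queries(queries):
    """Return the queries in which at least one keyword occurs."""
    return [q for q in queries if pattern.search(q)]
```

For example, filter_queries(word['query']) would give the same kind of result as the filter call above.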
KMP algorithm
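For reference, a minimal sketch of KMP (Knuth-Morris-Pratt) search for a single pattern is shown here. The function names build_failure and kmp_find are illustrative, not from any library.

```python
def build_failure(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        # Fall back to shorter borders until the next character matches.
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_find(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    fail = build_failure(pattern)
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1
```

KMP handles one pattern at a time, so with a list of keywords it would be run once per keyword, for example any(kmp_find(query, k) != -1 for k in tobacco).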
KMP
Manacher
Trie tree
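Of the structures listed, a trie (prefix tree) fits the multi-keyword case most directly, because all keywords are checked in one pass from each starting position. The sketch below is an illustration only; build_trie, contains_keyword and the END marker are hypothetical names, and the trie is stored as nested dicts.

```python
END = None  # marker key: a keyword ends at this node


def build_trie(keywords):
    """Build a trie as nested dicts, one level per character."""
    root = {}
    for word in keywords:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def contains_keyword(text, trie):
    """Return True if any keyword stored in the trie occurs somewhere in text."""
    for start in range(len(text)):
        node = trie
        # Walk down the trie for as long as the characters keep matching.
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break
            if END in node:
                return True
    return False
```

The trie is built once, for example trie = build_trie(tobacco), and then contains_keyword(query, trie) is applied to every query.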