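Development environment: mainly Python 2.x, with IPython
Day 3: 03day
How to count how often each element occurs in a sequence
Solution: use a collections.Counter object
Pass the sequence to the Counter constructor; the resulting Counter object is a dictionary mapping each element to its frequency
The Counter.most_common(n) method returns a list of the n most frequent elements together with their counts
1. In a random sequence such as [12,5,6,6,5,5,7...], find the 3 elements that occur most often. How many times does each occur?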
In [41]: from random import randint
In [42]: data = [randint(0,20) for _ in xrange(30)]
In [43]: data
Out[43]:
[18,
15,
2,
2,
15,
7,
6,
0,
1,
8,
15,
9,
15,
8,
19,
14,
6,
17,
8,
1,
8,
15,
2,
3,
2,
13,
0,
19,
6,
4]
In [44]: c = dict.fromkeys(data,0)
In [45]: c
Out[45]:
{0: 0,
1: 0,
2: 0,
3: 0,
4: 0,
6: 0,
7: 0,
8: 0,
9: 0,
13: 0,
14: 0,
15: 0,
17: 0,
18: 0,
19: 0}
In [46]: for x in data:
   ....:     c[x] += 1
   ....:
In [47]: c
Out[47]:
{0: 2,
1: 2,
2: 4,
3: 1,
4: 1,
6: 3,
7: 1,
8: 4,
9: 1,
13: 1,
14: 1,
15: 5,
17: 1,
18: 1,
19: 2}
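The dictionary above holds the counts, but it does not yet answer which 3 elements occur most often. As a minimal sketch of that last step, the counts can be sorted by value. The names counts and top3 are illustrative and not part of the session, and breaking ties by the smaller element is an assumption made here only to keep the result deterministic.

```python
# Counts copied from Out[47] above
counts = {0: 2, 1: 2, 2: 4, 3: 1, 4: 1, 6: 3, 7: 1, 8: 4,
          9: 1, 13: 1, 14: 1, 15: 5, 17: 1, 18: 1, 19: 2}

# Sort (element, count) pairs by count, highest first;
# ties are broken by the smaller element (an assumption for determinism)
top3 = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:3]
print(top3)  # [(15, 5), (2, 4), (8, 4)]
```

This works, but it means building the dictionary, counting in a loop, and sorting by hand.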
In [48]: from collections import Counter
In [49]: c2 = Counter(data)
In [50]: c2
Out[50]: Counter({15: 5, 2: 4, 8: 4, 6: 3, 0: 2, 1: 2, 19: 2, 3: 1, 4: 1, 7: 1, 9: 1, 13: 1, 14: 1, 17: 1, 18: 1})
In [51]: c2[15]
Out[51]: 5
In [52]: c2[2]
Out[52]: 4
In [53]: c2.most_common(3)
Out[53]: [(15, 5), (2, 4), (8, 4)]
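2. Count word frequencies in an English article and find the 10 most frequent words. How many times does each occur?
import re
from collections import Counter
# /etc/passwd stands in for an English article here; substitute the path of a real text file
txt = open('/etc/passwd').read()
# Raw string for the regex; drop the empty strings re.split leaves at the ends
c3 = [w for w in re.split(r'\W+', txt) if w]
c4 = Counter(c3)
print c4.most_common(10)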
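Since /etc/passwd is not really English prose, the same idea can be tried on a short inline string. This is only a sketch: the sample text is made up, the names text, words and word_counts are illustrative, and lowercasing the input is an added assumption so that "The" and "the" count as the same word.

```python
import re
from collections import Counter

# Hypothetical sample text standing in for an English article
text = "The cat and the dog and the bird"

# Lowercase first, split on runs of non-word characters,
# and drop any empty strings the split leaves behind
words = [w for w in re.split(r'\W+', text.lower()) if w]

word_counts = Counter(words)
print(word_counts.most_common(2))  # [('the', 3), ('and', 2)]
```

For a real article, replace text with the contents of the file and raise the argument of most_common to 10.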