This article explains the idea behind counting word frequencies in Python and gives solutions that use no third-party modules, only the standard library.
Problem description:
Implement a Python function count_words(s, n) that takes a string s and an integer n and returns the n most frequent words in s. The return value is a list of tuples pairing each of those n words with its number of occurrences, i.e. [(<word_1>, <count_1>), (<word_2>, <count_2>), ...], ordered from most to least frequent.
You can assume the input is all lowercase and contains only letters separated by single spaces (no punctuation or other characters). Words that occur the same number of times are ordered alphabetically.
For example:
print(count_words("betty bought a bit of butter but the butter was bitter", 3))
Output:
[('butter', 2), ('a', 1), ('betty', 1)]
Approach:
1. Split s on whitespace to get the word list split_s, for example: ['betty', 'bought', 'a', 'bit', 'of', 'butter', 'but', 'the', 'butter', 'was', 'bitter']
2. Build map_list by turning each word in split_s into a (word, 1) tuple, for example: [('betty', 1), ('bought', 1), ('a', 1), ('bit', 1), ('of', 1), ('butter', 1), ('but', 1), ('the', 1), ('butter', 1), ('was', 1), ('bitter', 1)]
3. Merge the tuples in map_list: when two tuples have the same first element (the word), add their second elements (the counts).
Note: this step uses defaultdict(int), and the result is: {'betty': 1, 'bought': 1, 'a': 1, 'bit': 1, 'of': 1, 'butter': 2, 'but': 1, 'the': 1, 'was': 1, 'bitter': 1}
4. Sort the items alphabetically by key (the word) to get: [('a', 1), ('betty', 1), ('bit', 1), ('bitter', 1), ('bought', 1), ('but', 1), ('butter', 2), ('of', 1), ('the', 1), ('was', 1)]
5. Sort again, this time by value (the count) in descending order. Because Python's sort is stable, words with equal counts keep the alphabetical order from step 4: [('butter', 2), ('a', 1), ('betty', 1), ('bit', 1), ('bitter', 1), ('bought', 1), ('but', 1), ('of', 1), ('the', 1), ('was', 1)]
6. Slice the first n items of the sorted list to get the n most frequent words.
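Steps 4 and 5 rely on sort stability. As an alternative sketch (not the method used in this article's solutions), the same ordering can be produced with a single sorted() call whose key is the tuple (-count, word). The counts dict below is just the step 3 result typed out by hand for illustration.

```python
# Word counts from step 3, typed out by hand for illustration
counts = {'betty': 1, 'bought': 1, 'a': 1, 'bit': 1, 'of': 1,
          'butter': 2, 'but': 1, 'the': 1, 'was': 1, 'bitter': 1}

# Two-pass approach from steps 4 and 5: sort by word, then stable sort by count
two_pass = sorted(counts.items(), key=lambda pair: pair[0])
two_pass = sorted(two_pass, key=lambda pair: pair[1], reverse=True)

# Single-pass alternative: negating the count puts higher counts first,
# and the word itself breaks ties alphabetically
one_pass = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))

print(two_pass == one_pass)  # True
print(one_pass[:3])          # [('butter', 2), ('a', 1), ('betty', 1)]
```

The two-pass version mirrors the steps literally; the tuple key does the same work in one sort.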
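Summary: step 4 cannot be skipped. A defaultdict, like any dict, does not keep its keys in alphabetical order: on Python 2 its iteration order is arbitrary, and on Python 3.7+ it follows insertion order. Either way, words with equal counts would not come out alphabetically, so the list must be sorted explicitly.
You can also try writing it yourself without importing anything, not even the standard-library collections module.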
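A small sketch of what goes wrong when step 4 is skipped, assuming Python 3.7 or later (where dicts, and therefore defaultdict, keep insertion order): tied words come out in order of first appearance rather than alphabetically.

```python
from collections import defaultdict

s = "betty bought a bit of butter but the butter was bitter"

counts = defaultdict(int)
for word in s.split():
    counts[word] += 1

# Skip step 4 and sort only by count (descending): ties keep insertion order
by_count_only = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
print(by_count_only[:3])  # [('butter', 2), ('betty', 1), ('bought', 1)]

# With the alphabetical sort first, ties are broken alphabetically as required
by_word = sorted(counts.items(), key=lambda pair: pair[0])
both = sorted(by_word, key=lambda pair: pair[1], reverse=True)
print(both[:3])           # [('butter', 2), ('a', 1), ('betty', 1)]
```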
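One possible import-free version is sketched below. It follows the same steps but counts with a plain dict and dict.get() instead of defaultdict; this is an illustration, not code from the original article.

```python
def count_words(s, n):
    """Return the n most frequent words in s; ties are ordered alphabetically."""
    counts = {}
    for word in s.split():
        # dict.get() supplies 0 the first time a word is seen
        counts[word] = counts.get(word, 0) + 1
    # Sort alphabetically first, then stably by count (highest first)
    ranked = sorted(counts.items(), key=lambda pair: pair[0])
    ranked = sorted(ranked, key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


print(count_words("cat bat mat cat bat cat", 3))
# [('cat', 3), ('bat', 2), ('mat', 1)]
print(count_words("betty bought a bit of butter but the butter was bitter", 4))
# [('butter', 2), ('a', 1), ('betty', 1), ('bit', 1)]
```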
Solution 1 (use defaultdict):
```python
"""Count words."""
from collections import defaultdict


def count_words(s, n):
    """Return the n most frequently occurring words in s."""
    # Step 1: split the string on whitespace
    split_s = s.split()
    # Step 2: map every word to a (word, 1) pair
    map_list = [(k, 1) for k in split_s]
    # Step 3: merge pairs with the same word by adding their counts
    output = defaultdict(int)
    for word, count in map_list:
        output[word] += count
    # Step 4: sort alphabetically by word
    top_n = sorted(output.items(), key=lambda pair: pair[0])
    # Step 5: stable sort by count, descending; ties stay alphabetical
    top_n = sorted(top_n, key=lambda pair: pair[1], reverse=True)
    # Step 6: keep the n most frequent words
    return top_n[:n]


def test_run():
    """Test count_words() with some inputs."""
    print(count_words("cat bat mat cat bat cat", 3))
    print(count_words("betty bought a bit of butter but the butter was bitter", 4))


if __name__ == '__main__':
    test_run()
```
Solution 2 (Use Counter)
```python
"""Count words."""
from collections import Counter


def count_words(s, n):
    """Return the n most frequently occurring words in s."""
    # Count every word in one pass
    counts = Counter(s.split())
    # Sort alphabetically first so that ties end up in alphabetical order...
    top_n = sorted(counts.items(), key=lambda pair: pair[0])
    # ...then stable sort by count, highest first
    top_n = sorted(top_n, key=lambda pair: pair[1], reverse=True)
    return top_n[:n]


def test_run():
    """Test count_words() with some inputs."""
    print(count_words("cat bat mat cat bat cat", 3))
    print(count_words("betty bought a bit of butter but the butter was bitter", 4))


if __name__ == '__main__':
    test_run()
```
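Counter also offers most_common(n), which looks like a shortcut here. However, the Python documentation only promises that elements with equal counts are ordered in the order first encountered, so on its own it does not satisfy the alphabetical tie-breaking rule. A short comparison sketch (assuming Python 3.7+):

```python
from collections import Counter

s = "betty bought a bit of butter but the butter was bitter"
counts = Counter(s.split())

# most_common() orders equal counts by first appearance, not alphabetically
shortcut = counts.most_common(3)
print(shortcut)    # [('butter', 2), ('betty', 1), ('bought', 1)]

# The two sorts used in Solution 2 give the required alphabetical tie-break
ranked = sorted(counts.items(), key=lambda pair: pair[0])
ranked = sorted(ranked, key=lambda pair: pair[1], reverse=True)
print(ranked[:3])  # [('butter', 2), ('a', 1), ('betty', 1)]
```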
The above is a detailed explanation of the idea of counting words in Python. For more, see other related articles on the PHP Chinese website.