How to Count Word Frequency and Sort by Frequency in Python

Barbara Streisand

Oct 21, 2024, 09:39 PM

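Counting Word Frequency and Sorting by Frequency

When working with large datasets that contain text, you often need to analyze how frequently individual words occur. This information feeds into a variety of natural language processing (NLP) tasks. In Python, the Counter class from the collections module simplifies this task.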

Implementation Design

Your design outlines the following steps:

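1. Create an empty list to store the unique words (newlst).
2. Create an empty list to store the corresponding word frequencies (Frequency).
3. Iterate over the original list of words.
4. For each word, check whether it is already in newlst.
5. If the word is not in newlst, add it and set its frequency to 1.
6. If the word is already in newlst, increment its frequency.
7. Sort newlst according to the frequency list.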
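
For comparison, the steps above can be written out by hand. This is only a minimal sketch of that design: the function name count_with_parallel_lists and the sample input are hypothetical, and the frequency list is spelled in lowercase here.

```python
def count_with_parallel_lists(words):
    """Count words using two parallel lists, following the design steps."""
    newlst = []      # unique words, in first-seen order
    frequency = []   # frequency[i] is the count for newlst[i]
    for word in words:
        if word not in newlst:
            # First occurrence: record the word with a count of 1
            newlst.append(word)
            frequency.append(1)
        else:
            # Repeat occurrence: bump the count at the matching index
            frequency[newlst.index(word)] += 1
    # Pair each count with its word, then sort by count, highest first
    pairs = sorted(zip(frequency, newlst), key=lambda p: p[0], reverse=True)
    return [word for _, word in pairs]


# Hypothetical sample input
words = ['the', 'car', 'apple', 'banana', 'car', 'apple']
result = count_with_parallel_lists(words)
print(result)
```

Both the membership test and the index lookup scan newlst from the start, so this approach slows down quadratically as the number of unique words grows.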

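Using Counter in Python

Python's collections module provides a specialized class named Counter, designed to count and aggregate the elements of an iterable. Counter performs steps 3-6 in a single line of code, and it also makes the two parallel lists from the first two steps unnecessary. Here is how to implement the design with Counter: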

<code class="python">from collections import Counter

# Placeholder input: replace with your own list of words
original_list = ['the', 'car', 'apple', 'banana', 'car', 'apple']

# Create a Counter from the list of words
counts = Counter(original_list)

# Sort the unique words by frequency, most frequent first
sorted_words = sorted(counts, key=counts.get, reverse=True)</code>

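This code produces a sorted list of the unique words, with the most frequent words first. Words with equal counts keep the order in which they first appeared in the input, because sorted() is stable and Counter preserves insertion order (Python 3.7 and later).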
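
If you need the counts alongside the words, Counter also offers a most_common() method that returns (word, count) pairs already ordered by descending count. A minimal sketch, using a hypothetical sample list:

```python
from collections import Counter

# Hypothetical sample input
words = ['the', 'car', 'apple', 'banana', 'car', 'apple']
counts = Counter(words)

# most_common() returns (word, count) pairs, highest count first
ranked = counts.most_common()
print(ranked)

# Passing n limits the result to the n most frequent words
top_two = counts.most_common(2)
print(top_two)
```

This avoids a separate sorted() call when the ranking by count is all you need.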

Example

<code class="python">from collections import Counter

list1 = ['the', 'car', 'apple', 'banana', 'car', 'apple']
counts = Counter(list1)
# Output shown for Python 3.7+, where equal counts keep first-seen order
print(counts)  # Counter({'car': 2, 'apple': 2, 'the': 1, 'banana': 1})
sorted_words = sorted(counts.keys(), key=lambda x: counts[x], reverse=True)
print(sorted_words)  # ['car', 'apple', 'the', 'banana']</code>

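
When several words share the same count, their relative order depends on the order of the input. If you want a result that does not depend on input order, one option is to break ties alphabetically. The sketch below assumes that ascending alphabetical order is the desired tie-breaker, which the original design does not specify:

```python
from collections import Counter

# Hypothetical sample input
words = ['the', 'car', 'apple', 'banana', 'car', 'apple']
counts = Counter(words)

# Sort by descending count, then alphabetically among equal counts.
# Negating the count lets a single ascending sort handle both keys.
sorted_words = sorted(counts, key=lambda w: (-counts[w], w))
print(sorted_words)  # ['apple', 'car', 'banana', 'the']
```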
