How to Identify and Retrieve Duplicates in a Pandas DataFrame in Python? - Python Tutorial - PHP中文網


Patricia Arquette

Published: 2024-10-25 11:31:02

Original



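How to Get a List of All Duplicate Items Using Pandas in Python

Duplicate entries are common when working with datasets. When they occur, you will usually want to see every row involved in a duplication, not only the repeated copies, and Pandas can identify them for you.

To do this, you can use either of the following methods:

Method 1 (print all rows that have a duplicated ID):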

<code class="python">import pandas as pd

# Read the CSV data into a DataFrame
df = pd.read_csv("dup.csv")

# Extract the "ID" column
ids = df["ID"]

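# Keep every row whose ID occurs more than once (all copies, not only the repeats)
duplicates = df[ids.isin(ids[ids.duplicated()])]

# Sort by "ID" so matching rows sit together; assigning the result avoids the
# SettingWithCopyWarning that inplace=True can trigger on a filtered slice
duplicates = duplicates.sort_values("ID")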

# Print the duplicate values
print(duplicates)</code>
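As a shorter alternative, `duplicated` accepts `keep=False`, which flags every row of a repeated ID in a single step. The sketch below is only an illustration: it swaps `dup.csv` for a small made-up DataFrame, and the `name` column and all values are hypothetical, so that it can run on its own.

```python
import pandas as pd

# Hypothetical sample data standing in for dup.csv
df = pd.DataFrame({
    "ID": [3, 1, 2, 1, 3, 4],
    "name": ["c", "a", "b", "a2", "c2", "d"],
})

# keep=False marks every occurrence of a repeated ID, not just the later ones
mask = df.duplicated(subset="ID", keep=False)

# Select the flagged rows and sort so rows with the same ID sit together
duplicates = df[mask].sort_values("ID")

print(duplicates)
```

With the default `keep="first"`, the first occurrence of each ID would be left out of the result, which is why the two-step `isin` filter is needed when `keep` is not set.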


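Method 2 (group and concatenate the duplicate groups):

This method groups the rows by ID, keeps only the groups that contain more than one row, and concatenates them, which gives a compact view of the duplicates with each ID's rows placed together: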

<code class="python"># Group the DataFrame by the "ID" column
grouped = df.groupby("ID")

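# Collect only the groups that contain more than one row
duplicate_groups = [g for _, g in grouped if len(g) > 1]

# Concatenate the duplicate groups into a new DataFrame; pd.concat raises
# ValueError on an empty list, so fall back to an empty frame when no ID repeats
duplicates = pd.concat(duplicate_groups) if duplicate_groups else df.iloc[0:0]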

# Print the duplicate values
print(duplicates)</code>
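The same grouping idea can be sketched without a Python-level loop by asking each row for the size of its group. This is a minimal sketch under assumptions: the DataFrame below is made-up sample data (the `name` column is hypothetical), not the contents of `dup.csv`. Because this variant never builds a list of groups, there is no empty-list case to handle for `pd.concat`.

```python
import pandas as pd

# Hypothetical sample data standing in for dup.csv
df = pd.DataFrame({
    "ID": [3, 1, 2, 1, 3, 4],
    "name": ["c", "a", "b", "a2", "c2", "d"],
})

# Size of the ID group each row belongs to, aligned with df's index
group_sizes = df.groupby("ID")["ID"].transform("size")

# Keep rows whose group has more than one member; an empty result is fine
duplicates = df[group_sizes > 1].sort_values("ID")

# Number of rows per duplicated ID
counts = duplicates["ID"].value_counts().sort_index()

print(duplicates)
print(counts)
```

Note that `groupby` skips rows whose key is missing by default, so rows with a missing ID are not reported by either grouping approach.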


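With either Method 1 or Method 2, you get a list of every duplicated entry in the dataset, so you can inspect the rows visually and investigate where they differ.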
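To go one step beyond visual inspection, one possible way to locate the differences is to count distinct values per column within each duplicated ID. The sketch below is an assumption-based illustration: the `name` and `score` columns and their values are hypothetical, and `nunique` is used here as one convenient check, not as the only approach.

```python
import pandas as pd

# Hypothetical sample data: ID 3 repeats with a conflicting score
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3],
    "name": ["a", "a", "b", "c", "c"],
    "score": [10, 10, 20, 30, 35],
})

# All rows whose ID occurs more than once
duplicates = df[df.duplicated(subset="ID", keep=False)]

# Count distinct values per column within each duplicated ID
distinct = duplicates.groupby("ID").nunique()

# True where rows sharing an ID disagree in that column
conflicts = distinct > 1

print(conflicts)
```

IDs whose row in `conflicts` is entirely `False` are exact repeats in the compared columns, while a `True` entry points to the column that needs a closer look.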
