The pandas drop_duplicates function is a powerful tool for removing duplicate rows from a DataFrame, but what if you only want to drop rows that are duplicates across a subset of columns?
Consider the following DataFrame:
| A   | B | C |
|-----|---|---|
| foo | 0 | A |
| foo | 1 | A |
| foo | 1 | B |
| bar | 1 | A |
Suppose you want to drop rows that match on columns A and C. In this case, you would want to drop rows 0 and 1.
To achieve this, use drop_duplicates with the keep parameter set to False. The keep parameter controls which duplicates, if any, to retain. By default, keep='first', meaning the first occurrence in each group of duplicates is kept and the rest are dropped. Setting keep=False instead drops every row that has a duplicate, leaving only rows that are unique with respect to the chosen columns.
The following code demonstrates how to drop rows with duplicate values in columns A and C:
```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"],
                   "B": [0, 1, 1, 1],
                   "C": ["A", "A", "B", "A"]})

# Drop rows with duplicate values in columns 'A' and 'C'
df = df.drop_duplicates(subset=['A', 'C'], keep=False)
print(df)
```
Output:

```
     A  B  C
2  foo  1  B
3  bar  1  A
```
As you can see, rows 0 and 1 have been dropped, as they are duplicates with respect to columns A and C.
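For contrast, here is a sketch of what the same subset produces with the default keep='first': instead of dropping every duplicated row, one representative of each (A, C) group is kept.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"],
                   "B": [0, 1, 1, 1],
                   "C": ["A", "A", "B", "A"]})

# keep='first' (the default) retains the first row of each (A, C) group,
# so row 0 survives while its duplicate (row 1) is dropped
kept_first = df.drop_duplicates(subset=['A', 'C'], keep='first')
print(kept_first)
#      A  B  C
# 0  foo  0  A
# 2  foo  1  B
# 3  bar  1  A
```

Choose keep=False when duplicated keys signal rows you want gone entirely, and keep='first' (or keep='last') when you want exactly one row per key.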