How to Remove Duplicate Rows in Pandas Based on Specific Columns?

DDD
Release: 2024-12-17 13:03:26

Removing Duplicate Rows Based on Multiple Columns in Python Pandas

The drop_duplicates method in Pandas removes duplicate rows from a DataFrame. By default it compares entire rows, but what if you want to drop rows only when they match on a specific set of columns?

Problem:

Consider a DataFrame with columns "A", "B", and "C". You want to remove every row whose combination of values in columns "A" and "C" also appears in another row. In the example DataFrame below, rows 0 and 1 both have A = foo and C = A, so both should be dropped:

     A  B  C
0  foo  0  A
1  foo  1  A
2  foo  1  B
3  bar  1  A
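
Before dropping anything, it can help to see which rows Pandas treats as duplicates. The sketch below rebuilds the example DataFrame and flags the matching rows with DataFrame.duplicated; the variable name mask is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "bar"],
    "B": [0, 1, 1, 1],
    "C": ["A", "A", "B", "A"],
})

# keep=False marks every member of a duplicate group, not just the repeats
mask = df.duplicated(subset=["A", "C"], keep=False)

# Show only the rows that share their ("A", "C") pair with another row
print(df[mask])
```

Column "B" is ignored in the comparison, so rows 0 and 1 are flagged even though their "B" values differ.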

Solution:

You can achieve this with the drop_duplicates method and its subset parameter:

import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"], "B": [0, 1, 1, 1], "C": ["A", "A", "B", "A"]})

# Drop every row whose ("A", "C") pair appears more than once
result = df.drop_duplicates(subset=["A", "C"], keep=False)
print(result)

The keep parameter controls which rows in each group of duplicates are retained: "first" (the default) keeps the first occurrence, "last" keeps the last occurrence, and False drops every row in the group.
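
To make the difference between the three keep options concrete, here is a small comparison on the same example data. The variable names (first, last, none_kept) are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "bar"],
    "B": [0, 1, 1, 1],
    "C": ["A", "A", "B", "A"],
})

# keep="first" (the default): row 0 survives, row 1 is dropped
first = df.drop_duplicates(subset=["A", "C"], keep="first")

# keep="last": row 1 survives, row 0 is dropped
last = df.drop_duplicates(subset=["A", "C"], keep="last")

# keep=False: both rows 0 and 1 are dropped
none_kept = df.drop_duplicates(subset=["A", "C"], keep=False)

print(first.index.tolist())      # [0, 2, 3]
print(last.index.tolist())       # [1, 2, 3]
print(none_kept.index.tolist())  # [2, 3]
```

Only keep=False matches the problem statement here, since the goal is to discard every row involved in a duplicate rather than keep one representative.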

The result is a DataFrame with rows 0 and 1 removed, leaving only the rows whose combination of "A" and "C" is unique. The remaining rows keep their original index labels:

     A  B  C
2  foo  1  B
3  bar  1  A
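
Note that drop_duplicates does not renumber the rows it keeps, so the surviving rows carry the index labels they had in the original DataFrame (2 and 3 in this example). If you prefer a fresh 0-based index, one option is to chain reset_index(drop=True), as in this minimal sketch; the name renumbered is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "bar"],
    "B": [0, 1, 1, 1],
    "C": ["A", "A", "B", "A"],
})

result = df.drop_duplicates(subset=["A", "C"], keep=False)

# drop=True discards the old index instead of adding it as a column
renumbered = result.reset_index(drop=True)
print(renumbered)
```

On pandas 1.0 and later, passing ignore_index=True to drop_duplicates should give the same renumbering in a single call.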
