How to Retrieve Rows Based on Distinct Column Values in Pandas?-Python Tutorial-php.cn

How to Retrieve Rows Based on Distinct Column Values in Pandas?

Barbara Streisand

Release: 2024-11-04 04:43:01

Retrieving Rows Based on Distinct Column Values

A common data manipulation task is to extract one row for each distinct value in a particular column. This article demonstrates how to do this with Pandas, a popular Python library for data manipulation and analysis.

Problem Statement

Consider a dataframe with two columns, COL1 and COL2. The task is to retrieve one row for each distinct value in COL2. For instance, given the dataframe below:

COL1	COL2
a.com	22
b.com	45
c.com	34
e.com	45
f.com	56
g.com	22
h.com	45

The desired output keeps one row per distinct COL2 value, namely the first row in which each value appears:

COL1	COL2
a.com	22
b.com	45
c.com	34
f.com	56

Solution: Using Pandas' drop_duplicates() Method

Pandas provides the drop_duplicates() method for this task. Passing a column name as its first argument (subset) restricts the duplicate check to that column, and the keep parameter determines which of the duplicated rows, if any, are retained.

For example, to keep only the first row for each distinct COL2 value, use the following code:

<code class="python">import pandas as pd

df = pd.DataFrame({'COL1': ['a.com', 'b.com', 'c.com', 'e.com', 'f.com', 'g.com', 'h.com'],
                   'COL2': [22, 45, 34, 45, 56, 22, 45]})

# Keep the first row for each distinct COL2 value; df itself is left unchanged
result = df.drop_duplicates(subset='COL2')

# Displaying the result
print(result)</code>

This prints one row for each distinct COL2 value (the index column is omitted here):

COL1	COL2
a.com	22
b.com	45
c.com	34
f.com	56
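
As a rough illustration of what happens under the hood, the sketch below uses duplicated() to show which rows are flagged as repeats, and then filters them out by hand. The variable names (mask, deduped, renumbered) are illustrative assumptions, not part of the original example. Note that the surviving rows keep their original index labels; reset_index(drop=True) is one way to renumber them if that is wanted.

```python
import pandas as pd

df = pd.DataFrame({
    'COL1': ['a.com', 'b.com', 'c.com', 'e.com', 'f.com', 'g.com', 'h.com'],
    'COL2': [22, 45, 34, 45, 56, 22, 45],
})

# True for every row whose COL2 value already appeared in an earlier row
mask = df.duplicated(subset='COL2')
print(mask.tolist())
# [False, False, False, True, False, True, True]

# Filtering on the inverted mask keeps the first row per COL2 value
deduped = df[~mask]

# The original index labels survive the filtering
print(deduped.index.tolist())
# [0, 1, 2, 4]

# This selects the same rows as drop_duplicates with its default keep='first'
print(deduped.equals(df.drop_duplicates(subset='COL2')))
# True

# Optional: renumber the rows from 0 (hypothetical follow-up step)
renumbered = deduped.reset_index(drop=True)
print(renumbered)
```
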

Additionally, the keep parameter controls which of the duplicated rows survive. The default, keep='first', keeps the first occurrence of each value; keep='last' keeps the last occurrence; keep=False (the boolean False, not the string 'False') drops every row whose value appears more than once.

<code class="python"># Keep first occurrence
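# (df is assumed to be the original DataFrame, before any de-duplication;
# each result goes to its own variable so the calls do not affect each other)
first = df.drop_duplicates('COL2', keep='first')

# Keep last occurrence
last = df.drop_duplicates('COL2', keep='last')

# Drop every row whose COL2 value appears more than once
unique_only = df.drop_duplicates('COL2', keep=False)</code>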
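
To make the difference between the three keep settings concrete, here is a minimal sketch that applies each one to the sample data from the problem statement. The variable names (first, last, unique_only) are illustrative assumptions; each call starts from the same untouched df.

```python
import pandas as pd

df = pd.DataFrame({
    'COL1': ['a.com', 'b.com', 'c.com', 'e.com', 'f.com', 'g.com', 'h.com'],
    'COL2': [22, 45, 34, 45, 56, 22, 45],
})

# Each call works on the original df, so the results are independent
first = df.drop_duplicates(subset='COL2', keep='first')
last = df.drop_duplicates(subset='COL2', keep='last')
unique_only = df.drop_duplicates(subset='COL2', keep=False)

print(first['COL1'].tolist())
# ['a.com', 'b.com', 'c.com', 'f.com']

print(last['COL1'].tolist())
# ['c.com', 'f.com', 'g.com', 'h.com']

# Only 34 and 56 occur exactly once in COL2
print(unique_only['COL1'].tolist())
# ['c.com', 'f.com']
```

With keep=False the result contains only the values that were never repeated, which is the strict reading of "rows where COL2 is unique"; the desired output in the problem statement corresponds to keep='first'.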
