Retrieving Rows by Distinct Column Values: A Comprehensive Guide
Many data-processing tasks require keeping one row per distinct value in a specific column. This article shows how to do that with the widely used Pandas library in Python.
Query:
Consider a dataset with two columns, COL1 and COL2, as shown below:
COL1   COL2
a.com  22
b.com  45
c.com  34
e.com  45
f.com  56
g.com  22
h.com  45
The goal is to keep one row for each distinct value in COL2, namely the first row in which that value appears. The expected output is:
COL1   COL2
a.com  22
b.com  45
c.com  34
f.com  56
Solution:
The drop_duplicates method in Pandas provides a straightforward way to eliminate duplicate rows based on one or more columns. Here's how to utilize it for this specific task:
<code class="python">import pandas as pd

df = pd.DataFrame({
    'COL1': ['a.com', 'b.com', 'c.com', 'e.com', 'f.com', 'g.com', 'h.com'],
    'COL2': [22, 45, 34, 45, 56, 22, 45],
})

# Keep only the first occurrence of each unique value in COL2
df = df.drop_duplicates('COL2')
print(df)</code>
Output:
    COL1  COL2
0  a.com    22
1  b.com    45
2  c.com    34
4  f.com    56
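Since drop_duplicates can compare more than one column, a short sketch of the multi-column case may help. The data below is hypothetical: COL3 is an invented column that does not appear in the original dataset, and the name deduped is chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data; COL3 is invented purely for this illustration.
df = pd.DataFrame({
    'COL1': ['a.com', 'b.com', 'c.com', 'e.com'],
    'COL2': [22, 45, 22, 45],
    'COL3': ['x', 'y', 'x', 'z'],
})

# A row is dropped only when its (COL2, COL3) pair has already appeared
# in an earlier row; COL1 is ignored for the comparison.
deduped = df.drop_duplicates(subset=['COL2', 'COL3'])
print(deduped)
```

In this sketch only the row for c.com is removed, because its pair (22, 'x') repeats the pair from a.com. The row for e.com is kept: its COL2 value repeats, but the combination (45, 'z') is new.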
Additional Options:
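The drop_duplicates method accepts a keep parameter that controls which of the duplicated rows survive: 'first' (the default) keeps the first occurrence of each value, 'last' keeps the last occurrence, and False drops every row whose value appears more than once.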
Here are examples demonstrating these options:
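<code class="python">import pandas as pd

# Start from the original, un-deduplicated data
df = pd.DataFrame({
    'COL1': ['a.com', 'b.com', 'c.com', 'e.com', 'f.com', 'g.com', 'h.com'],
    'COL2': [22, 45, 34, 45, 56, 22, 45],
})

# Keep only the last occurrence of each unique value in COL2
last_rows = df.drop_duplicates('COL2', keep='last')

# Drop every row whose COL2 value appears more than once
no_repeats = df.drop_duplicates('COL2', keep=False)</code>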
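To make the effect of each keep setting concrete, the following sketch applies all three to the sample dataset and also uses the related duplicated method to show which rows are flagged. The variable names (first_rows, last_rows, no_repeats, mask) are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'COL1': ['a.com', 'b.com', 'c.com', 'e.com', 'f.com', 'g.com', 'h.com'],
    'COL2': [22, 45, 34, 45, 56, 22, 45],
})

# keep='first' (the default): the earliest row for each COL2 value survives.
first_rows = df.drop_duplicates('COL2', keep='first')

# keep='last': the latest row for each COL2 value survives.
last_rows = df.drop_duplicates('COL2', keep='last')

# keep=False: only rows whose COL2 value occurs exactly once survive.
no_repeats = df.drop_duplicates('COL2', keep=False)

# duplicated() returns a boolean mask marking the rows that would be dropped,
# which is handy for inspecting duplicates before removing them.
mask = df.duplicated('COL2', keep=False)

print(first_rows['COL1'].tolist())  # ['a.com', 'b.com', 'c.com', 'f.com']
print(last_rows['COL1'].tolist())   # ['c.com', 'f.com', 'g.com', 'h.com']
print(no_repeats['COL1'].tolist())  # ['c.com', 'f.com']
print(df[mask]['COL1'].tolist())    # ['a.com', 'b.com', 'e.com', 'g.com', 'h.com']
```

Each call returns a new DataFrame and leaves df untouched, so the three results can be compared side by side. The surviving rows keep their original index labels; calling reset_index(drop=True) on a result renumbers them from zero if that is preferred.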