
How to Count Unique Values in Groups with Pandas?

Patricia Arquette
Release: 2024-10-18 15:52:03

Counting Unique Values in Groups with Pandas

When working with datasets containing multiple variables grouped into categories, it often becomes necessary to determine the number of unique values associated with each group. Pandas, a widely used Python library for data manipulation, offers several methods to count unique values within groups.

One common need is to count the number of unique identifiers within each domain. Given a data frame with columns for ID and domain, we seek to obtain a result that displays the count of unique IDs for each domain.

Specifically, considering the data:

    ID        domain
0  123        vk.com
1  123        vk.com
2  123   twitter.com
3  456        vk.com
4  456  facebook.com
5  456        vk.com
6  456    google.com
7  789   twitter.com
8  789        vk.com

We aim to achieve the following output:

domain        count
vk.com            3
twitter.com       2
facebook.com      1
google.com        1

To achieve this, we can use the nunique() method within a Pandas groupby operation. Grouping the data frame by the domain column alone and then applying nunique() to the ID column yields the number of distinct IDs for each domain. The result is a Series indexed by domain:

# Group by domain only, then count the distinct IDs in each group
counts = df.groupby('domain')['ID'].nunique()

print(counts)
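As a self-contained sketch of this approach, the snippet below rebuilds the sample data from the table above and counts the distinct IDs per domain. The variable name counts and the final sort_values() call are illustrative additions, not part of the original snippet; the sort is there only because groupby orders the groups by key (alphabetically here), whereas the desired output lists the largest count first.

```python
import pandas as pd

# Sample data reproduced from the table in this article.
df = pd.DataFrame({
    "ID": [123, 123, 123, 456, 456, 456, 456, 789, 789],
    "domain": ["vk.com", "vk.com", "twitter.com", "vk.com", "facebook.com",
               "vk.com", "google.com", "twitter.com", "vk.com"],
})

# Group by domain, then count the distinct IDs inside each group.
counts = df.groupby("domain")["ID"].nunique()

# Put the domains with the most distinct IDs first.
counts = counts.sort_values(ascending=False)

print(counts)
```

Here vk.com is visited by all three IDs, twitter.com by two, and facebook.com and google.com by one each.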

However, in some datasets the domain values carry stray characters, such as single quotes wrapped around the name. In that case 'vk.com' with quotes and vk.com without them would be treated as different groups. The str.strip("'") method removes leading and trailing single quotes, so we can apply it before grouping and counting:

# Group the ID column by the cleaned domain values
counts = df.ID.groupby(df.domain.str.strip("'")).nunique()

print(counts)

Equivalently, we can group the whole data frame by the stripped domain values and then select the ID column:

df.groupby(df.domain.str.strip("'"))['ID'].nunique()
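To see why the stripping step matters, here is a minimal sketch using a small hypothetical data frame (named raw, not taken from the article's data) in which some domain values are wrapped in single quotes. It compares the grouping with and without the cleanup.

```python
import pandas as pd

# Hypothetical raw data: some domain values carry stray single quotes.
raw = pd.DataFrame({
    "ID": [123, 456, 789, 123],
    "domain": ["'vk.com'", "vk.com", "'vk.com'", "'twitter.com'"],
})

# Without cleaning, "'vk.com'" and "vk.com" form two separate groups.
uncleaned = raw.groupby("domain")["ID"].nunique()

# Stripping the quotes first merges them into a single vk.com group.
cleaned = raw.groupby(raw["domain"].str.strip("'"))["ID"].nunique()

print(uncleaned)
print(cleaned)
```

Note that str.strip("'") only removes quotes at the start and end of each value; a quote in the middle of a string would be left in place.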

To keep domain as a regular column in the result rather than as the index, we can pass as_index=False to groupby() and then aggregate with agg():

import pandas as pd

result = df.groupby(by='domain', as_index=False).agg({'ID': pd.Series.nunique})

print(result)

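This returns a data frame with a domain column and an ID column, where ID now holds the number of unique IDs associated with each domain. The column keeps the name ID; rename it, for example with rename(columns={'ID': 'count'}), if a count column is wanted.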
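If the goal is a column literally named count, as in the desired output, one option is pandas named aggregation, which sets the output column name inside agg() itself. The sketch below is an alternative to the dictionary form and assumes a pandas version that supports named aggregation (0.25 or later); the variable name result is illustrative.

```python
import pandas as pd

# Sample data reproduced from the table in this article.
df = pd.DataFrame({
    "ID": [123, 123, 123, 456, 456, 456, 456, 789, 789],
    "domain": ["vk.com", "vk.com", "twitter.com", "vk.com", "facebook.com",
               "vk.com", "google.com", "twitter.com", "vk.com"],
})

# Named aggregation: the keyword is the output column name, the tuple is
# (input column, aggregation function).
result = df.groupby("domain", as_index=False).agg(count=("ID", "nunique"))

print(result)
```

The result has two ordinary columns, domain and count, so it can be sorted, merged, or written out without resetting the index.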
