Counting unique values grouped by a specific column is a common task in data analysis. Pandas provides various methods to achieve this.
In your case, you have a DataFrame with 'ID' and 'domain' columns and need to count unique 'ID' values for each 'domain'.
Using df.groupby(['domain', 'ID']).count():
This groups the rows by every ('domain', 'ID') pair and counts the rows in each pair. It tells you how often each 'ID' occurs within a 'domain', not how many distinct 'ID' values each 'domain' has.
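A minimal sketch of why grouping by both columns does not give the wanted result. The sample data is invented for illustration. size() is used here rather than count() because the sample frame has no columns left to count once both 'ID' and 'domain' become group keys.

```python
import pandas as pd

# Hypothetical sample data: the same ID can visit a domain more than once.
df = pd.DataFrame({
    "ID": [123, 123, 123, 456, 456, 456, 456, 789, 789],
    "domain": ["vk.com", "vk.com", "twitter.com", "vk.com", "facebook.com",
               "vk.com", "google.com", "twitter.com", "vk.com"],
})

# One entry per (domain, ID) pair, holding the number of rows for that pair.
pair_counts = df.groupby(["domain", "ID"]).size()
print(pair_counts)
```

The result has one entry per pair (for example, vk.com appears three times, once for each ID), so it still has to be reduced further to get a single number per domain.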
Solution with nunique():
df.groupby('domain')['ID'].nunique() counts the distinct 'ID' values in each 'domain' group. The result is a Series indexed by 'domain', with the unique count as its values.
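As a small illustration of the nunique() approach, assuming the same kind of invented sample data (the values are not from a real dataset):

```python
import pandas as pd

# Hypothetical sample data with repeated (ID, domain) rows.
df = pd.DataFrame({
    "ID": [123, 123, 123, 456, 456, 456, 456, 789, 789],
    "domain": ["vk.com", "vk.com", "twitter.com", "vk.com", "facebook.com",
               "vk.com", "google.com", "twitter.com", "vk.com"],
})

# Series indexed by domain; each value is the number of distinct IDs.
unique_ids = df.groupby("domain")["ID"].nunique()
print(unique_ids)
```

Repeated rows for the same ID within a domain are counted only once, which is the difference from count().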
Stripping Single Quotes:
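If the values in your 'domain' column are wrapped in single quotes, remove them before grouping with df['domain'] = df['domain'].str.strip("'"). The call returns a new Series, so assign it back to the column.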
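A short sketch of the clean-up step, assuming the quotes are part of the stored strings (the sample values are made up):

```python
import pandas as pd

# Hypothetical data where each domain string is wrapped in single quotes.
df = pd.DataFrame({
    "ID": [123, 456, 123],
    "domain": ["'vk.com'", "'vk.com'", "'twitter.com'"],
})

# str.strip returns a new Series, so assign it back to the column.
df["domain"] = df["domain"].str.strip("'")

# Group on the cleaned values.
result = df.groupby("domain")["ID"].nunique()
print(result)
```

Note that str.strip only removes the given character from the start and end of each string, so quotes inside a value are left alone.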
Preserve Column Name:
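To keep 'ID' as a column name in the result, use df.groupby(by='domain', as_index=False).agg({'ID': pd.Series.nunique}). With as_index=False, 'domain' stays a regular column instead of becoming the index, so the result is a DataFrame with a 'domain' column and an 'ID' column that holds the unique count.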
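The as_index=False variant could look like this on invented sample data. The variable name result is only illustrative.

```python
import pandas as pd

# Hypothetical sample data with repeated (ID, domain) rows.
df = pd.DataFrame({
    "ID": [123, 123, 123, 456, 456, 456, 456, 789, 789],
    "domain": ["vk.com", "vk.com", "twitter.com", "vk.com", "facebook.com",
               "vk.com", "google.com", "twitter.com", "vk.com"],
})

# as_index=False keeps 'domain' as an ordinary column in the output.
result = df.groupby(by="domain", as_index=False).agg({"ID": pd.Series.nunique})
print(result)
```

This form is convenient when the result is merged with other frames or written to a file, since both columns already carry their names.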