Pandas Equivalent: Distinct Count by Group
When using Pandas as a database alternative, it's often necessary to perform complex operations like distinct counting by group. In this case, we want to count the number of distinct clients per year month.
In SQL, this can be achieved using the count(distinct) aggregate function. However, Pandas provides a slightly different syntax for this operation.
To count the distinct clients per year month in Pandas, we can use the following code:
<code class="python">table.groupby('YEARMONTH').CLIENTCODE.nunique()</code>
The groupby() function splits the DataFrame into groups based on the specified column (YEARMONTH in this case). The nunique() function then counts the number of unique values within each group.
Here's an example to illustrate:
<code class="python">import pandas as pd # Create a DataFrame with sample data data = { 'YEARMONTH': ['201301', '201301', '201301', '201302', '201302', '201302', '201302'], 'CLIENTCODE': [1, 1, 2, 1, 2, 2, 3] } table = pd.DataFrame(data) # Count distinct clients per year month result = table.groupby('YEARMONTH').CLIENTCODE.nunique() print(result)</code>
Output:
YEARMONTH 201301 2 201302 3
As you can see, the result matches the expected output from the SQL query.
The above is the detailed content of How to Count Distinct Values by Group in Pandas?. For more information, please follow other related articles on the PHP Chinese website!