Finding Unique Client Count per Year in Pandas Equivalent to SQL's 'count(distinct)'
In SQL, counting the distinct clients per year can be achieved with the following query:
<code class="sql">SELECT count(distinct CLIENTCODE) FROM table GROUP BY YEARMONTH;</code>
To perform a similar operation in Pandas, which is commonly used as a database substitute, you can employ the nunique() function alongside the groupby() method:
<code class="python">table.groupby('YEARMONTH').CLIENTCODE.nunique()</code>
This code snippet groups the data frame table by the 'YEARMONTH' column and then applies the nunique() function to count the unique clients for each year. The result will be a pandas Series with the year months as the index and the number of unique clients per year as the values.
For example, if your table contains the following data:
CLIENTCODE | YEARMONTH |
---|---|
1 | 201301 |
1 | 201301 |
2 | 201301 |
1 | 201302 |
2 | 201302 |
2 | 201302 |
3 | 201302 |
Applying the nunique() function will produce the following output:
<code class="python">YEARMONTH 201301 2 201302 3</code>
This result indicates that for 201301, there are 2 unique clients, and for 201302, there are 3 unique clients.
The above is the detailed content of How Can Pandas Achieve Client Counting Equivalent to SQL\'s \'count(distinct)\' for Each Year?. For more information, please follow other related articles on the PHP Chinese website!