Given a DataFrame containing columns for ID (id), group (group), and term (term), the goal is to efficiently count the occurrences of each term for each unique combination of ID and group.
Combining pandas' groupby(), size(), and unstack() methods does this in a single vectorized expression, with no explicit Python loops:
df.groupby(['id', 'group', 'term']).size().unstack(fill_value=0)
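Below is a minimal, self-contained version of the one-liner above, run on a small made-up DataFrame. The column values and the t1/t2/t3 term names are hypothetical because the article does not show its input. The comments list the resulting counts per (id, group) pair.

```python
import pandas as pd

# Hypothetical sample data: the article does not show its input frame.
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2],
    "group": ["A", "A", "A", "B", "A", "B", "B"],
    "term":  ["t1", "t1", "t2", "t1", "t3", "t2", "t2"],
})

# size() counts rows per (id, group, term); unstack() pivots the innermost
# level (term) into columns, and fill_value=0 fills absent combinations.
counts = df.groupby(["id", "group", "term"]).size().unstack(fill_value=0)
print(counts)
# Counts per (id, group):
#   (1, "A"): t1=2, t2=1, t3=0
#   (1, "B"): t1=1, t2=0, t3=0
#   (2, "A"): t1=0, t2=0, t3=1
#   (2, "B"): t1=0, t2=2, t3=0
```

Every input row is counted exactly once, so the cell values always sum to the number of rows in df.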
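Here size() returns a Series of row counts indexed by (id, group, term), and unstack() pivots the innermost level, term, into columns. The result is a DataFrame whose rows are indexed by an (id, group) MultiIndex, with one column per distinct term; fill_value=0 inserts a zero wherever a term never occurs for a given combination: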
term
group  term1  term2  term3
id
1      3      2      0
2      2      1      1
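The following sketch shows how the MultiIndex result can be navigated, again on hypothetical sample data. It selects one row by the full (id, group) key, selects every group for a single id by indexing on the outer level only, and flattens the index back into ordinary columns with reset_index().

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
df = pd.DataFrame({
    "id":    [1, 1, 2, 2, 2],
    "group": ["A", "B", "A", "A", "B"],
    "term":  ["t1", "t2", "t1", "t1", "t3"],
})
counts = df.groupby(["id", "group", "term"]).size().unstack(fill_value=0)

# The row index is a two-level MultiIndex; each level is named after its key.
print(type(counts.index).__name__)   # MultiIndex
print(list(counts.index.names))      # ['id', 'group']

# All term counts for one (id, group) combination (a Series indexed by term).
print(counts.loc[(2, "A")])

# Every group for a single id: indexing on the outer level drops that level.
print(counts.loc[2])

# Flatten the MultiIndex back into ordinary columns, e.g. before exporting.
flat = counts.reset_index()
print(list(flat.columns))            # ['id', 'group', 't1', 't2', 't3']
```

Keeping the MultiIndex is convenient for lookups. The flattened form is usually easier to write to CSV or join against other tables.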
Because the counting runs inside pandas' vectorized groupby machinery rather than a Python-level loop, it scales well to large datasets. The reported timing for a one-million-row DataFrame is:
1,000,000 rows: elapsed time 1.2 seconds
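The sketch below is one rough way to measure a timing like this on your own machine. The synthetic frame's cardinalities (10,000 ids, 10 groups, 50 terms) and the helper names make_frame and time_term_counts are hypothetical choices made for illustration. Results will vary with hardware, pandas version, and data shape, so this does not reproduce the 1.2-second figure above.

```python
import time

import numpy as np
import pandas as pd


def make_frame(n_rows: int, seed: int = 0) -> pd.DataFrame:
    """Build a synthetic id/group/term frame (hypothetical cardinalities)."""
    rng = np.random.default_rng(seed)
    terms = [f"term{i}" for i in range(50)]
    return pd.DataFrame({
        "id": rng.integers(0, 10_000, n_rows),
        "group": rng.integers(0, 10, n_rows),
        "term": rng.choice(terms, n_rows),
    })


def time_term_counts(df: pd.DataFrame) -> tuple[pd.DataFrame, float]:
    """Run the groupby/size/unstack count and return (result, seconds)."""
    start = time.perf_counter()
    counts = df.groupby(["id", "group", "term"]).size().unstack(fill_value=0)
    elapsed = time.perf_counter() - start
    return counts, elapsed


if __name__ == "__main__":
    df = make_frame(1_000_000)
    counts, elapsed = time_term_counts(df)
    print(f"{len(df):,} rows: elapsed time {elapsed:.2f} seconds")
```

Only the groupby expression sits inside the timed region, so the cost of generating the random data is excluded from the measurement.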