In Pandas, you can perform multi-level grouping and aggregation to calculate complex statistics. One common task is to calculate the average of a column within groups defined by multiple other columns.
Consider the following DataFrame:
   cluster org  time
0        1   a     8
1        1   a     6
2        2   h    34
3        1   c    23
4        2   d    74
5        3   w     6
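To follow along, the table can be rebuilt in code. This is a minimal sketch: the variable name `df` matches the snippets in this article, and the integer dtypes for cluster and time are an assumption based on the values shown.

```python
import pandas as pd

# Rebuild the example table; one row per observation.
# Integer dtypes for "cluster" and "time" are assumed from the values shown.
df = pd.DataFrame(
    {
        "cluster": [1, 1, 2, 1, 2, 3],
        "org": ["a", "a", "h", "c", "d", "w"],
        "time": [8, 6, 34, 23, 74, 6],
    }
)

print(df)
```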
To calculate the average of time per org within each cluster, you can group the DataFrame by both cluster and org:
df.groupby(['cluster', 'org'], as_index=False).mean()
This will produce a DataFrame grouped by cluster and org, with the average of time calculated for each group:
   cluster org  time
0        1   a   7.0
1        1   c  23.0
2        2   d  74.0
3        2   h  34.0
4        3   w   6.0
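A quick way to sanity-check a grouped mean is to recompute one group by hand. The sketch below does that for the group with cluster 1 and org a, whose two rows have times 8 and 6. The names `per_org`, `manual` and `row` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "cluster": [1, 1, 2, 1, 2, 3],
        "org": ["a", "a", "h", "c", "d", "w"],
        "time": [8, 6, 34, 23, 74, 6],
    }
)

# Select "time" explicitly so only that column is aggregated.
per_org = df.groupby(["cluster", "org"], as_index=False)["time"].mean()

# Hand check for cluster 1, org "a": the two rows have time 8 and 6.
manual = (8 + 6) / 2
row = per_org[(per_org["cluster"] == 1) & (per_org["org"] == "a")]

print(per_org)
print(row["time"].iloc[0] == manual)
```

By default groupby sorts the group keys, so within cluster 2 the row for org d comes before the row for org h.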
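If you only want the mean of time within each cluster, group by cluster alone and select the time column, so that the non-numeric org column is not passed to mean() (recent pandas versions raise a TypeError when asked to average a string column):
df.groupby('cluster', as_index=False)['time'].mean()
This will produce a DataFrame with the average of time calculated for each cluster:
   cluster       time
0        1  12.333333
1        2  54.000000
2        3   6.000000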
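Note that averaging every row in a cluster is not the same as averaging the per-org averages within that cluster. The sketch below computes both, under the assumption that the second reading may be what is wanted. The names `row_mean`, `org_mean` and `mean_of_means` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "cluster": [1, 1, 2, 1, 2, 3],
        "org": ["a", "a", "h", "c", "d", "w"],
        "time": [8, 6, 34, 23, 74, 6],
    }
)

# Mean over every row in a cluster: cluster 1 is (8 + 6 + 23) / 3.
row_mean = df.groupby("cluster")["time"].mean()

# Mean of the per-org means: aggregate once per (cluster, org),
# then average those results per cluster: cluster 1 is (7 + 23) / 2.
org_mean = df.groupby(["cluster", "org"])["time"].mean()
mean_of_means = org_mean.groupby(level="cluster").mean()

print(row_mean)
print(mean_of_means)
```

The two results differ only where the orgs in a cluster have different numbers of rows. Here that is cluster 1, which gives about 12.33 in the first case and 15.0 in the second.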
Alternatively, you can group by ['cluster', 'org'] without as_index=False and select the time column before aggregating:
df.groupby(['cluster', 'org'])['time'].mean()
This will produce a Series, indexed by the (cluster, org) pairs, with the average of time calculated for each combination of cluster and org.
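The Series form can be queried by its group keys or turned back into ordinary columns. The following is a minimal sketch; the names `result` and `flat` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "cluster": [1, 1, 2, 1, 2, 3],
        "org": ["a", "a", "h", "c", "d", "w"],
        "time": [8, 6, 34, 23, 74, 6],
    }
)

# Grouping without as_index=False moves the group keys into the index.
result = df.groupby(["cluster", "org"])["time"].mean()

# Look up a single group by its (cluster, org) pair.
print(result.loc[(1, "a")])

# Convert the index levels back into regular columns.
flat = result.reset_index()
print(flat)
```

Calling reset_index() on the Series gives the same layout as passing as_index=False to groupby, so which form to use is mostly a matter of what the next step needs.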