Group-by Aggregation with Multiple Groupings and Average
In pandas, aggregating data grouped by more than one key is a common operation. Consider the following DataFrame:
cluster  org  time
1        a       8
1        a       6
2        h      34
1        c      23
2        d      74
3        w       6
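To follow along, the sample data can be rebuilt as below. This is a minimal sketch: the variable name df matches the snippets in this article, and the column dtypes (integers for "cluster" and "time", strings for "org") are an assumption read off the table.

```python
import pandas as pd

# Sample data copied from the table above; dtypes are assumed
# (int for "cluster" and "time", str for "org").
df = pd.DataFrame({
    "cluster": [1, 1, 2, 1, 2, 3],
    "org": ["a", "a", "h", "c", "d", "w"],
    "time": [8, 6, 34, 23, 74, 6],
})

print(df)
```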
A common task is to calculate the average of a given column, such as "time," per group defined by multiple variables, such as "cluster" and "org."
Solution 1: Mean on Cluster Groupings Only
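To compute the mean of "time" grouped by "cluster" only, select the "time" column before aggregating. In pandas 2.0 and later, calling mean() on the whole grouped frame raises a TypeError because "org" holds strings:
df.groupby(['cluster'])[['time']].mean()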
Result:
              time
cluster
1        12.333333
2        54.000000
3         6.000000
Solution 2: Mean on a Combination of Groupings
If you want to calculate the mean of "time" for each combination of "cluster" and "org," you can use:
df.groupby(['cluster', 'org']).mean()
Result:
             time
cluster org
1       a     7.0
        c    23.0
2       d    74.0
        h    34.0
3       w     6.0
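The combined grouping can be run end to end as in the sketch below. It rebuilds the sample data (dtypes assumed as before) and also shows reset_index(), which is one way to turn the resulting MultiIndex back into ordinary columns; the names per_pair and flat are illustrative only.

```python
import pandas as pd

# Sample data from the article; dtypes are assumed.
df = pd.DataFrame({
    "cluster": [1, 1, 2, 1, 2, 3],
    "org": ["a", "a", "h", "c", "d", "w"],
    "time": [8, 6, 34, 23, 74, 6],
})

# Mean of "time" for every (cluster, org) pair.
# The result is a Series indexed by a two-level MultiIndex.
per_pair = df.groupby(["cluster", "org"])["time"].mean()
print(per_pair)

# reset_index() moves the index levels back into regular columns,
# which is often easier to merge or export.
flat = per_pair.reset_index()
print(flat)
```

Passing as_index=False to groupby is an alternative that yields the flat layout directly.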
Solution 3: Nested Mean on Groupings
To perform a nested mean, first averaging on the combination of "cluster" and "org" and then averaging on "cluster" groups, use:
(df.groupby(['cluster', 'org'], as_index=False).mean()
   .groupby('cluster')['time'].mean())
Result:
cluster  mean(time)
1        15          # = ((8 + 6) / 2 + 23) / 2
2        54          # = (74 + 34) / 2
3         6
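The nested mean differs from the plain per-cluster mean whenever the "org" groups inside a cluster have different sizes, because each "org" is weighted equally rather than each row. The sketch below compares the two on the sample data; the names flat_mean and nested_mean are illustrative, and the dtypes are assumed as before.

```python
import pandas as pd

# Sample data from the article; dtypes are assumed.
df = pd.DataFrame({
    "cluster": [1, 1, 2, 1, 2, 3],
    "org": ["a", "a", "h", "c", "d", "w"],
    "time": [8, 6, 34, 23, 74, 6],
})

# Plain mean: every row in a cluster counts once.
flat_mean = df.groupby("cluster")["time"].mean()

# Nested mean: average within each (cluster, org) pair first,
# then average those per-org means within each cluster.
nested_mean = (
    df.groupby(["cluster", "org"])["time"].mean()
      .groupby(level="cluster").mean()
)

comparison = pd.DataFrame({"flat": flat_mean, "nested": nested_mean})
print(comparison)
```

For cluster 1 the plain mean is (8 + 6 + 23) / 3, while the nested mean is ((8 + 6) / 2 + 23) / 2 = 15. Clusters 2 and 3 give the same value either way because each "org" there has a single row.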