Grouped Differences in Pandas with Multiple Fields
Suppose a dataframe records a score for each site and country on each date, and we want the change in score from one date to the next within every site and country combination.
To achieve this, we begin by sorting the dataframe by site, country, and date, so that the rows of each group appear in chronological order:
<code class="python">df = df.sort_values(by=['site', 'country', 'date'])</code>
Next, we use groupby() with diff() to calculate the differences within each site and country group:
<code class="python">df['diff'] = df.groupby(['site', 'country'])['score'].diff().fillna(0)</code>
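Within each site and country group, diff() subtracts the previous row's score from the current row's score. The first row of each group has no previous row, so its difference is NaN; fillna(0) replaces those values with 0.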
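Sorting first matters because diff() works on row order, not on dates: it subtracts whatever row precedes the current one in its group. A minimal sketch with a few hypothetical rows (not taken from the article) that are deliberately out of date order shows the difference:

```python
import pandas as pd

# Hypothetical rows for a single site/country group, deliberately out of date order.
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-03", "2018-01-01", "2018-01-02"]),
    "site": ["fb", "fb", "fb"],
    "country": ["us", "us", "us"],
    "score": [100, 50, 55],
})

# Without sorting, diff() subtracts whichever row happens to come before.
unsorted_diff = df.groupby(["site", "country"])["score"].diff()
# unsorted_diff: NaN, -50.0 (50 - 100), 5.0 (55 - 50)

# After sorting, the previous row in each group is also the previous date.
sorted_df = df.sort_values(by=["site", "country", "date"])
sorted_diff = sorted_df.groupby(["site", "country"])["score"].diff()
# sorted_diff (index 1, 2, 0): NaN, 5.0, 45.0
```

Only the sorted version yields day-over-day changes; the unsorted one mixes dates arbitrarily.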
Finally, we display the results:
<code class="python">print(df)
# Output:
         date    site  country  score   diff
8  2018-01-01      fb       es    100    0.0
9  2018-01-02      fb       gb    100    0.0
5  2018-01-01      fb       us     50    0.0
6  2018-01-02      fb       us     55    5.0
7  2018-01-03      fb       us    100   45.0
1  2018-01-01  google       ch     50    0.0
4  2018-01-02  google       ch     10  -40.0
0  2018-01-01  google       us    100    0.0
2  2018-01-02  google       us     70  -30.0
3  2018-01-03  google       us     60  -10.0</code>
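The steps above can be assembled into one self-contained script. The article does not show its input dataframe, so the rows below are an assumption, reconstructed from the printed output (same index labels, dates, sites, countries, and scores):

```python
import pandas as pd

# Input rows reconstructed from the printed output above (assumed, not the original source).
# Dates are parsed as datetimes so they sort chronologically.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2018-01-01", "2018-01-01", "2018-01-02", "2018-01-03", "2018-01-02",
        "2018-01-01", "2018-01-02", "2018-01-03", "2018-01-01", "2018-01-02",
    ]),
    "site": ["google", "google", "google", "google", "google",
             "fb", "fb", "fb", "fb", "fb"],
    "country": ["us", "ch", "us", "us", "ch",
                "us", "us", "us", "es", "gb"],
    "score": [100, 50, 70, 60, 10, 50, 55, 100, 100, 100],
})

# Order rows so each site/country group is chronological.
df = df.sort_values(by=["site", "country", "date"])

# Per-group difference from the previous date; the first row of each group becomes 0.
df["diff"] = df.groupby(["site", "country"])["score"].diff().fillna(0)

print(df)
```

The resulting row order and diff column agree with the output shown above.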
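Note that sort_values orders each column by its natural order (alphabetical for strings), so it does not directly accept an arbitrary order such as placing "google" before "fb". For such scenarios, store the desired order in a list and convert the column to an ordered categorical with that list as its categories. sort_values then follows the category order instead of the alphabetical one.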
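A minimal sketch of the categorical approach, assuming a hypothetical order that puts "google" before "fb" and a few hypothetical rows. Passing observed=True to groupby keeps only the site/country combinations that actually occur, which is what the per-group diff needs anyway:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-02", "2018-01-02"]),
    "site": ["fb", "google", "fb", "google"],
    "country": ["us", "us", "us", "us"],
    "score": [50, 100, 55, 70],
})

# Hypothetical custom order: "google" first, then "fb" (not alphabetical).
site_order = ["google", "fb"]
df["site"] = pd.Categorical(df["site"], categories=site_order, ordered=True)

# sort_values now sorts "site" by category order instead of alphabetically.
df = df.sort_values(by=["site", "country", "date"])

# The grouped difference step is unchanged.
df["diff"] = df.groupby(["site", "country"], observed=True)["score"].diff().fillna(0)
print(df)
```

Here the two google rows come before the two fb rows, and each group's first row gets a difference of 0.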