Sorting Within Groups in pandas
When working with pandas dataframes, it is often necessary to group data by specific columns and then perform additional operations within those groups. A common requirement is to sort the rows within each group by some column, for example to list each group's largest values first or to keep only the top few rows per group.
To achieve this, sort_values can be combined with groupby. As an example, consider a dataframe df with the columns count, job, and source:
In [167]: df
Out[167]:
   count     job source
0      2   sales      A
1      4   sales      B
2      6   sales      C
3      3   sales      D
4      7   sales      E
5      5  market      A
6      3  market      B
7      2  market      C
8      4  market      D
9      1  market      E
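To follow along, the sample dataframe can be rebuilt from the values printed above. The dictionary literal below is just one convenient way to construct it; any construction that yields the same rows works equally well.

```python
import pandas as pd

# Rebuild the sample frame from the printed output: sources A-E for each job
df = pd.DataFrame({
    "count": [2, 4, 6, 3, 7, 5, 3, 2, 4, 1],
    "job": ["sales"] * 5 + ["market"] * 5,
    "source": list("ABCDE") * 2,
})
print(df)
```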
If you want to group the data by job and source and then sort the aggregated results by count in descending order, you can do the following:
In [168]: df.groupby(['job','source']).agg({'count':'sum'})
This creates a new dataframe, indexed by job and source, that holds the summed count for each group. That result is ordered by the group keys, not by count. In this sample every job/source pair occurs only once, so the sums equal the original counts, and the original dataframe can be sorted directly with sort_values:
In [34]: df.sort_values(['job','count'],ascending=False)
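This sorts by job and then by count within each job. Because ascending=False applies to both keys, job is also sorted in descending order, so sales comes before market. The result looks like this:
Out[35]:
   count     job source
4      7   sales      E
2      6   sales      C
1      4   sales      B
3      3   sales      D
0      2   sales      A
5      5  market      A
8      4  market      D
6      3  market      B
7      2  market      C
9      1  market      E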
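The sort above works on the original rows. If you would rather sort the aggregated frame itself, a minimal sketch is to call sort_values on the output of agg. Two choices here go beyond the original example: reset_index is used to turn the job/source group keys back into ordinary columns, and ascending is given as a list (one flag per key) so that job is ordered alphabetically while count stays descending within each job.

```python
import pandas as pd

# Sample data matching the df shown earlier
df = pd.DataFrame({
    "count": [2, 4, 6, 3, 7, 5, 3, 2, 4, 1],
    "job": ["sales"] * 5 + ["market"] * 5,
    "source": list("ABCDE") * 2,
})

# Sum count per (job, source); the string "sum" selects pandas' own group sum
agg = df.groupby(["job", "source"]).agg({"count": "sum"})

# reset_index moves the group keys back into columns;
# ascending=[True, False] sorts job A-Z and count high-to-low within each job
result = agg.reset_index().sort_values(["job", "count"], ascending=[True, False])
print(result)
```

With this data, the market rows come first (A, D, B, C, E), followed by the sales rows (E, C, B, D, A).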
To take the top three rows of each group, you can use the head function:
In [34]: df.sort_values(['job','count'],ascending=False).groupby('job').head(3)
Because groupby().head() keeps rows in their existing order, the result contains the three highest counts for each job, still sorted by count in descending order:
Out[35]:
   count     job source
4      7   sales      E
2      6   sales      C
1      4   sales      B
5      5  market      A
8      4  market      D
6      3  market      B
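An alternative that does not rely on the rows already being sorted is to rank each row's count within its job and keep the rows ranked 3 or better. This is a sketch of a different technique from the head-based approach above; method="first" is an assumption about how ties should be broken (by order of appearance), and the final sort_values call is only there to order the output for display.

```python
import pandas as pd

# Sample data matching the df shown earlier
df = pd.DataFrame({
    "count": [2, 4, 6, 3, 7, 5, 3, 2, 4, 1],
    "job": ["sales"] * 5 + ["market"] * 5,
    "source": list("ABCDE") * 2,
})

# Rank counts within each job, largest = 1; "first" breaks ties by position
rank = df.groupby("job")["count"].rank(method="first", ascending=False)

# Keep the top three rows per job, then order them for display
top3 = df[rank <= 3].sort_values(["job", "count"], ascending=False)
print(top3)
```

For this sample the selected rows are the same ones that head(3) returns above.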