pandas method to get the row with the maximum value in the groupby group

不言
Release: 2023-03-24 14:34:02

This article shows several ways in pandas to get the row with the maximum value within each groupby group. Hopefully it serves as a useful reference.


For example, given the following DataFrame, group by Mt and extract the row with the largest Count in each group:

import pandas as pd
df = pd.DataFrame({'Sp':['a','b','c','d','e','f'], 'Mt':['s1', 's1', 's2','s2','s2','s3'], 'Value':[1,2,3,4,5,6], 'Count':[3,2,5,10,10,6]})

df

   Count  Mt Sp  Value
0      3  s1  a      1
1      2  s1  b      2
2      5  s2  c      3
3     10  s2  d      4
4     10  s2  e      5
5      6  s3  f      6

Method 1: Filter out the rows with the largest Count in each group

df.groupby('Mt').apply(lambda t: t[t.Count==t.Count.max()])

      Count  Mt Sp  Value
Mt
s1 0      3  s1  a      1
s2 3     10  s2  d      4
   4     10  s2  e      5
s3 5      6  s3  f      6

Method 2: Use transform to get a result aligned with the index of the original DataFrame, then filter out the required rows

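print(df.groupby(['Mt'])['Count'].agg('max'))

# Broadcast each group's maximum Count back onto the original rows
idx = df.groupby(['Mt'])['Count'].transform('max')
print(idx)
# True where a row's Count equals the maximum of its group
idx1 = idx == df['Count']
print(idx1)

df[idx1]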

Mt
s1 3
s2 10
s3 6
Name: Count, dtype: int64
0 3
1 3
2 10
3 10
4 10
5 6
dtype: int64
0 True
1 False
2 False
3 True
4 True
5 True
dtype: bool

   Count  Mt Sp  Value
0      3  s1  a      1
3     10  s2  d      4
4     10  s2  e      5
5      6  s3  f      6

The method above has a problem: rows 3 and 4 both hold the maximum value of group s2, so multiple rows are returned. What if only one row per group should be returned?

Method 3: idxmax (called argmax in old versions of pandas)

idx = df.groupby('Mt')['Count'].idxmax()
print(idx)

Mt
s1 0
s2 3
s3 5
Name: Count, dtype: int64

# idxmax returns index labels, so select with loc rather than iloc
df.loc[idx]

   Count  Mt Sp  Value
0      3  s1  a      1
3     10  s2  d      4
5      6  s3  f      6

df.loc[df.groupby(['Mt']).apply(lambda x: x['Count'].idxmax())]

   Count  Mt Sp  Value
0      3  s1  a      1
3     10  s2  d      4
5      6  s3  f      6

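
The point that idxmax returns index labels rather than integer positions can be illustrated with a small sketch. It assumes a hypothetical variant of the example frame, here called labelled, that uses the Sp column as its index; that variant is not part of the original article. With string labels, only label-based selection with loc applies.

```python
import pandas as pd

df = pd.DataFrame({
    'Sp': ['a', 'b', 'c', 'd', 'e', 'f'],
    'Mt': ['s1', 's1', 's2', 's2', 's2', 's3'],
    'Value': [1, 2, 3, 4, 5, 6],
    'Count': [3, 2, 5, 10, 10, 6],
})

# Hypothetical variant: index the frame by Sp so labels are strings, not 0..5.
labelled = df.set_index('Sp')

# idxmax gives the label of the first row holding each group's maximum Count.
idx = labelled.groupby('Mt')['Count'].idxmax()

# Labels must be looked up with loc; iloc expects integer positions.
result = labelled.loc[idx]
print(result)
```

On the original frame, iloc happens to work only because the default index 0..5 makes labels and positions coincide.
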
def using_apply(df):
    # For each group, return the Value from the row with the largest Count
    return df.groupby('Mt').apply(lambda subf: subf['Value'][subf['Count'].idxmax()])

def using_idxmax_loc(df):
    # Look up the rows labelled by idxmax, keeping only Mt and Value
    idx = df.groupby('Mt')['Count'].idxmax()
    return df.loc[idx, ['Mt', 'Value']]

print(using_apply(df))

using_idxmax_loc(df)

Mt
s1 1
s2 4
s3 6
dtype: int64

   Mt  Value
0  s1      1
3  s2      4
5  s3      6

Method 4: Sort first, then take the first row from each group

df.sort_values('Count', ascending=False).groupby('Mt', as_index=False).first()

   Mt  Count Sp  Value
0  s1      3  a      1
1  s2     10  d      4
2  s3      6  f      6
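
A possible variant of Method 4, offered as a sketch rather than the article's own code: groupby(...).first() takes the first non-null entry of each column, so when data is missing the result may not be a single original row. Using head(1) keeps whole rows and their original index labels. The choice of kind='mergesort' is an assumption made here so that tied rows keep their original relative order.

```python
import pandas as pd

df = pd.DataFrame({
    'Sp': ['a', 'b', 'c', 'd', 'e', 'f'],
    'Mt': ['s1', 's1', 's2', 's2', 's2', 's3'],
    'Value': [1, 2, 3, 4, 5, 6],
    'Count': [3, 2, 5, 10, 10, 6],
})

# Sort by Count, largest first; mergesort is a stable sort.
ordered = df.sort_values('Count', ascending=False, kind='mergesort')

# head(1) keeps the first whole row of each group, with its original label.
top = ordered.groupby('Mt').head(1).sort_index()
print(top)
```

Unlike first(), this returns exactly one existing row per group, which makes it easy to trace back to the source data.
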

That raises another question: what if you want a row other than the maximum, for example the row with the middle value? The idea is similar, but the details change: Methods 1 and 2 need the max computation replaced, and Method 3 needs a function that returns an index label. In any case, after groupby each group is itself a DataFrame.
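
One way the middle-value case might be handled is sketched below. It is an illustration under stated assumptions, not the author's method: "middle" is taken to mean the row at the lower middle position once each group is ordered by Count, and the names ordered, pos, size and middle are introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Sp': ['a', 'b', 'c', 'd', 'e', 'f'],
    'Mt': ['s1', 's1', 's2', 's2', 's2', 's3'],
    'Value': [1, 2, 3, 4, 5, 6],
    'Count': [3, 2, 5, 10, 10, 6],
})

# Order rows by Count inside each Mt group.
ordered = df.sort_values(['Mt', 'Count'])

# 0-based position of each row within its group, and the size of that group.
pos = ordered.groupby('Mt').cumcount()
size = ordered.groupby('Mt')['Count'].transform('size')

# Keep the row at the lower middle position of each group.
middle = ordered[pos == (size - 1) // 2]
print(middle)
```

For groups with an even number of rows this picks the lower of the two central rows; a different definition of "middle" would need a different position formula.
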


source:php.cn