How to Efficiently Join Strings Within Pandas Groupby Results?-Python Tutorial-php.cn

How to Efficiently Join Strings Within Pandas Groupby Results?

Patricia Arquette

Release: 2024-12-16 15:22:11



Pandas groupby with Delimiter Join

In pandas, groupby collapses rows that share a key into one result per group. Numeric columns are usually summarized with functions such as sum or mean, but for a string column the goal is often to keep every value in the group and combine them into a single delimited string. That requires a join-based aggregation rather than a plain sum.

Consider the following example:

col  val
A    Cat
A    Tiger
B    Ball
B    Bat


Grouping by 'col' and calling sum on the 'val' column concatenates the strings in each group with no separator, because adding strings concatenates them:

A CatTiger
B BallBat
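For readers who want to reproduce this, the following is a minimal sketch that builds the sample data and runs the sum. The variable names df and summed are chosen here for illustration, and the data is assumed to be stored as ordinary Python strings.

```python
import pandas as pd

# Sample data from the table above.
df = pd.DataFrame({
    "col": ["A", "A", "B", "B"],
    "val": ["Cat", "Tiger", "Ball", "Bat"],
})

# Summing strings concatenates them, so each group collapses
# into one string with no separator between the values.
summed = df.groupby("col")["val"].sum()
print(summed)  # A -> 'CatTiger', B -> 'BallBat'
```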


A natural first attempt at inserting a delimiter (e.g., '-') between the values is to apply a join to the summed result:

df.groupby(['col'])['val'].sum().apply(lambda x: '-'.join(x))


However, this approach leads to an unexpected result:

A C-a-t-T-i-g-e-r
B B-a-l-l-B-a-t


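The issue arises because sum has already concatenated each group into a single string by the time apply runs. The lambda function therefore receives a string such as 'CatTiger', not the individual values from the 'val' column, and '-'.join iterates over that string one character at a time.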
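The behaviour can be checked in plain Python, without pandas. This short sketch compares str.join on a single string with str.join on a list of strings; the variable names are illustrative only.

```python
# A single string is an iterable of characters, so join
# places the delimiter between every character.
concatenated = "CatTiger"
by_character = "-".join(concatenated)
print(by_character)  # C-a-t-T-i-g-e-r

# A list of strings is an iterable of whole values, so join
# places the delimiter between the values.
by_item = "-".join(["Cat", "Tiger"])
print(by_item)  # Cat-Tiger
```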

The following alternative approach can be used to achieve the desired delimiter-joined output:

df.groupby('col')['val'].agg('-'.join)


This provides the output:

col
A    Cat-Tiger
B     Ball-Bat
Name: val, dtype: object
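Passing '-'.join directly works when every value in the group is a string. If the column might contain missing values or non-string entries, which is an assumption beyond the example above, str.join raises a TypeError. One possible way to handle that is a small helper, sketched here under the hypothetical name join_values, that drops missing values and converts the rest to strings before joining.

```python
import numpy as np
import pandas as pd

# Variation of the sample data with a missing value and a number
# (both added here purely for illustration).
df = pd.DataFrame({
    "col": ["A", "A", "B", "B", "B"],
    "val": ["Cat", "Tiger", "Ball", np.nan, 7],
})

def join_values(values, sep="-"):
    """Join one group's values, skipping NaN and casting to str."""
    return sep.join(values.dropna().astype(str))

joined = df.groupby("col")["val"].agg(join_values)
print(joined)  # A -> 'Cat-Tiger', B -> 'Ball-7'
```

Whether missing values should be dropped or shown with a placeholder depends on the data, so treat this as one option and not a required step.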


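The result is a Series indexed by the group key. To turn the index (or the MultiIndex, when grouping by several columns) back into ordinary columns, use reset_index. Its name argument sets the name of the column that holds the joined values: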

df1 = df.groupby('col')['val'].agg('-'.join).reset_index(name='new')
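The MultiIndex case can be sketched by grouping on two keys. The 'kind' column below is hypothetical and is added only to create a second grouping key; it is not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    "col": ["A", "A", "B", "B"],
    "kind": ["animal", "animal", "object", "animal"],  # hypothetical second key
    "val": ["Cat", "Tiger", "Ball", "Bat"],
})

# Grouping by two columns yields a Series with a two-level MultiIndex.
grouped = df.groupby(["col", "kind"])["val"].agg("-".join)

# reset_index moves both index levels into columns and names
# the joined values 'new'.
flat = grouped.reset_index(name="new")
print(flat)
```

The flattened frame has the columns col, kind, and new, with one row per (col, kind) combination.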

