How to Explode a Pandas DataFrame Column into Multiple Rows?

Susan Sarandon
Release: 2024-12-25 09:46:16

How to Unnest (Explode) a Column in a Pandas DataFrame into Multiple Rows

In Pandas, exploding a column turns each element of a list-like cell into its own row, repeating the values of the other columns for every new row. This is useful when a column holds lists and you need one row per list element.

Consider a DataFrame with a column 'B' containing lists:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

Output:

   A       B
0  1  [1, 2]
1  2  [1, 2]

There are several ways to explode column 'B':

Method 0 [Pandas >= 0.25]
Starting with Pandas 0.25, the simplest option is the built-in DataFrame.explode method. In 0.25 it handles one column per call; since Pandas 1.3 it also accepts a list of columns.

df.explode('B')

Output:

   A  B
0  1  1
0  1  2
1  2  1
1  2  2
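
Note that explode keeps the original index labels, so each label repeats once per list element, and the exploded column comes back with object dtype. The following is a minimal sketch of one way to tidy both up; the names exploded and flat are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

exploded = df.explode('B')
# Each original index label appears once per list element.
print(exploded.index.tolist())

# reset_index(drop=True) replaces the repeated labels with a fresh 0..n-1 index.
flat = exploded.reset_index(drop=True)
# The exploded column has object dtype; cast it if a numeric dtype is needed.
flat['B'] = flat['B'].astype(int)
print(flat)
```

Newer Pandas releases also offer df.explode('B', ignore_index=True), which should have the same effect on the index; reset_index is used here because it works on every version that has explode.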

Method 1
Using apply(pd.Series) (easy to understand, but slow on large DataFrames):

df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'B'})

Method 2
Using repeat with DataFrame constructor:

pd.DataFrame({'A': df.A.repeat(df.B.str.len()), 'B': np.concatenate(df.B.values)})

Method 3
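Rebuild the rows with a list comprehension (this assumes the DataFrame has exactly two columns, with the list column second):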

pd.DataFrame([[x] + [z] for x, y in df.values for z in y], columns=df.columns)

Method 4
Using reindex or loc:

df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))
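
The heading mentions loc, but only the reindex form is shown. A sketch of what the loc variant might look like follows, assuming the DataFrame has a unique index; repeated and result_loc are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

# Repeat each index label once per element of the list in that row.
repeated = df.index.repeat(df['B'].str.len())

# Select the repeated labels with loc, then overwrite B with the flattened values.
result_loc = df.loc[repeated].assign(B=np.concatenate(df['B'].values))
print(result_loc)
```

Both forms duplicate whole rows first and then replace the list column, so every other column is carried along without being named explicitly.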

Method 5
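When every value is unique across all lists in the column (this is not true of the example df above, where both rows hold [1, 2], so duplicate values would be collapsed):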

from collections import ChainMap
d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))
pd.DataFrame(list(d.items()), columns=df.columns[::-1])
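
To see why uniqueness matters, here is a small sketch that runs the ChainMap approach on a hypothetical frame df_unique with no repeated values, and then on a frame df_dup shaped like the example data. Both frames and the names out, d, and d_dup are made up for illustration.

```python
from collections import ChainMap

import pandas as pd

# Hypothetical data: no value is repeated across the lists in 'B'.
df_unique = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]]})
d = dict(ChainMap(*map(dict.fromkeys, df_unique['B'], df_unique['A'])))
out = pd.DataFrame(list(d.items()), columns=df_unique.columns[::-1])
print(out)  # one row per list element; columns come out as B, A

# With repeated values the list elements become colliding dict keys.
df_dup = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})
d_dup = dict(ChainMap(*map(dict.fromkeys, df_dup['B'], df_dup['A'])))
print(len(d_dup))  # fewer entries than list elements
```

Because the list elements are used as dictionary keys, they must also be hashable, and the row order of the result follows the ChainMap iteration order rather than the original row order.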

Method 6
Using NumPy for high performance:

newvalues = np.dstack((np.repeat(df.A.values, list(map(len, df.B.values))), np.concatenate(df.B.values)))
pd.DataFrame(data=newvalues[0], columns=df.columns)

Method 7
Using itertools cycle and chain:

from itertools import cycle, chain
l = df.values.tolist()
l1 = [list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]
pd.DataFrame(list(chain.from_iterable(l1)), columns=df.columns)

Generalizing to Multiple Columns
To explode several columns at once, define a helper function. Within each row, the lists in the exploded columns must have the same length:

def unnesting(df, explode):
    idx = df.index.repeat(df[explode[0]].str.len())
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx

    return df1.join(df.drop(columns=explode), how='left')

# df must contain both 'B' and 'C' as list columns with matching lengths per row
unnesting(df, ['B', 'C'])
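
The example df defined earlier has no column 'C', so the call above needs a frame that does. Below is a self-contained sketch using a hypothetical df_multi; the helper is repeated here with the keyword form drop(columns=...), which current Pandas versions accept.

```python
import numpy as np
import pandas as pd


def unnesting(df, explode):
    # Repeat each index label once per element of the first exploded column.
    idx = df.index.repeat(df[explode[0]].str.len())
    # Flatten every exploded column and place the results side by side.
    df1 = pd.concat(
        [pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode],
        axis=1,
    )
    df1.index = idx
    # Bring back the untouched columns by joining on the repeated index.
    return df1.join(df.drop(columns=explode), how='left')


# Hypothetical data: 'B' and 'C' hold lists of equal length in every row.
df_multi = pd.DataFrame({
    'A': [1, 2],
    'B': [[1, 2], [3, 4]],
    'C': [['a', 'b'], ['c', 'd']],
})
result_multi = unnesting(df_multi, ['B', 'C'])
print(result_multi)
```

The exploded columns come first in the result, followed by the remaining ones. On Pandas 1.3 or later, df_multi.explode(['B', 'C']) is likely the simpler choice.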

Column-Wise Unnesting
To expand a list horizontally, use the pd.DataFrame constructor:

df.join(pd.DataFrame(df.B.tolist(), index=df.index).add_prefix('B_'))
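
As a rough illustration of the result, the sketch below runs the same expression on the example data and then shows what happens with lists of unequal length. The names wide and ragged are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

# One new column per list position, named B_0, B_1, ...
wide = df.join(pd.DataFrame(df['B'].tolist(), index=df.index).add_prefix('B_'))
print(wide)

# With lists of unequal length, the shorter rows are padded with NaN.
ragged = pd.DataFrame([[1, 2], [3]])
print(ragged)
```

The original list column 'B' is kept alongside the new columns; drop it afterwards if it is no longer needed.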

source:php.cn