
How to Split Comma-Separated Strings in a Pandas DataFrame into Separate Rows?

Linda Hamilton
Release: 2024-12-25 21:50:14

Splitting Comma-Separated String Entries in a Pandas DataFrame to Create Separate Rows

Problem:
We have a pandas DataFrame in which one column holds comma-separated strings. We want to split each string and create one row per value, copying the other columns onto every new row. For instance, a row whose entry is "a,b,c" should become three rows containing "a", "b", and "c".
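
The article never shows how df is built. The outputs further down are consistent with a two-row frame like the one below; the cell values are reconstructed from those outputs, so treat this as an assumption rather than the original data.

```python
import pandas as pd

# Sample frame reconstructed from the outputs shown in this article (an assumption).
df = pd.DataFrame({
    "var1": ["a,b,c", "d,e,f,x,y"],  # comma-separated strings to split
    "var2": [1, 2],
    "var3": ["XX", "ZZ"],
})

print(df)
```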

Solution:

Option 1: DataFrame.explode() (pandas 0.25.0+)

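The DataFrame.explode() method is designed for this purpose: it turns each element of a list-like column into its own row. It does not split strings itself, so the comma-separated strings must first be converted to lists with Series.str.split(). Because explode() repeats the original index label for every new row, the index is reset here to get consecutive row numbers.

In [1]: df.assign(var1=df['var1'].str.split(',')).explode('var1').reset_index(drop=True)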
Out[1]:
  var1  var2 var3
0    a     1   XX
1    b     1   XX
2    c     1   XX
3    d     2   ZZ
4    e     2   ZZ
5    f     2   ZZ
6    x     2   ZZ
7    y     2   ZZ
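
Real data often has stray spaces around the commas. The sketch below uses a made-up frame (the name raw and its values are hypothetical) and strips whitespace after exploding. It also passes ignore_index=True, which needs pandas 1.1.0 or later and does the same job as reset_index(drop=True).

```python
import pandas as pd

# Hypothetical data with irregular spacing around the commas.
raw = pd.DataFrame({
    "var1": ["a, b ,c", "d,e"],
    "var2": [1, 2],
})

tidy = (
    raw.assign(var1=raw["var1"].str.split(","))  # strings -> lists
       .explode("var1", ignore_index=True)       # one row per list element
)
tidy["var1"] = tidy["var1"].str.strip()          # drop stray whitespace

print(tidy)
```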

Option 2: Custom Vectorized Function

If DataFrame.explode() is not available or we need more customization, we can create our own vectorized function:

import numpy as np
import pandas as pd

def explode(df, lst_cols, fill_value='', preserve_index=False):
    # Convert `lst_cols` to a list if it is a single column name.
    if isinstance(lst_cols, str):
        lst_cols = [lst_cols]

    # Columns that are repeated rather than exploded.
    idx_cols = df.columns.difference(lst_cols)

    # Number of elements in each list of the first list column
    # (all columns in `lst_cols` must have matching list lengths per row).
    lens = df[lst_cols[0]].str.len()

    # Repeat each original index label once per list element.
    idx = np.repeat(df.index.values, lens)

    # Repeat the scalar columns and flatten the list columns.
    exp_df = pd.DataFrame({
        col: np.repeat(df[col].values, lens)
        for col in idx_cols
    }, index=idx).assign(**{
        col: np.concatenate(df.loc[lens > 0, col].values)
        for col in lst_cols
    })

    # Re-add rows whose lists are empty. DataFrame.append() was removed
    # in pandas 2.0, so pd.concat() is used instead.
    if (lens == 0).any():
        exp_df = pd.concat(
            [exp_df, df.loc[lens == 0, idx_cols]], sort=False
        ).fillna(fill_value)

    # Restore the original row order (a stable sort keeps list elements
    # in order) and the original column order.
    exp_df = exp_df.sort_index(kind='mergesort')[df.columns]

    # Reset the index unless the caller asked to keep it.
    if not preserve_index:
        exp_df = exp_df.reset_index(drop=True)

    return exp_df

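Example usage (the function expects list-like cells, so the strings are split into lists first):

In [2]: explode(df.assign(var1=df['var1'].str.split(',')), 'var1')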
Out[2]:
  var1  var2 var3
0    a     1   XX
1    b     1   XX
2    c     1   XX
3    d     2   ZZ
4    e     2   ZZ
5    f     2   ZZ
6    x     2   ZZ
7    y     2   ZZ
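
The custom function rests on two NumPy calls: np.repeat duplicates each scalar value once per list element, and np.concatenate flattens the lists into one array. A minimal sketch on plain arrays, with illustrative values that are not taken from the article's df:

```python
import numpy as np

lists = [["a", "b", "c"], ["d", "e"]]  # a list-valued column
scalars = np.array([1, 2])             # an ordinary column
lens = [len(x) for x in lists]         # number of elements per row

repeated = np.repeat(scalars, lens)    # each scalar once per list element
flat = np.concatenate(lists)           # all list elements, in row order

print(repeated.tolist())  # [1, 1, 1, 2, 2]
print(flat.tolist())      # ['a', 'b', 'c', 'd', 'e']
```

Because both arrays end up with the same length and ordering, they can be placed side by side as columns of the exploded frame.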
