How to Split Comma-Separated Strings in a Pandas DataFrame into Separate Rows?

Linda Hamilton

Dec 25, 2024 09:50 PM

Splitting Comma-Separated String Entries in a Pandas DataFrame into Separate Rows

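Problem:
We have a Pandas DataFrame in which one column holds strings of comma-separated values. We want to split each of these entries and create a new row for every value it contains, copying the other columns into each new row. For example, "a,b,c" should become three rows containing "a", "b", and "c".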
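
The article never shows the input DataFrame itself. As a minimal sketch, the frame below is an assumption reconstructed from the outputs shown later (column names var1, var2, var3 and the two source rows are inferred, not given by the author):

```python
import pandas as pd

# Hypothetical sample data, inferred from the outputs shown in this article.
df = pd.DataFrame({
    'var1': ['a,b,c', 'd,e,f,x,y'],  # comma-separated strings to split
    'var2': [1, 2],
    'var3': ['XX', 'ZZ'],
})
print(df)
```

Each var1 entry is a single string at this point, so it has to be turned into a list before it can be expanded into rows.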

Solution:

Option 1: DataFrame.explode() (Pandas 0.25.0+)

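The DataFrame.explode() method was designed for this purpose. It expands a list-like column into separate rows, so the comma-separated strings must first be converted to lists with str.split(','). explode() repeats the original index labels for the new rows, so we reset the index to get consecutive row numbers.

In [1]: df.assign(var1=df['var1'].str.split(',')).explode('var1').reset_index(drop=True)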
Out[1]:
  var1  var2 var3
0    a     1   XX
1    b     1   XX
2    c     1   XX
3    d     2   ZZ
4    e     2   ZZ
5    f     2   ZZ
6    x     2   ZZ
7    y     2   ZZ
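
Real data often has stray spaces after the commas. The following is a rough sketch of Option 1 for that case, not the author's code: the sample frame is hypothetical, and ignore_index=True assumes pandas 1.1 or later (on older versions, reset_index(drop=True) does the same job).

```python
import pandas as pd

# Hypothetical sample frame; note the stray space after one comma.
df = pd.DataFrame({
    'var1': ['a, b,c', 'd,e'],
    'var2': [1, 2],
})

result = (
    df.assign(var1=df['var1'].str.split(','))  # strings -> lists
      .explode('var1', ignore_index=True)      # one row per list element
)

# Trim whitespace left over from the split.
result['var1'] = result['var1'].str.strip()
print(result)
```

Stripping after the explode keeps the split itself a plain, non-regex operation.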

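Option 2: Custom vectorized function

If DataFrame.explode() is not available, or we need more customization, we can write our own vectorized function. Like explode(), it expects the target column to contain lists rather than raw strings: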

import numpy as np
import pandas as pd

def explode(df, lst_cols, fill_value='', preserve_index=False):
    # Convert `lst_cols` to a list if it is a single column name.
    if isinstance(lst_cols, str):
        lst_cols = [lst_cols]

    # All columns other than the list columns.
    idx_cols = df.columns.difference(lst_cols)

    # Length of each list in the first list column (the lists in every
    # column of `lst_cols` are assumed to have matching lengths).
    lens = df[lst_cols[0]].str.len()

    # Repeat each index label once per list element.
    idx = np.repeat(df.index.values, lens)

    # Repeat the scalar columns and flatten the list columns.
    exp_df = pd.DataFrame({
        col: np.repeat(df[col].values, lens)
        for col in idx_cols
    }, index=idx).assign(**{
        col: np.concatenate(df.loc[lens > 0, col].values)
        for col in lst_cols
    })

    # Re-add rows whose lists are empty and fill the gaps with `fill_value`.
    # DataFrame.append() was removed in pandas 2.0, so use pd.concat().
    if (lens == 0).any():
        exp_df = pd.concat([exp_df, df.loc[lens == 0, idx_cols]], sort=False).fillna(fill_value)

    # Restore the original row order (stable sort) and column order.
    exp_df = exp_df.sort_index(kind='mergesort')[df.columns]
    if not preserve_index:
        exp_df = exp_df.reset_index(drop=True)

    return exp_df

Usage example (the strings are split into lists before the call):

In [2]: explode(df.assign(var1=df['var1'].str.split(',')), 'var1')
Out[2]:
  var1  var2 var3
0    a     1   XX
1    b     1   XX
2    c     1   XX
3    d     2   ZZ
4    e     2   ZZ
5    f     2   ZZ
6    x     2   ZZ
7    y     2   ZZ
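
The heart of the custom function is a repeat-and-concatenate step. A stripped-down sketch of that step alone, on a hypothetical two-column frame whose var1 column is already split into lists, might look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; var1 already holds lists rather than strings.
df = pd.DataFrame({
    'var1': [['a', 'b', 'c'], ['d', 'e']],
    'var2': [1, 2],
})

# Number of elements in each list.
lens = df['var1'].str.len().to_numpy()

# Repeat every scalar value once per list element.
repeated = np.repeat(df['var2'].to_numpy(), lens)

# Flatten all the lists into one long array.
flattened = np.concatenate(df['var1'].to_numpy())

out = pd.DataFrame({'var1': flattened, 'var2': repeated})
print(out)
```

Both NumPy calls work on whole arrays at once, which is why the function is described as vectorized. The full function adds handling for empty lists and index preservation on top of this.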
