How to Split Comma-Separated Strings in a Pandas DataFrame into Separate Rows?

Linda Hamilton

Dec 25, 2024 pm 09:50 PM

Splitting Comma-Separated String Entries in a Pandas DataFrame into Separate Rows

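Problem:
We have a Pandas DataFrame in which one column holds strings of comma-separated values. We want to split each comma-separated entry and create a new row for each value it contains, repeating the values of the other columns. For example, "a,b,c" should become three rows holding "a", "b", and "c".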
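The article does not show how its DataFrame is built, so the following is only a sketch of input data consistent with the outputs shown later. The column names `var1`, `var2`, `var3` come from those outputs; the exact construction is an assumption.

```python
import pandas as pd

# Hypothetical sample data, inferred from the outputs shown in this article.
df = pd.DataFrame({
    "var1": ["a,b,c", "d,e,f,x,y"],  # comma-separated strings to split
    "var2": [1, 2],
    "var3": ["XX", "ZZ"],
})

# str.split(",") turns each string into a Python list of its parts.
as_lists = df["var1"].str.split(",")
print(as_lists.tolist())
# [['a', 'b', 'c'], ['d', 'e', 'f', 'x', 'y']]
```

Each row still holds one list at this point; the options below turn every list element into its own row.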

Solution:

Option 1: DataFrame.explode() (Pandas 0.25.0+)

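The DataFrame.explode() method is designed for exactly this purpose: it expands a list-like column into one row per element. It operates on list-like values rather than raw strings, so the comma-separated strings must first be converted to lists with str.split(','). explode() repeats the original index labels, so the index is reset here to get consecutive row numbers.

In [1]: df.assign(var1=df['var1'].str.split(',')).explode('var1').reset_index(drop=True)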
Out[1]:
  var1  var2 var3
0    a     1   XX
1    b     1   XX
2    c     1   XX
3    d     2   ZZ
4    e     2   ZZ
5    f     2   ZZ
6    x     2   ZZ
7    y     2   ZZ
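As a self-contained sketch of this option, assuming sample data inferred from the output above, the snippet below splits the strings, explodes them, and shows what happens to the index. The variable names `exploded` and `renumbered` are illustrative only.

```python
import pandas as pd

# Hypothetical sample data, inferred from the output shown above.
df = pd.DataFrame({
    "var1": ["a,b,c", "d,e,f,x,y"],
    "var2": [1, 2],
    "var3": ["XX", "ZZ"],
})

# explode() needs list-like cells, so split the strings first.
exploded = df.assign(var1=df["var1"].str.split(",")).explode("var1")

# Each new row keeps the index label of the row it came from.
print(exploded.index.tolist())
# [0, 0, 0, 1, 1, 1, 1, 1]

# Renumber the rows 0..7 when the original labels are not needed.
renumbered = exploded.reset_index(drop=True)
print(renumbered)
```

Keeping the repeated labels can be useful for joining the result back to the source rows. On pandas 1.1 or later, passing ignore_index=True to explode() should give the same renumbering in one step.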

Option 2: A custom vectorized function

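If DataFrame.explode() is unavailable, or if you need further customization, you can write your own vectorized function. Like explode(), it expects the target columns to already contain lists: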

import numpy as np
import pandas as pd

def explode(df, lst_cols, fill_value='', preserve_index=False):
    # Convert `lst_cols` to a list if it is a string.
    if isinstance(lst_cols, str):
        lst_cols = [lst_cols]

    # Calculate the lengths of each list in `lst_cols`.
    lens = df[lst_cols[0]].str.len()

    # Create a new index based on the lengths of the lists.
    idx = np.repeat(df.index.values, lens)

    # Create a new DataFrame with the exploded columns.
    exp_df = pd.DataFrame({
        col: np.repeat(df[col].values, lens)
        for col in df.columns.difference(lst_cols)
    }, index=idx).assign(**{
        col: np.concatenate(df.loc[lens > 0, col].values)
        for col in lst_cols
    })

    # Append rows with empty lists if necessary. DataFrame.append() was
    # removed in pandas 2.0, so pd.concat() is used instead.
    if (lens == 0).any():
        exp_df = pd.concat(
            [exp_df, df.loc[lens == 0, df.columns.difference(lst_cols)]],
            sort=False,
        ).fillna(fill_value)

    # Revert the original index order and reset the index if requested.
    exp_df = exp_df.sort_index()
    if not preserve_index:
        exp_df = exp_df.reset_index(drop=True)

    # Restore the original column order (columns.difference() sorts names).
    return exp_df[df.columns]

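Usage example (the strings are split into lists before the call, because the function measures and concatenates lists):

In [2]: explode(df.assign(var1=df['var1'].str.split(',')), 'var1')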
Out[2]:
  var1  var2 var3
0    a     1   XX
1    b     1   XX
2    c     1   XX
3    d     2   ZZ
4    e     2   ZZ
5    f     2   ZZ
6    x     2   ZZ
7    y     2   ZZ
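The core idea behind the function above can be seen in isolation: scalar columns are repeated with np.repeat once per list element, and the lists themselves are flattened with np.concatenate. The following is a minimal sketch for a single list column, assuming sample data inferred from the output above and no empty lists; the names `lists`, `lens`, and `manual` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, inferred from the output shown above.
df = pd.DataFrame({
    "var1": ["a,b,c", "d,e,f,x,y"],
    "var2": [1, 2],
    "var3": ["XX", "ZZ"],
})

lists = df["var1"].str.split(",")   # one list per row
lens = lists.str.len().values       # number of elements per row: 3 and 5

manual = pd.DataFrame({
    # Flatten all lists into one long array of values.
    "var1": np.concatenate(lists.values),
    # Repeat each scalar once per element of the matching list.
    "var2": np.repeat(df["var2"].values, lens),
    "var3": np.repeat(df["var3"].values, lens),
})
print(manual)
```

Both steps operate on whole arrays rather than looping over rows in Python, which is what makes the approach vectorized. Rows with empty lists would simply vanish in this sketch, which is why the full function handles them separately.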
