Splitting Comma-Separated Pandas Dataframe Strings into Separate Rows
Columns in a pandas DataFrame often hold comma-separated strings that need to be split so that each value gets its own row. Several approaches are available:
Using Series.explode() or DataFrame.explode():
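This method is available in pandas 0.25.0 and above and is designed for exploding list-like columns. It operates on lists rather than raw strings, so a comma-separated column has to be split with str.split(',') before it is exploded.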
df.explode('column_name')
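As a minimal sketch of the split-then-explode pattern, the snippet below builds a small sample frame and turns each comma-separated value into its own row. The column names id and tags and the sample data are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: one scalar column and one comma-separated column.
df = pd.DataFrame({"id": [1, 2], "tags": ["a,b,c", "d"]})

# explode() needs list-like cells, so split the strings into lists first.
exploded = (
    df.assign(tags=df["tags"].str.split(","))
      .explode("tags")
      .reset_index(drop=True)  # explode() repeats the original index labels
)

print(exploded)
```

The reset_index(drop=True) call is optional; without it, rows that came from the same source row share that row's index label.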
Using a Vectorized Function:
For DataFrames that mix several ordinary (scalar) columns with several list columns, a vectorized helper function can provide a more versatile solution.
def explode(df, lst_cols, fill_value='', preserve_index=False): # ... (implementation details)
Converting CSV Strings to Lists:
If the goal is solely to convert CSV strings to lists, this can be achieved by splitting the strings using str.split().
df['var1'] = df['var1'].str.split(',')
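Real data often has spaces after the commas, which str.split(',') keeps as part of each piece. The following sketch, using made-up sample values, splits the strings and then strips whitespace from every element; the row count stays the same and each cell becomes a Python list.

```python
import pandas as pd

# Hypothetical sample data with a space after one of the commas.
df = pd.DataFrame({"var1": ["a, b", "c"], "var2": [1, 2]})

# Split on commas, then strip stray whitespace from each piece.
df["var1"] = df["var1"].str.split(",").apply(
    lambda parts: [p.strip() for p in parts]
)

print(df)
```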
Custom Vectorized Approach:
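This approach repeats every ordinary column once per list element and flattens a single list column. Here x is the DataFrame and lst_col is the name of the column that holds lists.
import numpy as np
import pandas as pd
# Repeat each non-list column once per list element, flatten the list column,
# then restore the original column order.
exploded_df = pd.DataFrame({
    col: np.repeat(x[col].values, x[lst_col].str.len())
    for col in x.columns.difference([lst_col])
}).assign(**{lst_col: np.concatenate(x[lst_col].values)})[x.columns.tolist()]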
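To make the np.repeat / np.concatenate technique easier to try out, here is one way it might be wrapped in a function and run on sample data. The function name explode_list_column and the sample frame are hypothetical, and the sketch assumes every row holds a non-empty list with no missing values.

```python
import numpy as np
import pandas as pd

def explode_list_column(df, lst_col):
    """Hypothetical helper: one output row per element of df[lst_col]."""
    lengths = df[lst_col].str.len()  # number of elements in each row's list
    repeated = {
        col: np.repeat(df[col].values, lengths)
        for col in df.columns.difference([lst_col])
    }
    flat = np.concatenate(df[lst_col].values)  # all list elements, in row order
    # Reassemble and restore the original column order.
    return pd.DataFrame(repeated).assign(**{lst_col: flat})[df.columns.tolist()]

# Assumed sample data; the strings are split into lists before exploding.
x = pd.DataFrame({"var1": ["a,b,c", "d,e"], "var2": [1, 2]})
x["var1"] = x["var1"].str.split(",")

result = explode_list_column(x, "var1")
print(result)
```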
Legacy Solution:
An earlier method involves using .set_index(), .str.split(), .stack(), and .reset_index() to split the CSV strings and stack them into individual rows.
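The chain described above could look roughly like the following sketch. The column names var1 and var2 and the sample data are assumptions, and the dropna() step is a defensive addition so the result is the same whether or not stack() discards the padding created for shorter rows.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"var1": ["a,b,c", "d,e"], "var2": [1, 2]})

result = (
    df.set_index("var2")["var1"]          # keep the scalar column as the index
      .str.split(",", expand=True)        # one column per piece, padded for short rows
      .stack()                            # move the pieces into rows
      .dropna()                           # remove any padding that stack() kept
      .reset_index(level=1, drop=True)    # discard the piece-position level
      .rename("var1")
      .reset_index()                      # bring var2 back as a column
)

print(result)
```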
In short, explode() is the simplest choice on pandas 0.25.0 and above, while the vectorized and legacy approaches remain options for older versions or when more control over the result is needed.