Unnesting (also known as exploding) multiple list columns in a large Pandas DataFrame can be computationally expensive. Two methods handle it efficiently; which one applies depends on the Pandas version.
For Pandas 1.3 and later, DataFrame.explode accepts a list of column names and explodes them all at once. Within each row, the lists in the selected columns must all have the same length. Pass the column names to explode, as shown below:
df.explode(['B', 'C', 'D', 'E']).reset_index(drop=True)
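To make the one-liner above concrete, here is a minimal sketch on a small DataFrame. The sample data and the names `exploded` and `bad` are made up for illustration and do not come from the original article; the code assumes Pandas 1.3 or later.

```python
import pandas as pd

# Hypothetical sample data: within each row, the lists in B and C have the same length.
df = pd.DataFrame({
    "A": [1, 2],
    "B": [[1, 2], [3, 4]],
    "C": [["a", "b"], ["c", "d"]],
})

# Explode both list columns at once (requires pandas >= 1.3).
# Values in the non-exploded column A are repeated for each list element.
exploded = df.explode(["B", "C"]).reset_index(drop=True)
print(exploded)

# If the lists within a row differ in length, pandas raises a ValueError.
bad = pd.DataFrame({"A": [1], "B": [[1, 2]], "C": [["a"]]})
try:
    bad.explode(["B", "C"])
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print("mismatched lengths rejected:", mismatch_rejected)
```

The reset_index(drop=True) call matters because explode repeats the original row label for every element, so the result would otherwise carry duplicate index values.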
For older Pandas versions (0.25 and later, where Series.explode is available), apply Series.explode to each column instead. First set every column that should not be exploded as the index, then reset the index after the operation:
df.set_index(['A']).apply(pd.Series.explode).reset_index()
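The same idea can be sketched on a small DataFrame. The sample data and the name `exploded` are hypothetical; only column A is kept out of the explode here, so it is the only column moved into the index.

```python
import pandas as pd

# Hypothetical sample data: A is a scalar column, B and C hold lists.
df = pd.DataFrame({
    "A": [1, 2],
    "B": [[1, 2], [3, 4]],
    "C": [["a", "b"], ["c", "d"]],
})

# Move the column that must not be exploded into the index,
# explode every remaining column one Series at a time,
# then turn the index back into a regular column.
exploded = df.set_index(["A"]).apply(pd.Series.explode).reset_index()
print(exploded)
```

As with the multi-column version, this sketch assumes the lists within each row have matching lengths, so that the exploded columns line up.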
Both methods perform well. The timings below were measured on a larger DataFrame, df2:
%timeit df2.explode(['B', 'C', 'D', 'E']).reset_index(drop=True)
%timeit df2.set_index(['A']).apply(pd.Series.explode).reset_index()
# Pandas >= 1.3
2.59 ms ± 112 µs per loop
# Pandas >= 0.25
1.27 ms ± 239 µs per loop
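Outside IPython, where %timeit is not available, a comparison like this can be approximated with the standard timeit module. The sketch below is only an illustration: the article does not show how df2 was built, so the DataFrame here, its size (`n_rows`), and the helper names `multi_explode` and `per_column` are assumptions, and the numbers it prints will differ from those reported above.

```python
import timeit

import pandas as pd

# Hypothetical stand-in for df2: the original construction is not shown.
n_rows = 1_000
df2 = pd.DataFrame({
    "A": range(n_rows),
    "B": [[1, 2, 3]] * n_rows,
    "C": [["x", "y", "z"]] * n_rows,
    "D": [[0.1, 0.2, 0.3]] * n_rows,
    "E": [[True, False, True]] * n_rows,
})


def multi_explode():
    # pandas >= 1.3: explode all list columns in one call
    return df2.explode(["B", "C", "D", "E"]).reset_index(drop=True)


def per_column():
    # pandas >= 0.25: explode each column as a Series
    return df2.set_index(["A"]).apply(pd.Series.explode).reset_index()


loops = 10
for fn in (multi_explode, per_column):
    # Take the best of 3 repeats and report the time per single call.
    best = min(timeit.repeat(fn, number=loops, repeat=3)) / loops
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per loop")
```

Results depend on the Pandas version, the number of rows, and the length of the lists, so it is worth timing both variants on data that resembles the real workload.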
With these two methods, multiple list columns can be unnested efficiently on both recent and older Pandas versions.