Efficiently Creating Multiple Columns with Pandas
Applying a function to a pandas column to generate several new columns is a common task, but it is easy to pick an approach that is slow or that assigns the results incorrectly.
In earlier versions of pandas (before v0.16), a common approach was to loop over the rows with df.iterrows(). This is significantly slower than the alternatives, partly because iterrows() constructs a new Series object for every row. Current pandas versions offer several more efficient options.
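For comparison, here is a minimal sketch of the row-by-row pattern described above. The `powers()` helper (returning the first through sixth powers of a number) and the sample `df` are hypothetical stand-ins, since the text does not define them.

<code class="python">import pandas as pd

# Hypothetical helper (not defined in the text): returns the 1st through
# 6th powers of x as a tuple.
def powers(x):
    return x, x**2, x**3, x**4, x**5, x**6

df = pd.DataFrame({'num': [1, 2, 3]})
cols = ['p1', 'p2', 'p3', 'p4', 'p5', 'p6']

# Slow pattern: iterrows() yields (index, row Series) pairs and powers()
# is called once per row. Results are collected per column, then assigned.
results = {col: [] for col in cols}
for _, row in df.iterrows():
    for col, value in zip(cols, powers(row['num'])):
        results[col].append(value)

for col in cols:
    df[col] = results[col]

print(df)</code>

Besides the per-row overhead, iterrows() does not preserve dtypes across a row: in a DataFrame that mixes numeric and string columns, each row Series is upcast to object dtype.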
One recommended approach uses the zip() function to unpack the outputs of the applied function and assign them to the desired columns. Here powers() returns six values for each number. Series.map() calls it once per row and returns a Series of 6-tuples; zip(*...) then transposes these into six tuples, one per output position, and each is assigned to its corresponding new column.
<code class="python">df['p1'], df['p2'], df['p3'], df['p4'], df['p5'], df['p6'] = \
    zip(*df['num'].map(powers))</code>
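A self-contained version of the zip() approach, as a sketch. The text does not define `powers()`; here it is assumed to return the first through sixth powers of its argument as a tuple, and `df` is a small hypothetical sample frame.

<code class="python">import pandas as pd

# Hypothetical helper (not defined in the text): returns the 1st through
# 6th powers of x as a 6-tuple.
def powers(x):
    return x, x**2, x**3, x**4, x**5, x**6

df = pd.DataFrame({'num': [1, 2, 3]})

# map() calls powers() once per value, giving a Series of 6-tuples.
# zip(*...) transposes it into six tuples (one per output position),
# and each tuple is assigned to its own new column.
df['p1'], df['p2'], df['p3'], df['p4'], df['p5'], df['p6'] = \
    zip(*df['num'].map(powers))

print(df)</code>

One caveat of this design: on an empty DataFrame, zip(*...) yields no tuples at all, so the six-way unpacking raises ValueError.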
The apply() method with axis=1 offers a more direct approach. The function is called once per row, and with result_type='expand' (added in pandas 0.23) the list-like value it returns is expanded into the columns of a new DataFrame that shares the original row index.
<code class="python">df[['p1', 'p2', 'p3', 'p4', 'p5', 'p6']] = df.apply(
    lambda row: powers(row['num']), axis=1, result_type='expand')</code>
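A runnable sketch of the apply() route, again assuming a hypothetical `powers()` that returns six values. Naming the expanded columns and joining them back on the index keeps the original columns, whereas assigning the result of apply() directly to `df` would replace them.

<code class="python">import pandas as pd

# Hypothetical helper (not defined in the text): returns six values per input.
def powers(x):
    return x, x**2, x**3, x**4, x**5, x**6

df = pd.DataFrame({'num': [1, 2, 3], 'label': ['a', 'b', 'c']})
cols = ['p1', 'p2', 'p3', 'p4', 'p5', 'p6']

# axis=1 passes each row as a Series; result_type='expand' turns the
# returned tuple into columns 0..5 of a new DataFrame with df's index.
expanded = df.apply(lambda row: powers(row['num']), axis=1, result_type='expand')
expanded.columns = cols

# join() aligns on the index, so 'num' and 'label' are preserved.
df = df.join(expanded)
print(df)</code>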
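The assign() method, introduced in pandas v0.16, provides another convenient way to create new columns. Each keyword argument names a column and supplies its values, either as an array-like expression or as a callable evaluated on the DataFrame; assign() returns a new DataFrame and leaves the original unchanged.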
<code class="python">df = df.assign(p1=df['num'], p2=df['num'] ** 2)</code>
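assign() can also create many columns in one call by unpacking a dict of keyword arguments. The sketch below swaps the per-element `powers()` call for vectorized `**` on the whole column (a substitution, not the text's method) and also shows the callable form, which receives the DataFrame being built, including columns added by earlier keywords.

<code class="python">import pandas as pd

df = pd.DataFrame({'num': [1, 2, 3]})

# Keyword names are built with a dict comprehension and unpacked with **,
# so a single assign() call creates p1..p6. Vectorized ** on the whole
# column stands in for a per-element powers() call.
df2 = df.assign(**{f'p{i}': df['num'] ** i for i in range(1, 7)})

# Callables receive the DataFrame being built, so p4 can be derived
# from p2 within the same assign() call.
df3 = df.assign(p2=lambda d: d['num'] ** 2, p4=lambda d: d['p2'] ** 2)

# assign() returns new DataFrames; df itself still has only 'num'.
print(df2)
print(df3)</code>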