Adding a New Column to an Existing DataFrame
When working with pandas, you often need to add a new column to an existing DataFrame. There are several ways to do this, each with its own advantages and drawbacks.
1. Using assign (recommended; available in pandas 0.16.0 and above):
import pandas as pd
import numpy as np

# Generate a sample DataFrame
df1 = pd.DataFrame({
    'a': [0.671399, 0.446172, 0.614758],
    'b': [0.101208, -0.243316, 0.075793],
    'c': [-0.181532, 0.051767, -0.451460],
    'd': [0.241273, 1.577318, -0.012493]
})

# Add a new column 'e' with random values
sLength = len(df1['a'])
df1 = df1.assign(e=pd.Series(np.random.randn(sLength)).values)
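Because assign returns a new DataFrame rather than modifying the one it is called on, the result has to be bound to a name, as in the example above. The following is a minimal sketch of that behavior on a small made-up frame (the column names and values are illustrative assumptions, not part of the original example). It also shows that assign accepts a callable, which is evaluated against the frame and is handy for derived columns.

```python
import pandas as pd

# Small illustrative frame (values are made up for this sketch)
df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [10.0, 20.0, 30.0]})

# assign returns a new DataFrame; df itself is left untouched
with_total = df.assign(total=lambda frame: frame['a'] + frame['b'])

print(list(df.columns))          # original columns only
print(list(with_total.columns))  # original columns plus 'total'
```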
2. Using loc[row_index, col_indexer] = value:
# Add a new column 'f' using loc
df1.loc[:, 'f'] = pd.Series(np.random.randn(sLength), index=df1.index)
3. Using df[new_column_name] = pd.Series(values, index=df.index):
# Add a new column 'g' by direct assignment
df1['g'] = pd.Series(np.random.randn(sLength), index=df1.index)
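The index=df1.index argument in the examples above matters because pandas aligns a Series to the DataFrame by index label, not by position. The sketch below, which assumes a made-up frame with a non-default index, shows what can happen when the labels do not match, and how a plain array is assigned positionally instead.

```python
import numpy as np
import pandas as pd

# Frame with a non-default index (labels are made up for this sketch)
df = pd.DataFrame({'a': [1, 2, 3]}, index=[10, 20, 30])
values = np.array([0.1, 0.2, 0.3])

# Series with the default 0..2 index: no labels match, so every row is NaN
df['misaligned'] = pd.Series(values)

# Series built on df.index: values land on the intended rows
df['aligned'] = pd.Series(values, index=df.index)

# A bare array is assigned by position, with no alignment step
df['positional'] = values

print(df)
```

Passing the frame's own index, or a bare array via .values, avoids silently filling the new column with NaN.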
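Remember that direct assignment (method 3) may trigger a SettingWithCopyWarning in some versions of pandas when the DataFrame is itself a slice or filtered subset of another DataFrame; on a freshly created DataFrame it does not. Using assign or loc is generally recommended for clarity: assign returns a new DataFrame and leaves the original unchanged, while loc makes the target rows and column explicit.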
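The warning typically arises when the frame being modified was produced by filtering or slicing another DataFrame. One common way to sidestep it is to take an explicit copy before adding the column. The following is a minimal sketch under that assumption, using a made-up frame; whether the warning would actually appear without the copy depends on the pandas version.

```python
import pandas as pd

# Illustrative frame (values are made up for this sketch)
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]})

# An explicit copy makes it unambiguous that `subset` is its own object
subset = df[df['a'] > 2].copy()

# Adding a column now affects only the copy, not the original frame
subset['c'] = subset['a'] * 2

print(subset)
print('c' in df.columns)
```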