Replace NaN Values with Column Averages in a pandas DataFrame
Real-world pandas DataFrames often contain NaN values, which usually need to be replaced with sensible values before analysis. This article shows how to replace each NaN with the average of its column.
Rather than adapting the manual, index-based averaging technique one might use on a NumPy array, you can use the DataFrame.fillna method, which provides a straightforward solution.
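For comparison, the sketch below shows one common NumPy approach. This is an assumption, since the article does not say which array technique it has in mind. It computes per-column means with np.nanmean, finds the NaN positions with np.isnan and np.where, and writes the matching means back with np.take. The small array a is made-up illustrative data.

```python
import numpy as np

# Small illustrative 2-D array with missing values (hypothetical data)
a = np.array([[1.0, np.nan, 3.0],
              [np.nan, 5.0, 6.0],
              [7.0, 8.0, np.nan]])

# Mean of each column, ignoring NaNs
col_means = np.nanmean(a, axis=0)

# Row and column indices of every NaN
rows, cols = np.where(np.isnan(a))

# Copy the array, then write each column's mean into that column's NaN slots
filled = a.copy()
filled[rows, cols] = np.take(col_means, cols)

print(filled)
```

Every step here is index bookkeeping done by hand. For a DataFrame, DataFrame.fillna takes care of this for you.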
Using DataFrame.fillna
To fill NaN values with the column mean, use the following code:
```python
import numpy as np
import pandas as pd

# Create a DataFrame with NaN values
df = pd.DataFrame({
    'A': [-0.166919, -0.297953, -0.120211, np.nan, np.nan,
          -0.788073, -0.916080, -0.887858, 1.948430, 0.019698],
    'B': [0.979728, -0.912674, -0.540679, -2.027325, np.nan,
          np.nan, -0.612343, 1.033826, 1.025011, -0.795876],
    'C': [-0.632955, -1.365463, -0.680481, 1.533582, 0.461821,
          np.nan, np.nan, np.nan, -2.982224, -0.046431],
})
print("Original DataFrame with NaN values:")
print(df)

# Calculate column means (NaNs are skipped by default)
column_means = df.mean()
print("\nColumn means:")
print(column_means)

# Replace each NaN with the mean of its own column
df_filled = df.fillna(column_means)
print("\nDataFrame with NaN values replaced by column means:")
print(df_filled)
```
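DataFrame.fillna also accepts a Series of fill values, matched to columns by label, and columns with no entry in that Series are left unfilled. This matters when a DataFrame mixes numeric and text columns. The minimal sketch below uses a hypothetical three-column frame. Here df.mean(numeric_only=True) averages only the numeric columns, so the text column "label" keeps its NaN. In pandas 2.x, a plain df.mean() on such a frame raises a TypeError instead.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing numeric and text columns
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "B": [np.nan, 4.0, 8.0],
    "label": ["x", np.nan, "z"],
})

# Average only the numeric columns; "label" is left out of the result
means = df.mean(numeric_only=True)

# fillna aligns the Series on column labels, so only A and B are filled
df_filled = df.fillna(means)
print(df_filled)
```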
Example:
Consider the following DataFrame with NaN values:
          A         B         C
0 -0.166919  0.979728 -0.632955
1 -0.297953 -0.912674 -1.365463
2 -0.120211 -0.540679 -0.680481
3       NaN -2.027325  1.533582
4       NaN       NaN  0.461821
5 -0.788073       NaN       NaN
6 -0.916080 -0.612343       NaN
7 -0.887858  1.033826       NaN
8  1.948430  1.025011 -2.982224
9  0.019698 -0.795876 -0.046431
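Because df.mean() skips NaN values by default, each column mean is computed from that column's non-missing entries only. Passing these means to DataFrame.fillna then replaces each NaN with the mean of its own column: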
          A         B         C
0 -0.166919  0.979728 -0.632955
1 -0.297953 -0.912674 -1.365463
2 -0.120211 -0.540679 -0.680481
3 -0.151121 -2.027325  1.533582
4 -0.151121 -0.231291  0.461821
5 -0.788073 -0.231291 -0.530307
6 -0.916080 -0.612343 -0.530307
7 -0.887858  1.033826 -0.530307
8  1.948430  1.025011 -2.982224
9  0.019698 -0.795876 -0.046431
Each NaN has been replaced by its column's average: -0.151121 in column A, -0.231291 in column B, and -0.530307 in column C.
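To make the per-column behavior explicit, here is a sketch of an equivalent column-by-column loop. It uses a small subset of the example values for illustration and is not a description of how fillna works internally. The sketch checks that the loop and df.fillna(df.mean()) produce identical frames with no NaNs remaining.

```python
import numpy as np
import pandas as pd

# A few values from the example above (illustrative subset)
df = pd.DataFrame({
    "A": [-0.166919, np.nan, 1.948430],
    "B": [np.nan, -0.612343, 1.025011],
})

# Explicit loop: fill each column with the mean of its own non-missing values
manual = df.copy()
for col in manual.columns:
    manual[col] = manual[col].fillna(manual[col].mean())

# Vectorized version: one call, means aligned to columns by label
vectorized = df.fillna(df.mean())

# Both approaches should give identical frames
pd.testing.assert_frame_equal(manual, vectorized)
print(vectorized.isna().sum().sum())  # total NaNs remaining
```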