Handling missing data in pandas DataFrames is crucial for accurate analysis. When data is incomplete, one common remedy is to replace NaN values with a meaningful estimate. This article demonstrates how to replace NaN values with the average of their respective columns in a pandas DataFrame.
Consider a DataFrame with a mixture of real numbers and NaN values. The goal is to replace the NaN values with the average values of the columns in which they appear.
Unlike with raw NumPy arrays, pandas offers a built-in way to do this: the fillna method, which accepts the column means directly:
<code class="python">df.fillna(df.mean())</code>
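Here df.mean() computes the mean of each column, skipping NaN values by default, and fillna uses each mean to fill the gaps in the matching column. The call returns a new DataFrame rather than modifying df. For example: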
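<code class="python">import numpy as np
import pandas as pd

# Sample data with NaN values scattered across all three columns
df = pd.DataFrame({
    'A': [-0.166919, -0.297953, -0.120211, np.nan, np.nan, -0.788073, -0.916080, -0.887858, 1.948430, 0.019698],
    'B': [0.979728, -0.912674, -0.540679, -2.027325, np.nan, np.nan, -0.612343, 1.033826, 1.025011, -0.795876],
    'C': [-0.632955, -1.365463, -0.680481, 1.533582, 0.461821, np.nan, np.nan, np.nan, -2.982224, -0.046431],
})

# Per-column means; NaN values are skipped by default
mean = df.mean()

# Fill each NaN with the mean of its own column and print the result
print(df.fillna(mean))</code>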
Output:
          A         B         C
0 -0.166919  0.979728 -0.632955
1 -0.297953 -0.912674 -1.365463
2 -0.120211 -0.540679 -0.680481
3 -0.151121 -2.027325  1.533582
4 -0.151121 -0.231291  0.461821
5 -0.788073 -0.231291 -0.530307
6 -0.916080 -0.612343 -0.530307
7 -0.887858  1.033826 -0.530307
8  1.948430  1.025011 -2.982224
9  0.019698 -0.795876 -0.046431
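The NaN values have been replaced with the average values of their respective columns. For instance, rows 3 and 4 of column A now hold -0.151121, the mean of the eight non-missing values in A.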
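The example above contains only numeric columns. As a minimal sketch of how the same approach might be applied to a frame that also holds text, the snippet below passes numeric_only=True to mean so that only numeric columns are averaged. The small DataFrame and the names label, col_means and filled are made up for illustration and are not part of the original example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing numeric columns with a text column.
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "B": [np.nan, 4.0, 8.0],
    "label": ["x", None, "z"],
})

# numeric_only=True restricts the means to the numeric columns,
# so the text column is neither averaged nor filled.
col_means = df.mean(numeric_only=True)

# fillna matches the Series index against column names and returns
# a new DataFrame; assign the result to keep it.
filled = df.fillna(col_means)
print(filled)
```

Because fillna matches the index of col_means against the column names, the missing entry in label is left untouched and would need its own fill strategy.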