Replacing Blank Values (Whitespace) with NaN in Pandas
Data cleaning is a crucial step in data analysis. One common task is replacing blank values (whitespace) with NaN. This can be done efficiently using Pandas.
To achieve this, use the DataFrame.replace() method with regex=True, which performs a regular-expression search and replace across the DataFrame's values. Here's how you can implement it:
<code class="python">import numpy as np
import pandas as pd

df = pd.DataFrame([
    [-0.532681, 'foo', 0],
    [1.490752, 'bar', 1],
    [-1.387326, 'foo', 2],
    [0.814772, 'baz', ' '],
    [-0.222552, ' ', 4],
    [-1.176781, 'qux', ' '],
], columns='A B C'.split(), index=pd.date_range('2000-01-01', '2000-01-06'))

# Replace fields that contain only whitespace (or are empty) with NaN
print(df.replace(r'^\s*$', np.nan, regex=True))

# Output:
#                    A    B    C
# 2000-01-01 -0.532681  foo    0
# 2000-01-02  1.490752  bar    1
# 2000-01-03 -1.387326  foo    2
# 2000-01-04  0.814772  baz  NaN
# 2000-01-05 -0.222552  NaN    4
# 2000-01-06 -1.176781  qux  NaN</code>
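Note that this code replaces only fields that are empty or consist entirely of whitespace, that is, fields matching the regular expression r'^\s*$'. Because the pattern is anchored with ^ and $, valid values that merely contain spaces, such as 'foo bar', are left untouched. Do not drop the $ anchor: a pattern such as r'^\s*' matches every string, so every string value would be replaced. To keep empty strings and replace only whitespace-only fields, use r'^\s+$' instead.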
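To illustrate how the choice of pattern affects matching, here is a minimal sketch comparing r'^\s*$' with the stricter r'^\s+$'. The sample DataFrame and the variable names (blank_or_empty, blank_only) are made up for illustration and are not part of the example above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a value with an inner space, a whitespace-only
# cell, an empty string, and a value padded with spaces.
df = pd.DataFrame({"name": ["foo bar", "   ", "", " baz "]})

# ^\s*$ matches cells that are empty or contain only whitespace.
blank_or_empty = df.replace(r"^\s*$", np.nan, regex=True)

# ^\s+$ requires at least one whitespace character, so empty strings stay.
blank_only = df.replace(r"^\s+$", np.nan, regex=True)

print(blank_or_empty["name"].isna().tolist())  # [False, True, True, False]
print(blank_only["name"].isna().tolist())      # [False, True, False, False]
```

In both cases 'foo bar' and ' baz ' survive, because the anchors require the entire cell to be whitespace before it is replaced.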