Converting a pandas DataFrame that contains missing values to a NumPy array is a common task in data analysis. The goal is an array in which np.nan marks each missing value.
The recommended way to do this is the df.to_numpy() method (available since pandas 0.24):
<code class="python">import numpy as np
import pandas as pd

# Create a DataFrame with missing values
index = [1, 2, 3, 4, 5, 6, 7]
a = [np.nan, np.nan, np.nan, 0.1, 0.1, 0.1, 0.1]
b = [0.2, np.nan, 0.2, 0.2, 0.2, np.nan, np.nan]
c = [np.nan, 0.5, 0.5, np.nan, 0.5, 0.5, np.nan]
df = pd.DataFrame({'A': a, 'B': b, 'C': c}, index=index)

# Convert to NumPy array
np_array = df.to_numpy()
print(np_array)</code>
This will output:
<code class="python">[[nan 0.2 nan]
 [nan nan 0.5]
 [nan 0.2 0.5]
 [0.1 0.2 nan]
 [0.1 0.2 0.5]
 [0.1 nan 0.5]
 [0.1 nan nan]]</code>
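The example above only has float columns, where missing values are already stored as np.nan. As a minimal sketch, assuming pandas 1.1 or later (where to_numpy accepts the dtype and na_value arguments), a column with the nullable Int64 dtype stores its missing values as pd.NA instead; passing dtype=float and na_value=np.nan asks for a float array with np.nan in those positions. The frame df_nullable is hypothetical and only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one nullable-integer column and one float column
df_nullable = pd.DataFrame({
    "A": pd.array([1, None, 3], dtype="Int64"),  # missing value stored as pd.NA
    "B": [0.2, np.nan, 0.2],                     # missing value stored as np.nan
})

# Request a float64 array and have every missing value written as np.nan
arr = df_nullable.to_numpy(dtype=float, na_value=np.nan)

print(arr.dtype)      # dtype of the resulting array
print(np.isnan(arr))  # boolean mask of the missing positions
```

Setting both arguments explicitly keeps the result numeric, so NaN-aware functions such as np.isnan work on it directly.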
df.to_numpy() returns one array with a single common dtype. To keep a separate data type for each column, you can use the df.to_records() method, which returns a NumPy record array:
<code class="python">records = df.to_records()
print(records.dtype)</code>
This will output:
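<code class="python">(numpy.record, [('index', '<i8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])</code>
where i8 represents a 64-bit integer (the index) and f8 represents float64 (the values). The exact representation depends on the pandas and NumPy versions; older pandas versions may report the index as O (object).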
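If the index is not needed as a field, to_records also accepts index=False. The sketch below uses a smaller frame (the names df_small and records_no_index are assumptions for illustration) and shows that each column can be read back as a field in which missing values are still np.nan.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame with one missing value per column
df_small = pd.DataFrame(
    {"A": [np.nan, 0.1], "B": [0.2, np.nan]},
    index=[1, 2],
)

# Leave the index out of the record array
records_no_index = df_small.to_records(index=False)

print(records_no_index.dtype.names)  # field names taken from the columns

# Each field is a regular float64 array, so np.isnan works on it
col_a = records_no_index["A"]
print(np.isnan(col_a))
```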
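The result of df.to_records() is already a NumPy record array. To build one directly from the values, without the index, you can pass the rows and the column names (as a plain list) to np.rec.fromrecords:
<code class="python">import numpy as np

np_array = np.rec.fromrecords(df.to_numpy().tolist(), names=df.columns.tolist())
print(np_array.dtype)</code>
This will output the float64 type for each column, matching the A, B, and C fields of the records above.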