Creating DataFrames from Dictionaries with Varying Array Lengths
When you create a DataFrame from a dictionary whose values are NumPy arrays, pandas raises a ValueError if the arrays do not all have the same length. A dictionary of plain arrays gives pandas no index to align on, so every column must have the same number of rows.
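The failure is easy to reproduce. The following is a minimal sketch, assuming Python 3 and a reasonably recent pandas; the variable name error_message is illustrative only, and the exact wording of the message varies between pandas versions, so none is shown here.

```python
import numpy as np
import pandas as pd

# Two arrays of different lengths (2 and 4 elements)
d = {"A": np.array([1, 2]), "B": np.array([1, 2, 3, 4])}

try:
    # Passing the raw arrays directly fails because the lengths differ
    pd.DataFrame(d)
    error_message = None
except ValueError as exc:
    # Keep the message so it can be inspected or logged
    error_message = str(exc)

print(error_message)
```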
To work around this, wrap each array in a pd.Series before passing it to the DataFrame constructor. Each Series carries its own index, so pandas aligns the columns on the union of those indexes and fills the positions missing from the shorter columns with NaN.
Python 2.x:
    import pandas as pd
    import numpy as np

    d = dict(A=np.array([1, 2]), B=np.array([1, 2, 3, 4]))

    # Wrap each array in a Series so that pandas can align on the index
    pd.DataFrame(dict([(k, pd.Series(v)) for k, v in d.iteritems()]))
Python 3.x:
    import pandas as pd
    import numpy as np

    d = dict(A=np.array([1, 2]), B=np.array([1, 2, 3, 4]))

    # Wrap each array in a Series so that pandas can align on the index
    pd.DataFrame(dict([(k, pd.Series(v)) for k, v in d.items()]))
In both cases, the resulting DataFrame has columns A and B and four rows, matching the length of the longest array. Column B holds all four of its values; column A holds its two values in rows 0 and 1 and NaN in rows 2 and 3. Because NaN is a floating-point value, column A is stored as float64, so recent pandas versions display its values as 1.0 and 2.0.
Output:
         A  B
    0    1  1
    1    2  2
    2  NaN  3
    3  NaN  4
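The same construction can also be written with a dict comprehension, and the padding can be checked programmatically. This is a sketch under the assumption of Python 3 and a recent pandas; the names df and missing are chosen here for illustration and do not come from the original example.

```python
import numpy as np
import pandas as pd

d = {"A": np.array([1, 2]), "B": np.array([1, 2, 3, 4])}

# Dict comprehension equivalent of dict([(k, pd.Series(v)) ...])
df = pd.DataFrame({k: pd.Series(v) for k, v in d.items()})

# Count the NaN entries that were added to pad each column
missing = df.isna().sum()

print(df)
print(missing)
print(df.dtypes)
```

Counting missing values per column with isna().sum() is a quick way to confirm how many rows were padded before doing any further computation on the frame.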