Constructing Pandas DataFrames from Nested Dictionary Items
Given a nested dictionary with UserId at the top level, Category at the second level, and attributes at the third level, the goal is to create a pandas DataFrame with a hierarchical index. Each (UserId, Category) pair should form one entry of the row MultiIndex, while the attribute names form the columns.
Passing such a dictionary straight to pd.DataFrame does not give this layout: the top-level keys (UserIds) become the columns, the second-level keys (Categories) become the index, and each cell holds an entire attribute dictionary. The two approaches below avoid this.
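To see the problem concretely, the following minimal sketch passes the nested dictionary directly to the DataFrame constructor and inspects the result. It uses the same sample data as the rest of this article; the variable name naive is only illustrative.

```python
import pandas as pd

# Same sample data as used in the rest of this article.
user_dict = {
    12: {"Category 1": {"att_1": 1, "att_2": "whatever"},
         "Category 2": {"att_1": 23, "att_2": "another"}},
    15: {"Category 1": {"att_1": 10, "att_2": "foo"},
         "Category 2": {"att_1": 30, "att_2": "bar"}},
}

# Naive construction: only the two outer levels are interpreted.
naive = pd.DataFrame(user_dict)

print(list(naive.columns))                 # the user ids end up as columns
print(list(naive.index))                   # the categories end up as the index
print(type(naive.loc["Category 1", 12]))   # each cell holds a whole attribute dict
```

Because the innermost dictionaries are stored as opaque cell values, the attributes never become columns. Either of the approaches that follow flattens the structure first.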
1. Reshaping the Dictionary:
One solution involves reshaping the dictionary into a format where keys are tuples representing the desired MultiIndex. This allows the use of pd.DataFrame.from_dict with orient='index':
    import pandas as pd

    user_dict = {
        12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
             'Category 2': {'att_1': 23, 'att_2': 'another'}},
        15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
             'Category 2': {'att_1': 30, 'att_2': 'bar'}},
    }

    # Keys become (user_id, category) tuples, which pandas turns into a MultiIndex.
    df = pd.DataFrame.from_dict(
        {(i, j): user_dict[i][j]
         for i in user_dict.keys()
         for j in user_dict[i].keys()},
        orient='index')
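It can help to look at the intermediate, reshaped dictionary before it is handed to from_dict. The sketch below builds it as a separate step and checks the resulting index type; the names reshaped, categories, and attrs are illustrative and not part of the original snippet.

```python
import pandas as pd

user_dict = {
    12: {"Category 1": {"att_1": 1, "att_2": "whatever"},
         "Category 2": {"att_1": 23, "att_2": "another"}},
    15: {"Category 1": {"att_1": 10, "att_2": "foo"},
         "Category 2": {"att_1": 30, "att_2": "bar"}},
}

# Collapse the two outer levels into (user_id, category) tuple keys.
reshaped = {
    (user_id, category): attrs
    for user_id, categories in user_dict.items()
    for category, attrs in categories.items()
}
print(list(reshaped))  # the four tuple keys, one per future row

# With orient="index", each key becomes a row label and each inner dict a row.
df = pd.DataFrame.from_dict(reshaped, orient="index")
print(isinstance(df.index, pd.MultiIndex))
print(list(df.columns))
```

Iterating with .items() instead of indexing user_dict[i][j] is purely a readability choice; the resulting dictionary is the same.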
2. Concatenating DataFrames:
Alternatively, one can build one DataFrame per user (with that user's categories as rows), then concatenate them, passing the user ids as keys so that they become the outer index level:
    # Reuses user_dict from the first approach.
    user_ids = []
    frames = []
    # dict.items() replaces the Python 2-only dict.iteritems().
    for user_id, d in user_dict.items():
        user_ids.append(user_id)
        frames.append(pd.DataFrame.from_dict(d, orient='index'))

    df = pd.concat(frames, keys=user_ids)
Both approaches produce a DataFrame with the desired hierarchical index and column structure:
                   att_1     att_2
    12 Category 1      1  whatever
       Category 2     23   another
    15 Category 1     10       foo
       Category 2     30       bar
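As a quick sanity check, the sketch below builds the frame both ways, compares the row labels and cell values, and then shows two common ways of selecting from the hierarchical index. The names by_reshape and by_concat are illustrative only.

```python
import pandas as pd

user_dict = {
    12: {"Category 1": {"att_1": 1, "att_2": "whatever"},
         "Category 2": {"att_1": 23, "att_2": "another"}},
    15: {"Category 1": {"att_1": 10, "att_2": "foo"},
         "Category 2": {"att_1": 30, "att_2": "bar"}},
}

# Approach 1: tuple keys + from_dict(orient="index").
by_reshape = pd.DataFrame.from_dict(
    {(uid, cat): attrs
     for uid, cats in user_dict.items()
     for cat, attrs in cats.items()},
    orient="index",
)

# Approach 2: one frame per user, concatenated with the user ids as keys.
user_ids = list(user_dict)
frames = [pd.DataFrame.from_dict(user_dict[uid], orient="index") for uid in user_ids]
by_concat = pd.concat(frames, keys=user_ids)

# Compare row labels and cell values of the two results.
same_index = by_reshape.index.tolist() == by_concat.index.tolist()
same_values = by_reshape.values.tolist() == by_concat.values.tolist()
print(same_index, same_values)

# Selecting from the MultiIndex:
print(by_reshape.loc[12])                     # all categories for user 12
print(by_reshape.xs("Category 2", level=1))   # one category across all users
```

For a small dictionary like this one the choice between the two approaches is mostly a matter of taste; the first needs a single constructor call, while the second may be easier to adapt when each user's data needs separate preprocessing.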