
How to Efficiently Construct a Pandas DataFrame from a Nested Dictionary with a Hierarchical Index?

DDD
Release: 2024-12-01 09:27:11


Constructing a Pandas DataFrame from Nested Dictionaries

When working with nested dictionaries, it can be challenging to convert the data into a pandas DataFrame with the desired structure. In particular, turning the values at the deepest level of the dictionary into rows (or Series) labeled by the outer keys can be cumbersome.

Suppose you have a dictionary structured as follows:

  • Level 1: UserId (Long Integer)
  • Level 2: Category (String)
  • Level 3: Assorted Attributes (floats, ints, etc.)

The goal is to construct a DataFrame with a hierarchical index using the data from the third level of the dictionary.
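For a concrete picture of that goal, here is a minimal sketch using a technique this article does not otherwise cover: flatten each (UserId, Category) pair into one flat record, then promote the two key columns to a hierarchical index with set_index. The sample data matches the example used below; the column names UserId and Category are borrowed from the level descriptions above and are an assumption, not part of the original data.

```python
import pandas as pd

# Sample data with the three-level layout described above:
# UserId -> Category -> {attribute: value}
user_dict = {
    12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
         'Category 2': {'att_1': 23, 'att_2': 'another'}},
    15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
         'Category 2': {'att_1': 30, 'att_2': 'bar'}},
}

# Flatten: one record per (UserId, Category) pair; the third-level
# attributes become ordinary fields of that record.
records = [
    {'UserId': user_id, 'Category': category, **attrs}
    for user_id, categories in user_dict.items()
    for category, attrs in categories.items()
]

# Promote the two key columns to a two-level (hierarchical) index,
# leaving only the third-level attributes as columns.
df = pd.DataFrame(records).set_index(['UserId', 'Category'])
print(df)
```

This route is convenient when the records already exist in flat form; the approaches below work on the nested dictionary directly.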

Using a MultiIndex

A pandas MultiIndex is a convenient way to represent hierarchical data in a DataFrame. To build one from the nested dictionary, re-key the innermost dictionaries by tuples of their outer keys, (UserId, Category), so that each tuple becomes one value of the multi-index.
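To illustrate what "one tuple per multi-index value" means in isolation, here is a minimal sketch that builds a MultiIndex directly from a hand-written list of (UserId, Category) tuples with pd.MultiIndex.from_tuples. The level names and the single att_1 column are illustrative assumptions; the full example that follows applies the same idea to a nested dictionary.

```python
import pandas as pd

# Each tuple is one row label: (UserId, Category).
keys = [(12, 'Category 1'), (12, 'Category 2'),
        (15, 'Category 1'), (15, 'Category 2')]

# A list of tuples maps directly onto a two-level MultiIndex.
index = pd.MultiIndex.from_tuples(keys, names=['UserId', 'Category'])

print(index.nlevels)                              # 2
print(index.get_level_values('UserId').tolist())  # [12, 12, 15, 15]

# Attach the index to some data and select one outer key.
df = pd.DataFrame({'att_1': [1, 23, 10, 30]}, index=index)
print(df.loc[15])  # only the rows whose UserId is 15
```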

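import pandas as pd

user_dict = {12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
                  'Category 2': {'att_1': 23, 'att_2': 'another'}},
             15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
                  'Category 2': {'att_1': 30, 'att_2': 'bar'}}}

# Re-key each innermost dict by its (UserId, Category) tuple. With
# orient='index' every tuple becomes a row label, and pandas turns
# the tuple labels into a two-level MultiIndex.
df = pd.DataFrame.from_dict({(i, j): user_dict[i][j]
                            for i in user_dict.keys()
                            for j in user_dict[i].keys()},
                           orient='index')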

This creates a DataFrame with a hierarchical index: the first level holds the UserIds and the second level holds the Categories. Each row contains the third-level attributes as columns, so a single row can be selected with a (UserId, Category) pair, and all of one user's rows with the UserId alone.
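A short sketch of the selections this index makes possible, assuming the same user_dict (repeated so the block runs on its own). The comprehension iterates with .items() rather than indexing through .keys(), which builds the same dictionary. Naming the levels UserId and Category with set_names is an addition for readability; the original example leaves them unnamed.

```python
import pandas as pd

user_dict = {12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
                  'Category 2': {'att_1': 23, 'att_2': 'another'}},
             15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
                  'Category 2': {'att_1': 30, 'att_2': 'bar'}}}

# Same tuple-keyed construction as the example above.
df = pd.DataFrame.from_dict(
    {(user_id, category): attrs
     for user_id, categories in user_dict.items()
     for category, attrs in categories.items()},
    orient='index',
)

# from_dict leaves the index levels unnamed; naming them lets later
# code refer to a level by name instead of by position.
df.index = df.index.set_names(['UserId', 'Category'])

one_user = df.loc[12]                                 # every category for UserId 12
one_row = df.loc[(15, 'Category 1')]                  # one (UserId, Category) row
one_category = df.xs('Category 2', level='Category')  # Category 2 across all users
print(one_user, one_row, one_category, sep='\n\n')
```

df.xs is handy for slicing on an inner level, which plain df.loc cannot do with a single scalar key.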

Alternative Approach using Concatenation

Another way to construct the DataFrame is to build one DataFrame per user and concatenate them.

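import pandas as pd

user_ids = []
frames = []

# One DataFrame per user: rows are that user's categories,
# columns are the third-level attributes.
for user_id, d in user_dict.items():
    user_ids.append(user_id)
    frames.append(pd.DataFrame.from_dict(d, orient='index'))

# keys= adds each user_id as the outer level of the resulting MultiIndex.
df = pd.concat(frames, keys=user_ids)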

This method iterates over the outer dictionary and creates one DataFrame per user_id, with that user's categories as rows. pd.concat then stacks the frames vertically, and the keys argument adds the user_ids as the outer level of the hierarchical index.
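pd.concat also accepts a dictionary of DataFrames, in which case the dictionary keys are used as the keys argument, and its names parameter labels the levels of the resulting index. A minimal sketch of the same concatenation written that way, assuming the user_dict from earlier (redefined so the block runs on its own); the level names UserId and Category are illustrative choices.

```python
import pandas as pd

user_dict = {12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
                  'Category 2': {'att_1': 23, 'att_2': 'another'}},
             15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
                  'Category 2': {'att_1': 30, 'att_2': 'bar'}}}

# Map each user_id to that user's per-category DataFrame; the mapping's
# keys become the outer index level, so no separate user_ids list is needed.
df = pd.concat(
    {user_id: pd.DataFrame.from_dict(categories, orient='index')
     for user_id, categories in user_dict.items()},
    names=['UserId', 'Category'],  # labels for the outer and inner levels
)
print(df)
```

Compared with the loop above, this keeps each user_id next to its frame, so the two lists cannot drift out of step.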


source:php.cn