
How Can I Efficiently Split a Large Pandas DataFrame into Smaller DataFrames Based on Participant IDs?

DDD
Release: 2024-12-17 11:09:25


Splitting Large Dataframes into Smaller Dataframes

Problem:

You have a large dataframe with over 1 million rows of data from an experiment with 60 participants. Each participant has a unique code stored in the 'name' column of the dataframe. You want to divide the dataframe into 60 smaller dataframes, one per participant.

Original Attempt:

Your initial approach, a custom function called splitframe, did not finish within an hour of execution. The function looped through the dataframe row by row, appending rows to a smaller dataframe until a new participant was identified, at which point it added that dataframe to a list and started a new one for the next participant. Appending rows one at a time is typically slow in pandas, because each append copies the data accumulated so far.

Solution using Dataframe Slicing:

Instead of building the smaller dataframes row by row, you can select all of a participant's rows in one vectorized step by slicing the dataframe with a boolean mask on the 'name' column. Here's how you can do it:

import pandas as pd

# `data` is assumed to be the full experiment dataframe, already loaded
# Collect the unique participant codes from the 'name' column
unique_names = data['name'].unique()

# Initialize a dictionary to store the split dataframes
data_dict = {}

# Iterate over the unique names
for name in unique_names:
    # Create a new dataframe by slicing the original dataframe
    data_dict[name] = data[data['name'] == name]
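As an alternative sketch, the same dictionary can be built in a single pass with groupby, which avoids re-scanning the whole 'name' column once per participant. The small `data` frame, its participant codes, and its `score` column below are made-up stand-ins for the real experiment data.

```python
import pandas as pd

# Hypothetical stand-in for the real experiment data
data = pd.DataFrame({
    "name": ["p01", "p02", "p01", "p03", "p02", "p01"],
    "score": [10, 20, 30, 40, 50, 60],
})

# Iterating over a groupby yields (key, sub-dataframe) pairs,
# one pair per unique value in the 'name' column
data_dict = {name: group for name, group in data.groupby("name")}

print(sorted(data_dict))        # participant codes that were found
print(len(data_dict["p01"]))    # number of rows belonging to p01
```

Each sub-dataframe keeps the original row index, so rows can still be traced back to their position in `data`.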

Result:

This code creates a dictionary called data_dict. Each key is a participant code taken from the 'name' column, and the corresponding value is a pandas dataframe containing all the rows for that participant. To access one participant's dataframe, index the dictionary with that participant's code ('ParticipantName' below is a placeholder):

participant_data = data_dict['ParticipantName']
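A quick sanity check after splitting is to confirm that no rows were lost or duplicated. The sketch below uses a small made-up `data` frame (the participant codes and the `rt` column are hypothetical), splits it with the same boolean-mask approach, and compares row counts against the original.

```python
import pandas as pd

# Hypothetical stand-in for the real experiment data
data = pd.DataFrame({
    "name": ["a", "b", "a", "c", "b"],
    "rt": [0.51, 0.62, 0.48, 0.70, 0.66],
})

# Split with one boolean mask per unique participant code
data_dict = {name: data[data["name"] == name] for name in data["name"].unique()}

# Count the rows held for each participant
row_counts = {name: len(df) for name, df in data_dict.items()}
total_rows = sum(row_counts.values())

# Every original row should land in exactly one sub-dataframe
assert total_rows == len(data)

# Re-assembling the pieces in original index order should reproduce the input
rebuilt = pd.concat(list(data_dict.values())).sort_index()
assert rebuilt.equals(data)
```

If the first assertion fails on real data, missing values in the 'name' column are one possible cause, since a comparison with `==` never matches a missing value.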


source:php.cn