How do I split a large Pandas DataFrame into equal parts when the number of rows is not divisible by the number of parts?

Mary-Kate Olsen
Release: 2024-10-28 03:29:30

Splitting Large Pandas DataFrames into Equal Parts

When working with large datasets in Pandas, it is often necessary to divide them into smaller chunks for processing or analysis. A common first choice is np.split, which divides an array into a given number of equal-sized sub-arrays along a specified axis. However, np.split requires an exact division: if the number of rows is not divisible by the number of sections (for example, 8 rows into 3 parts), it raises a ValueError.
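A minimal sketch of that failure, using a plain NumPy array as a stand-in for the positions of 8 DataFrame rows (the array and variable names are illustrative only): dividing into 4 parts works, dividing into 3 parts raises ValueError.

```python
import numpy as np

rows = np.arange(8)  # stand-in for the row positions of an 8-row DataFrame

# 8 rows into 4 parts divides evenly, so np.split succeeds
even_parts = np.split(rows, 4)
print([len(p) for p in even_parts])  # [2, 2, 2, 2]

# 8 rows into 3 parts does not divide evenly, so np.split raises ValueError
try:
    np.split(rows, 3)
    split_error = None
except ValueError as exc:
    split_error = exc
print("np.split(rows, 3) raised:", repr(split_error))
```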

Alternative Approach Using np.array_split

To overcome this, use np.array_split instead. It accepts a number of sections that does not divide the row count evenly and returns parts whose sizes differ by at most one row, as the following Python code demonstrates:

<code class="python">import pandas as pd
import numpy as np

# 8 rows; C and D are random, so your numbers will differ from the output below
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8), 'D': np.random.randn(8)})

print(df)
print()

# 8 rows cannot be divided evenly into 3 parts; np.array_split allows this
split_data = np.array_split(df, 3)

for part in split_data:
    print(part)
    print()  # blank line between parts</code>

The output shows the original DataFrame followed by the three parts. Because 8 rows cannot be split evenly into 3 parts, the first two parts get three rows each and the last part gets two. Each part keeps the original index labels:

     A      B         C         D
0  foo    one -0.174067 -0.608579
1  bar    one -0.860386 -1.210518
2  foo    two  0.614102  1.689837
3  bar  three -0.284792 -1.071160
4  foo    two  0.843610  0.803712
5  bar    two -1.514722  0.870861
6  foo    one  0.131529 -0.968151
7  foo  three -1.002946 -0.257468

     A    B         C         D
0  foo  one -0.174067 -0.608579
1  bar  one -0.860386 -1.210518
2  foo  two  0.614102  1.689837

     A      B         C         D
3  bar  three -0.284792 -1.071160
4  foo    two  0.843610  0.803712
5  bar    two -1.514722  0.870861

     A      B         C         D
6  foo    one  0.131529 -0.968151
7  foo  three -1.002946 -0.257468
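The part sizes produced by np.array_split follow a simple rule: for n rows split into k parts, divmod(n, k) gives a base size and a remainder, and the first remainder-many parts get one extra row. A small sketch checking this on a 10-element array (the sizes 10 and 4 are arbitrary choices for illustration):

```python
import numpy as np

n_rows, n_parts = 10, 4
parts = np.array_split(np.arange(n_rows), n_parts)
sizes = [len(p) for p in parts]
print(sizes)  # [3, 3, 2, 2]

# Same sizes computed directly: the first `extra` parts get one more row
base, extra = divmod(n_rows, n_parts)
expected = [base + 1] * extra + [base] * (n_parts - extra)
print(expected)  # [3, 3, 2, 2]
```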

Using np.array_split keeps the parts as close to equal as possible regardless of the total row count: part sizes never differ by more than one row, and the larger parts come first. This makes it a convenient way to break large datasets into manageable chunks for further processing.
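In recent pandas versions, passing a DataFrame directly to np.array_split may emit a FutureWarning, because NumPy calls DataFrame.swapaxes internally and pandas has deprecated that method. If you want to avoid that dependency, a minimal sketch is to reproduce the same sizing rule with positional slicing via iloc. The helper name split_dataframe is hypothetical, not part of pandas or NumPy:

```python
from typing import List

import pandas as pd


def split_dataframe(df: pd.DataFrame, n_parts: int) -> List[pd.DataFrame]:
    """Hypothetical helper: split df row-wise into n_parts nearly equal pieces.

    Mirrors np.array_split's sizing: the first len(df) % n_parts pieces get
    one extra row. Uses positional slicing (iloc), so each piece keeps the
    original index labels.
    """
    if n_parts <= 0:
        raise ValueError("n_parts must be a positive integer")
    base, extra = divmod(len(df), n_parts)
    parts = []
    start = 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        parts.append(df.iloc[start:start + size])
        start += size
    return parts


df = pd.DataFrame({"x": range(8)})
chunks = split_dataframe(df, 3)
print([len(c) for c in chunks])  # [3, 3, 2]
print([list(c.index) for c in chunks])  # [[0, 1, 2], [3, 4, 5], [6, 7]]
```

Because the pieces are consecutive slices in order, pd.concat(chunks) reassembles the original DataFrame after the chunks have been processed.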

source:php.cn