Splitting a Large Pandas DataFrame
Consider a large pandas DataFrame with 423244 rows that needs to be divided into four parts. An attempt using np.split(df, 4) raises "ValueError: array split does not result in an equal division", because np.split only accepts an integer section count when it yields sections of exactly equal size.
To address this issue, use np.array_split instead. Unlike np.split, np.array_split accepts an integer indices_or_sections that does not divide the axis evenly; in that case the resulting parts differ in length by at most one row.
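To make the sizing rule concrete, the following is a minimal sketch in plain Python of the chunk lengths to expect. The helper name `chunk_sizes` is hypothetical; it only mirrors the rule documented for np.array_split (the first `n_rows % n_parts` chunks get one extra row) and is not NumPy's own implementation.

```python
def chunk_sizes(n_rows, n_parts):
    """Return the chunk lengths documented for np.array_split (hypothetical helper)."""
    base, extra = divmod(n_rows, n_parts)
    # The first `extra` chunks receive one additional row each.
    return [base + 1] * extra + [base] * (n_parts - extra)


print(chunk_sizes(8, 3))   # 8 rows into 3 parts
print(chunk_sizes(10, 4))  # 10 rows into 4 parts
```

The lengths always sum to the original row count, so no rows are dropped or duplicated.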
<code class="python">import pandas as pd
import numpy as np

# Create a small example DataFrame with 8 rows
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})

# Split the DataFrame into three approximately equal parts
result = np.array_split(df, 3)

# Print each part
for i, part in enumerate(result, start=1):
    print(f"Part {i}:")
    print(part)
    print()</code>
This code splits the 8-row DataFrame into three approximately equal parts of 3, 3, and 2 rows. The number of parts can be adjusted by changing the second argument to np.array_split.
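As an alternative sketch, the row positions can be split instead of the DataFrame itself, with each part then selected through iloc. This keeps NumPy working on a plain integer array, which may be preferable if passing a DataFrame directly to np.array_split causes warnings in your pandas version (an assumption to check against your installation). The helper name `split_frame` and the small example frame are hypothetical.

```python
import numpy as np
import pandas as pd


def split_frame(df, n_parts):
    """Split df row-wise into n_parts nearly equal DataFrames (hypothetical helper)."""
    # Split the integer row positions, then select each block with iloc.
    position_blocks = np.array_split(np.arange(len(df)), n_parts)
    return [df.iloc[block] for block in position_blocks]


# Illustrative data only
df = pd.DataFrame({"A": list("abcdefgh"), "C": np.arange(8)})
parts = split_frame(df, 3)
print([len(part) for part in parts])
```

Each part keeps its original index labels, so pd.concat(parts) reassembles the original DataFrame in order.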