Splitting a DataFrame Based on Column Value in Pandas
Often in data analysis, we encounter situations where we need to divide a DataFrame into multiple DataFrames based on a specific column value. One such case is when we want to split a DataFrame into two parts: one containing rows with values below a certain threshold and another containing rows with values above or equal to that threshold.
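In Pandas, we can accomplish this split using boolean indexing: a comparison on the column yields a True/False value for each row, and indexing the DataFrame with that result keeps only the rows marked True. Here's an example: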
Consider the following DataFrame with a column named 'Sales':
<code class="python">import pandas as pd

df = pd.DataFrame({'A':[3,4,7,6,1], 'Sales':[10,20,30,40,50]})
print(df)
   A  Sales
0  3     10
1  4     20
2  7     30
3  6     40
4  1     50</code>
Suppose we want to split this DataFrame in two at a Sales threshold of 30:
Splitting with Direct Comparison:
The simplest method is to compare the column directly against the threshold with the '>=' operator and use the resulting boolean Series to index the DataFrame:
<code class="python">s = 30
df1 = df[df['Sales'] >= s]
print(df1)
   A  Sales
2  7     30
3  6     40
4  1     50</code>
This creates a new DataFrame called df1 that contains all rows where the Sales value is greater than or equal to 30.
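It can help to look at what the comparison evaluates to on its own, before it is used for indexing. The sketch below rebuilds the example DataFrame so that it runs standalone; the variable name `is_high` is only illustrative and does not appear in the original example.

```python
import pandas as pd

# Same example data as above, rebuilt so this snippet is self-contained
df = pd.DataFrame({'A': [3, 4, 7, 6, 1], 'Sales': [10, 20, 30, 40, 50]})

# The comparison alone returns a boolean Series aligned with df's index
is_high = df['Sales'] >= 30  # illustrative name, not from the original
print(is_high)

# Indexing with that Series keeps only the rows where it is True
print(df[is_high])
```

Because the boolean Series shares the DataFrame's index, it can be stored in a variable and reused, which is what the inverse-mask approach relies on.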
Splitting with Inverse Mask:
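To create a DataFrame with the rows where Sales is less than 30, we can invert the boolean mask df['Sales'] >= s using the ~ operator:
<code class="python"># Store the comparison once so it can be reused and inverted
mask = df['Sales'] >= s
df2 = df[~mask]
print(df2)
   A  Sales
0  3     10
1  4     20</code>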
This creates a new DataFrame called df2 that contains all rows where the Sales value is less than 30.
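Putting both steps together, a minimal self-contained sketch might look like the following. The helper name `split_by_threshold` and its parameters are hypothetical, introduced here for illustration only; it simply wraps the mask and inverted-mask approach described above.

```python
import pandas as pd


def split_by_threshold(df, column, threshold):
    """Hypothetical helper: return (below, at_or_above) for one column."""
    # Boolean Series: True where the value is >= threshold
    mask = df[column] >= threshold
    # ~mask flips True/False, selecting all remaining rows
    return df[~mask], df[mask]


df = pd.DataFrame({'A': [3, 4, 7, 6, 1], 'Sales': [10, 20, 30, 40, 50]})
below, at_or_above = split_by_threshold(df, 'Sales', 30)
print(below)
print(at_or_above)
```

One design point worth noting: a comparison against a missing value (NaN) evaluates to False, so with the inverted mask such rows end up in the first frame, whereas an explicit `df[df['Sales'] < s]` would leave them out of both frames.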