Splitting a Pandas DataFrame by a Column Value
Suppose you have a DataFrame with a column named 'Sales' and want to split it into two DataFrames around a specified threshold: one containing the rows where 'Sales' is greater than or equal to the threshold, and the other containing the rows where 'Sales' is less than the threshold.
To achieve this, you can leverage boolean indexing in Pandas. Here's an example:
<code class="python">import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'Sales': [10, 20, 30, 40, 50], 'A': [3, 4, 7, 6, 1]})
print(df)

# Set the threshold (s)
s = 30

# Split the DataFrame based on the 'Sales' column
df1 = df[df['Sales'] >= s]
print(df1)
df2 = df[df['Sales'] < s]
print(df2)</code>
Output:
<code>   Sales  A
0     10  3
1     20  4
2     30  7
3     40  6
4     50  1
   Sales  A
2     30  7
3     40  6
4     50  1
   Sales  A
0     10  3
1     20  4</code>
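To see why boolean indexing works, it can help to look at the mask itself. The following is a minimal sketch reusing the sample data from above; the variable names `is_at_or_above` and `above` are illustrative and do not appear in the original example.

```python
import pandas as pd

df = pd.DataFrame({'Sales': [10, 20, 30, 40, 50], 'A': [3, 4, 7, 6, 1]})
s = 30

# The comparison returns a boolean Series aligned with df's index
is_at_or_above = df['Sales'] >= s
print(is_at_or_above.tolist())  # [False, False, True, True, True]

# Indexing with that Series keeps only the rows where it is True
above = df[is_at_or_above]
print(above.index.tolist())  # [2, 3, 4]
```

Note that the original index labels (2, 3, 4) are kept in the result. If a fresh 0-based index is wanted, calling `reset_index(drop=True)` on the result is one option.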
Alternatively, you can compute the boolean mask once and negate it with the ~ operator to select the remaining rows:
<code class="python">mask = df['Sales'] >= s
df1 = df[mask]
df2 = df[~mask]
print(df1)
print(df2)</code>
This produces the same two DataFrames as the previous example, as long as the 'Sales' column contains no missing values.
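If the split is needed in several places, the mask approach can be wrapped in a small function. The sketch below is one possible way to do that; `split_by_threshold` is a hypothetical helper name, not a pandas function, and the sample data is modified to include a missing value so that the difference between `~mask` and a second explicit comparison becomes visible.

```python
import numpy as np
import pandas as pd


def split_by_threshold(df, column, threshold):
    """Hypothetical helper: return (at_or_above, below) for one column."""
    mask = df[column] >= threshold
    return df[mask], df[~mask]


# Sample data with one missing 'Sales' value (an assumption for illustration)
df = pd.DataFrame({'Sales': [10, 20, np.nan, 40, 50], 'A': [3, 4, 7, 6, 1]})
high, low = split_by_threshold(df, 'Sales', 30)

# With ~mask, every row lands in exactly one of the two pieces
print(len(high) + len(low) == len(df))  # True

# NaN >= 30 evaluates to False, so ~mask places the NaN row in `low`
print(low.index.tolist())  # [0, 1, 2]

# A second explicit comparison leaves the NaN row out of both pieces
low_explicit = df[df['Sales'] < 30]
print(low_explicit.index.tolist())  # [0, 1]
```

Which behavior is preferable depends on how rows with missing values should be treated; dropping or filling them before the split avoids the ambiguity.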