pandas: Filtering Data Frame with Multiple Conditions
In pandas, filtering data frames by values in multiple columns can be confusing. When you combine conditions with the AND operator (&), the result can look as if OR (|) had been applied, and vice versa.
Consider the following test code:
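<code class="python">import pandas as pd

df = pd.DataFrame({'a': range(5), 'b': range(5)})
# Mark some cells with -1. Use .loc instead of chained indexing such as
# df['a'][1] = -1, which may modify a temporary copy rather than df.
df.loc[1, 'a'] = -1
df.loc[1, 'b'] = -1
df.loc[3, 'a'] = -1
df.loc[4, 'b'] = -1

df1 = df[(df.a != -1) & (df.b != -1)]  # keep rows where both columns differ from -1
df2 = df[(df.a != -1) | (df.b != -1)]  # keep rows where at least one column differs from -1

print(pd.concat([df, df1, df2], axis=1,
                keys=['original df', 'using AND (&)', 'using OR (|)']))</code>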
The output looks surprising at first:
  original df     using AND (&)     using OR (|)
            a  b              a   b            a   b
0           0  0              0   0            0   0
1          -1 -1            NaN NaN          NaN NaN
2           2  2              2   2            2   2
3          -1  3            NaN NaN           -1   3
4           4 -1            NaN NaN            4  -1

[5 rows x 6 columns]
The AND filter (&) drops every row in which at least one value is -1, while the OR filter (|) drops only the row in which both values are -1. At first glance, each operator seems to be doing the other's job.
The operators are working correctly; the confusion comes from describing the result in terms of the rows that are dropped rather than the rows that are kept. The AND condition keeps rows where both conditions are true, which is equivalent to dropping rows where at least one condition is false. The OR condition keeps rows where either condition is true, which is equivalent to dropping only the rows where both conditions are false. This is De Morgan's law: not (A and B) equals (not A) or (not B), and not (A or B) equals (not A) and (not B).
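A small sketch can check this equivalence directly. It rebuilds the same data as the test code with a literal constructor (an assumption for brevity, not the original's assignment steps) and compares each "keep" mask with the negation of its De Morgan "drop" mask; the mask names are illustrative only.

```python
import pandas as pd

# Same values as the test code above, built directly.
df = pd.DataFrame({'a': [0, -1, 2, -1, 4], 'b': [0, -1, 2, 3, -1]})

# AND: keep rows where both columns differ from -1 ...
keep_and = (df.a != -1) & (df.b != -1)
# ... which is the same as dropping rows where at least one column equals -1.
drop_and = (df.a == -1) | (df.b == -1)

# OR: keep rows where at least one column differs from -1 ...
keep_or = (df.a != -1) | (df.b != -1)
# ... which is the same as dropping rows where both columns equal -1.
drop_or = (df.a == -1) & (df.b == -1)

# Each keep mask is exactly the element-wise negation of its drop mask.
assert keep_and.equals(~drop_and)
assert keep_or.equals(~drop_or)

print(df[keep_and].index.tolist())  # [0, 2]
print(df[keep_or].index.tolist())   # [0, 2, 3, 4]
```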
To avoid confusion, write the condition for the rows you want to keep, and wrap each comparison in its own parentheses before combining it with & or |. In Python, & and | bind more tightly than comparison operators such as !=, so the parentheses are required for the expression to be parsed as intended, and they also make the logical relationship between the conditions explicit.
For example, the following code explicitly specifies the AND conditions:
<code class="python">df1 = df[(df.a != -1) & (df.b != -1)]</code>
While the following code explicitly specifies the OR conditions:
<code class="python">df2 = df[(df.a != -1) | (df.b != -1)]</code>
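The same two filters can also be expressed with row-wise reductions, or by naming the rows to drop and negating that mask with ~. The sketch below assumes the same data as the test code; `has_marker` is a hypothetical name chosen for illustration, and the assertions confirm each form selects the same rows as the explicit & and | versions.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, -1, 2, -1, 4], 'b': [0, -1, 2, 3, -1]})

# "Every value in the row differs from -1" is the AND filter;
# "at least one value differs from -1" is the OR filter.
df_and = df[df[['a', 'b']].ne(-1).all(axis=1)]
df_or = df[df[['a', 'b']].ne(-1).any(axis=1)]

# Alternative: describe the rows to drop, then negate the mask.
has_marker = (df.a == -1) | (df.b == -1)
df_and_alt = df[~has_marker]

assert df_and.equals(df[(df.a != -1) & (df.b != -1)])
assert df_or.equals(df[(df.a != -1) | (df.b != -1)])
assert df_and.equals(df_and_alt)
```

The `all`/`any` form scales to many columns without writing one comparison per column.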
Writing the keep condition explicitly, with each comparison in its own parentheses, makes it clear which rows survive the filter and prevents misreading the result.
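The parentheses around each comparison are not only for readability. As a minimal sketch on the same data, omitting them makes Python evaluate `-1 & df.b` first, turning the expression into a chained comparison that needs the truth value of a whole Series, which pandas refuses with a `ValueError`.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, -1, 2, -1, 4], 'b': [0, -1, 2, 3, -1]})

# Correct: each comparison is parenthesized before being combined with &.
ok = df[(df.a != -1) & (df.b != -1)]

# Without parentheses this parses as  df.a != (-1 & df.b) != -1,
# i.e. (df.a != (-1 & df.b)) and ((-1 & df.b) != -1); the "and" asks
# for bool() of a Series, which raises ValueError.
error = None
try:
    df[df.a != -1 & df.b != -1]
except ValueError as exc:
    error = exc

print(ok.index.tolist())
print(type(error).__name__)
```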