pandas: Multiple Conditions While Indexing a Data Frame - Non-Intuitive Behavior
When selecting rows from a data frame based on conditions involving multiple columns, users might encounter behavior that looks unexpected at first. In particular, the OR (|) and AND (&) operators can appear to have swapped roles.
Consider the following code:
<code class="python">import pandas as pd

df = pd.DataFrame({'a': range(5), 'b': range(5)})

# Insert -1 values
df.loc[1, 'a'] = -1
df.loc[1, 'b'] = -1
df.loc[3, 'a'] = -1
df.loc[4, 'b'] = -1

# Keep rows where both columns differ from -1
df1 = df[(df.a != -1) & (df.b != -1)]
# Keep rows where at least one column differs from -1
df2 = df[(df.a != -1) | (df.b != -1)]

# Show the original and both filtered frames side by side
df_combined = pd.concat([df, df1, df2], axis=1, keys=['Original', 'AND', 'OR'])
print(df_combined)</code>
Results:
<code class="python">  Original     AND        OR
         a  b    a    b    a    b
0        0  0    0    0    0    0
1       -1 -1  NaN  NaN  NaN  NaN
2        2  2    2    2    2    2
3       -1  3  NaN  NaN   -1    3
4        4 -1  NaN  NaN    4   -1</code>
As observed, rows where exactly one value is -1 (rows 3 and 4) are retained when the OR operator is used (df2), and only the row where both values are -1 (row 1) is dropped. When the AND operator is used (df1), every row containing any -1 value is discarded. At first glance this seems to contradict intuitive expectations.
Explanation
The seemingly reversed behavior comes from how the conditions are phrased: each mask describes the rows to keep, not the rows to drop. For the AND operator:
<code class="python">(df.a != -1) & (df.b != -1)</code>
The condition reads as "keep rows where both df.a and df.b differ from -1," effectively excluding rows with at least one -1 value.
Conversely, the OR operator:
<code class="python">(df.a != -1) | (df.b != -1)</code>
Reads as "keep rows where either df.a or df.b differs from -1," effectively excluding rows where both values are -1.
Thus, the behavior aligns with the intention of selecting rows to retain, rather than those to exclude.
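The link between "rows to keep" and "rows to exclude" is De Morgan's law: negating an OR of exclusion tests gives an AND of retention tests, and vice versa. The following is a minimal sketch of that equivalence, assuming the same values as the example above; the variable names (keep_and, keep_or, drop_any, drop_all) are illustrative only.

```python
import pandas as pd

# Same values as the example data frame after the -1 insertions
df = pd.DataFrame({'a': [0, -1, 2, -1, 4], 'b': [0, -1, 2, 3, -1]})

# Masks phrased as "rows to KEEP"
keep_and = (df.a != -1) & (df.b != -1)
keep_or = (df.a != -1) | (df.b != -1)

# The same masks phrased as "rows to EXCLUDE", then negated with ~
drop_any = ~((df.a == -1) | (df.b == -1))  # exclude if ANY value is -1
drop_all = ~((df.a == -1) & (df.b == -1))  # exclude only if ALL values are -1

print(keep_and.equals(drop_any))  # True
print(keep_or.equals(drop_all))   # True
```

If the goal is phrased as "drop rows containing any -1", the exclusion form with ~ may read more naturally than the retention form with &.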
Note on Chained Access
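Chained assignment such as df['a'][1] = -1 is not advisable for modifying cell values: it may write to a temporary copy instead of the original data frame, which pandas signals with a SettingWithCopyWarning. Use df.loc[1, 'a'] = -1 or df.iloc[1, 0] = -1 instead, which set the value in a single indexing operation.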
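As a small illustration of the single-step forms, the sketch below sets two cells on a fresh copy of the example frame, one by label with loc and one by position with iloc. The choice of cells is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'a': range(5), 'b': range(5)})

df.loc[1, 'a'] = -1   # label-based: row label 1, column 'a'
df.iloc[3, 0] = -1    # position-based: fourth row, first column

print(df['a'].tolist())  # [0, -1, 2, -1, 4]
```

Here loc and iloc address the same rows only because the frame uses the default integer index; with a custom index, row labels and row positions can differ.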