Obtaining DataFrame Rows Not Present in Another DataFrame
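To obtain the rows of one DataFrame (df1) that are not present in another DataFrame (df2), left-merge df1 with a de-duplicated df2 on the shared columns using indicator=True, then keep the rows whose _merge value is 'left_only':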
```python
import pandas as pd

# Create the two DataFrames.
df1 = pd.DataFrame(data={'col1': [1, 2, 3, 4, 5, 3], 'col2': [10, 11, 12, 13, 14, 10]})
df2 = pd.DataFrame(data={'col1': [1, 2, 3], 'col2': [10, 11, 12]})

# Perform a left join; dropping duplicates in df2 ensures each row in df1
# matches at most one row in df2, so df_all has one row per row of df1.
df_all = df1.merge(df2.drop_duplicates(), on=['col1', 'col2'], how='left', indicator=True)

# Create a boolean condition to identify rows in df1 that are not in df2.
condition = df_all['_merge'] == 'left_only'

# Filter df1 based on the condition. df_all gets a fresh RangeIndex, so apply
# the mask positionally (.to_numpy()) instead of aligning it with df1's index.
result = df1[condition.to_numpy()]
```
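The same steps can be wrapped in a reusable function. This is a minimal sketch rather than part of the original article: the function name rows_not_in and its cols parameter are hypothetical, and it assumes both DataFrames contain every column listed in cols. Only the key columns take part in the merge, and the mask is applied positionally, so df1 may carry a non-default index.

```python
import pandas as pd


def rows_not_in(left: pd.DataFrame, right: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Hypothetical helper: rows of `left` whose `cols` values have no match in `right`."""
    # De-duplicate the key columns of `right` so each row of `left` matches at
    # most one row; a left merge then returns exactly len(left) rows, in order.
    keys = right[cols].drop_duplicates()
    merged = left[cols].merge(keys, on=cols, how="left", indicator=True)
    # Positional boolean mask, independent of the index of `left`.
    mask = (merged["_merge"] == "left_only").to_numpy()
    return left[mask]


# Usage: df1 deliberately has a non-default index.
df1 = pd.DataFrame({"col1": [1, 2, 3, 4, 5, 3], "col2": [10, 11, 12, 13, 14, 10]},
                   index=list("abcdef"))
df2 = pd.DataFrame({"col1": [1, 2, 3], "col2": [10, 11, 12]})
out = rows_not_in(df1, df2, ["col1", "col2"])
print(out)
```

With this data, the rows labelled d, e and f, that is (4, 13), (5, 14) and (3, 10), are the ones kept.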
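This approach extracts only the rows of df1 whose combination of values across both columns does not occur in df2. Checking each column independently, for example with isin on col1 and on col2 separately, can give wrong results: the row (3, 10) would be treated as present because 3 appears in df2's col1 and 10 appears in df2's col2, even though df2 has no row (3, 10).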
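To make the pitfall concrete, the sketch below runs a per-column check and a row-wise check on the same data. The row-wise check uses MultiIndex.isin on the key columns, an alternative technique swapped in for illustration rather than the merge method above; the variable names are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3, 4, 5, 3], "col2": [10, 11, 12, 13, 14, 10]})
df2 = pd.DataFrame({"col1": [1, 2, 3], "col2": [10, 11, 12]})

# Pitfall: each column is tested on its own, so (3, 10) looks "present"
# because 3 occurs in df2.col1 and 10 occurs in df2.col2.
per_column = df1[~(df1["col1"].isin(df2["col1"]) & df1["col2"].isin(df2["col2"]))]

# Row-wise check: treat each (col1, col2) pair as a single key via a MultiIndex.
cols = ["col1", "col2"]
pair_present = df1.set_index(cols).index.isin(df2.set_index(cols).index)
row_wise = df1[~pair_present]

print(per_column)  # misses the (3, 10) row
print(row_wise)    # keeps (4, 13), (5, 14) and (3, 10)
```

The row-wise result matches what the merge-based approach returns for this data.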