Pandas Merging 101: The Basics
Introduction
Merging DataFrames in Pandas is a powerful tool for combining and manipulating data from different sources. This guide provides a comprehensive overview of the basic types of joins and their applications.
Types of Joins
1. INNER JOIN (default)
- Matches rows with common keys in both DataFrames.
- Returns only rows that have matching values in both frames.
Example:
left.merge(right, on='key')
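To see the inner join on concrete data, here is a minimal sketch. The two sample frames and their values are made up for illustration; only the merge call comes from the text above.

```python
import pandas as pd

# Hypothetical sample frames: keys B and D appear in both.
left = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": [1, 2, 3, 4]})
right = pd.DataFrame({"key": ["B", "D", "E", "F"], "value": [5, 6, 7, 8]})

# how="inner" is the default, so it can be omitted.
inner = left.merge(right, on="key")
print(inner)  # only the rows for keys B and D remain
```

Because both frames have a non-key column named value, pandas appends its default suffixes, giving value_x (from left) and value_y (from right).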
2. LEFT OUTER JOIN
- Keeps every row from the left DataFrame and attaches the matching rows from the right DataFrame.
- If no matching row is found, NaNs are inserted in the output for the missing columns from the right DataFrame.
Example:
left.merge(right, on='key', how='left')
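A short sketch of the NaN-filling behavior, assuming the same kind of made-up sample frames (the data is illustrative, not from the original):

```python
import pandas as pd

# Hypothetical sample frames: keys A and C exist only on the left.
left = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": [1, 2, 3, 4]})
right = pd.DataFrame({"key": ["B", "D", "E", "F"], "value": [5, 6, 7, 8]})

left_joined = left.merge(right, on="key", how="left")
print(left_joined)  # all four left keys are kept; value_y is NaN for A and C
```

Note that value_y is converted to a floating-point column here, since an integer column cannot hold NaN.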
3. RIGHT OUTER JOIN
- Keeps every row from the right DataFrame and attaches the matching rows from the left DataFrame.
- If no matching row is found, NaNs are inserted in the output for the missing columns from the left DataFrame.
Example:
left.merge(right, on='key', how='right')
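The mirror case can be sketched the same way; again the sample frames are assumptions for illustration:

```python
import pandas as pd

# Hypothetical sample frames: keys E and F exist only on the right.
left = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": [1, 2, 3, 4]})
right = pd.DataFrame({"key": ["B", "D", "E", "F"], "value": [5, 6, 7, 8]})

right_joined = left.merge(right, on="key", how="right")
print(right_joined)  # all four right keys are kept; value_x is NaN for E and F
```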
4. FULL OUTER JOIN
- Keeps all rows from both DataFrames, whether or not their keys appear on the other side.
- Where a key exists in only one DataFrame, NaNs fill the columns that would have come from the other DataFrame.
Example:
left.merge(right, on='key', how='outer')
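As a rough illustration with made-up sample frames (an assumption, not data from the original), the outer join returns the union of the keys:

```python
import pandas as pd

# Hypothetical sample frames: B and D are shared, the rest are one-sided.
left = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": [1, 2, 3, 4]})
right = pd.DataFrame({"key": ["B", "D", "E", "F"], "value": [5, 6, 7, 8]})

outer = left.merge(right, on="key", how="outer")
print(outer)  # six rows: value_y is NaN for A and C, value_x is NaN for E and F
```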
Other Join Variations
1. LEFT-Excluding JOIN
- Returns rows from the left DataFrame that do not match any rows in the right DataFrame.
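The text does not name a method for this, so the following is one possible approach rather than the only one: do a left join with indicator=True, which adds a _merge column, and keep the rows marked left_only. The sample frames are hypothetical.

```python
import pandas as pd

# Hypothetical sample frames: keys A and C have no match on the right.
left = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": [1, 2, 3, 4]})
right = pd.DataFrame({"key": ["B", "D", "E", "F"], "value": [5, 6, 7, 8]})

# indicator=True adds a "_merge" column: "left_only", "right_only" or "both".
merged = left.merge(right, on="key", how="left", indicator=True)

# Keep unmatched left rows, then drop the helper column.
left_excluding = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(left_excluding)
```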
2. RIGHT-Excluding JOIN
- Returns rows from the right DataFrame that do not match any rows in the left DataFrame.
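The same indicator-based idea can be sketched for the right side, again as an assumed approach on hypothetical sample frames:

```python
import pandas as pd

# Hypothetical sample frames: keys E and F have no match on the left.
left = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": [1, 2, 3, 4]})
right = pd.DataFrame({"key": ["B", "D", "E", "F"], "value": [5, 6, 7, 8]})

merged = left.merge(right, on="key", how="right", indicator=True)

# Keep only rows that came from the right frame alone.
right_excluding = merged[merged["_merge"] == "right_only"].drop(columns="_merge")
print(right_excluding)
```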
3. ANTI JOIN (Excluding on Either Side)
- Returns rows from both DataFrames that do not match any rows on the other side.
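For excluding matches on either side, one way to sketch it (an assumption, with hypothetical sample frames) is an outer join with indicator=True that drops the rows marked both:

```python
import pandas as pd

# Hypothetical sample frames: B and D match, everything else is one-sided.
left = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": [1, 2, 3, 4]})
right = pd.DataFrame({"key": ["B", "D", "E", "F"], "value": [5, 6, 7, 8]})

merged = left.merge(right, on="key", how="outer", indicator=True)

# Drop rows whose key was found in both frames.
anti = merged[merged["_merge"] != "both"].drop(columns="_merge")
print(anti)
```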
Handling Different Key Column Names
- Use left_on and right_on arguments to merge on columns with different names.
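A minimal sketch of left_on and right_on; the column names keyLeft and keyRight and the sample values are hypothetical:

```python
import pandas as pd

# Hypothetical frames whose key columns have different names.
left2 = pd.DataFrame({"keyLeft": ["A", "B", "C", "D"], "value": [1, 2, 3, 4]})
right2 = pd.DataFrame({"keyRight": ["B", "D", "E", "F"], "value": [5, 6, 7, 8]})

merged = left2.merge(right2, left_on="keyLeft", right_on="keyRight", how="inner")
print(merged)  # both keyLeft and keyRight appear in the result
```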
Avoiding Duplicate Key Columns in Output
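- Merging with left_on and right_on keeps both key columns in the output. To avoid this, set one key column as the index first, then merge on that index (left_index=True or right_index=True) so that only one key column remains.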
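A sketch of the index approach, assuming hypothetical key columns named keyLeft and keyRight: the left key is moved into the index, so only keyRight survives as a column.

```python
import pandas as pd

# Hypothetical frames whose key columns have different names.
left2 = pd.DataFrame({"keyLeft": ["A", "B", "C", "D"], "value": [1, 2, 3, 4]})
right2 = pd.DataFrame({"keyRight": ["B", "D", "E", "F"], "value": [5, 6, 7, 8]})

# Move the left key into the index, then join that index to keyRight.
left3 = left2.set_index("keyLeft")
merged = left3.merge(right2, left_index=True, right_on="keyRight")
print(merged)  # keyLeft is no longer a column in the result
```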
Merging Single Column from One DataFrame
- Subset columns before merging to select specific columns from one of the DataFrames.
- When only one column is being brought over, Series.map can be a more efficient alternative to merge, provided the keys in the source DataFrame are unique.
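Both options can be sketched as follows. The extra column newcol and the sample values are hypothetical. The map variant behaves like a left join and assumes the keys in right are unique.

```python
import pandas as pd

# Hypothetical frames; "newcol" is the single column we want from right.
left = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": [1, 2, 3, 4]})
right = pd.DataFrame({
    "key": ["B", "D", "E", "F"],
    "value": [5, 6, 7, 8],
    "newcol": [10, 20, 30, 40],
})

# Option 1: subset the right frame before merging (inner join by default).
subset_merge = left.merge(right[["key", "newcol"]], on="key")

# Option 2: build a key -> newcol lookup Series and map it onto left["key"].
lookup = right.set_index("key")["newcol"]
mapped = left.assign(newcol=left["key"].map(lookup))

print(subset_merge)
print(mapped)  # keeps every left row; newcol is NaN where no key matched
```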
Merging on Multiple Columns
- Specify a list for on (or left_on and right_on) to join on multiple columns.
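A small sketch of a multi-column join; the column names k1, k2, lval and rval and their values are made up for illustration:

```python
import pandas as pd

# Hypothetical frames sharing a composite key (k1, k2).
left = pd.DataFrame({"k1": ["A", "A", "B"], "k2": [1, 2, 1], "lval": [10, 20, 30]})
right = pd.DataFrame({"k1": ["A", "B", "B"], "k2": [1, 1, 2], "rval": [100, 200, 300]})

# Rows match only when both k1 and k2 are equal.
merged = left.merge(right, on=["k1", "k2"])
print(merged)  # the pairs (A, 1) and (B, 1) are the only matches
```

When the key columns are named differently in each frame, pass equal-length lists to left_on and right_on instead of on.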