How Do Pandas DataFrames Merge Using Different Join Types?

Mary-Kate Olsen
Release: 2024-12-27 13:17:11

Pandas Merging 101

Understanding Merging

Merging combines two DataFrames on shared keys to create a new DataFrame. The how parameter of DataFrame.merge selects the join type: 'inner' (the default), 'left', 'right', or 'outer' (a FULL OUTER join).

Basic Join Types

a. INNER JOIN

  • Keeps only the rows whose keys appear in both DataFrames. This is the default (how='inner').
  • Example:

    import numpy as np
    import pandas as pd

    left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': np.random.randn(4)})
    right = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': np.random.randn(4)})
    # Both frames have a 'value' column, so the result holds value_x and value_y.
    left.merge(right, on='key')

b. LEFT OUTER JOIN

  • Retains all rows from the left DataFrame, adding NaN values for missing keys in the right DataFrame.
  • Example:

    left.merge(right, on='key', how='left')

c. RIGHT OUTER JOIN

  • Retains all rows from the right DataFrame, adding NaN values for missing keys in the left DataFrame.
  • Example:

    left.merge(right, on='key', how='right')

d. FULL OUTER JOIN

  • Combines all rows from both DataFrames, adding NaN values for missing keys.
  • Example:

    left.merge(right, on='key', how='outer')
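To make the difference between the four join types concrete, here is a minimal sketch that uses fixed integer values (an assumption, chosen so the result is reproducible) instead of random ones, and records which keys survive each merge.

```python
import pandas as pd

# Same keys as the examples above; the integer values are illustrative only.
left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
right = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [20, 40, 50, 60]})

# Merge once per join type and collect the keys present in each result.
keys_by_join = {
    how: left.merge(right, on='key', how=how)['key'].tolist()
    for how in ['inner', 'left', 'right', 'outer']
}

for how, keys in keys_by_join.items():
    print(how, keys)

# The overlapping 'value' column is split into value_x (left) and value_y (right).
inner_columns = left.merge(right, on='key').columns.tolist()
print(inner_columns)
```

Only B and D appear in both frames, so the inner join keeps just those two keys, while the outer join keeps all six.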

Excluding Data with Left/Right Excluding Joins

To keep only the rows of one DataFrame that have no match in the other, perform a LEFT or RIGHT OUTER JOIN with indicator=True, then filter on the _merge column that pandas adds. Its values are "left_only", "right_only", and "both".

e. Left-Excluding JOIN

  • Keeps only the rows of the left DataFrame whose keys do not appear in the right DataFrame.
  • Example:

    (left.merge(right, on='key', how='left', indicator=True)
     .query('_merge == "left_only"')
     .drop(columns='_merge'))

f. Right-Excluding JOIN

  • Keeps only the rows of the right DataFrame whose keys do not appear in the left DataFrame.
  • Example:

    (left.merge(right, on='key', how='right', indicator=True)
     .query('_merge == "right_only"')
     .drop(columns='_merge'))

g. ANTI JOIN

  • Keeps the rows whose keys appear in only one of the two DataFrames, dropping every row that has a match.
  • Example:

    (left.merge(right, on='key', how='outer', indicator=True)
     .query('_merge != "both"')
     .drop(columns='_merge'))
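The excluding joins above all rely on the _merge column produced by indicator=True. The following sketch, with illustrative fixed values, shows which label each key receives. It also shows an alternative that is not part of the article's approach: a plain boolean filter with isin, which may be simpler when no columns from the right DataFrame are needed.

```python
import pandas as pd

# Illustrative data with the same keys as the earlier examples.
left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
right = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [20, 40, 50, 60]})

# indicator=True adds a categorical _merge column describing each row's origin.
merged = left.merge(right, on='key', how='outer', indicator=True)
origin = dict(zip(merged['key'], merged['_merge'].astype(str)))
print(origin)

# Alternative (an assumption about what is wanted, not the article's method):
# filter the left frame directly, keeping only keys absent from the right frame.
left_only = left[~left['key'].isin(right['key'])]
print(left_only)
```

Unlike the merge-based version, the isin filter returns only the left DataFrame's original columns, with no value_x or value_y suffixes.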

Handling Duplicate Key Columns

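When the key columns have different names and you merge with left_on and right_on, both key columns appear in the output. To avoid this, set one of the keys as the index before merging:

    # left2 and right2 are assumed to be left and right with 'key'
    # renamed to 'keyLeft' and 'keyRight' respectively.
    left3 = left2.set_index('keyLeft')
    left3.merge(right2, left_index=True, right_on='keyRight')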
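The snippet above does not show where left2 and right2 come from. A self-contained sketch follows, assuming they are simply left and right with the key column renamed (the rename step and the fixed values are assumptions made for illustration).

```python
import pandas as pd

left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
right = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [20, 40, 50, 60]})

# Assumed definitions: same data, differently named key columns.
left2 = left.rename(columns={'key': 'keyLeft'})
right2 = right.rename(columns={'key': 'keyRight'})

# Merging on differently named columns keeps both key columns.
both_keys = left2.merge(right2, left_on='keyLeft', right_on='keyRight')
print(both_keys.columns.tolist())

# Moving the left key into the index leaves only keyRight in the result.
left3 = left2.set_index('keyLeft')
one_key = left3.merge(right2, left_index=True, right_on='keyRight')
print(one_key.columns.tolist())
```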

Merging on Multiple Columns

To join on multiple columns, specify a list for on (or left_on and right_on, as appropriate).

    left.merge(right, on=['key1', 'key2'] ...)
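As a rough illustration of a multi-column merge, the sketch below uses two small hypothetical frames (the column names lval and rval and all values are invented for this example). A row matches only when both key1 and key2 agree.

```python
import pandas as pd

# Hypothetical frames sharing two key columns.
left = pd.DataFrame({
    'key1': ['A', 'A', 'B'],
    'key2': [1, 2, 1],
    'lval': [10, 20, 30],
})
right = pd.DataFrame({
    'key1': ['A', 'B', 'B'],
    'key2': [1, 1, 2],
    'rval': [100, 300, 400],
})

# Inner join on the (key1, key2) pair: ('A', 2) and ('B', 2) have no partner.
result = left.merge(right, on=['key1', 'key2'])
print(result)
```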

Additional Merge Functions

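  • pd.merge_ordered: For merging ordered data such as time series, with optional filling of the gaps the merge creates.
  • pd.merge_asof: For approximate joins that match each row to the nearest key rather than requiring an exact match. Both inputs must be sorted by the key.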
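A brief sketch of both functions, using hypothetical trades and quotes frames keyed on an integer time column (the names and values are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical, already sorted by 'time' as merge_asof requires.
trades = pd.DataFrame({'time': [1, 5, 10], 'qty': [100, 200, 300]})
quotes = pd.DataFrame({'time': [2, 3, 7], 'price': [9.5, 9.7, 10.1]})

# merge_asof: for each trade, take the most recent quote at or before its time
# (the default direction is 'backward'); the first trade has no earlier quote.
asof = pd.merge_asof(trades, quotes, on='time')
print(asof)

# merge_ordered: an ordered outer merge; fill_method='ffill' forward-fills
# the gaps left where one side has no row for a given time.
ordered = pd.merge_ordered(trades, quotes, on='time', fill_method='ffill')
print(ordered)
```

The trade at time 1 gets a missing price because no quote precedes it, while the trades at times 5 and 10 pick up the quotes from times 3 and 7.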

Refer to the documentation on merge, join, and concat for more specific examples and cases.
