How to merge and join data using DataFrame in Python?

王林
Release: 2023-05-07 21:04:17

    merge()

    1. Conventional merge

    ①Method 1

    Specify a key column; the remaining columns of both tables are merged on it.

    import pandas as pd
    
    df1 = pd.DataFrame({'id': ['001', '002', '003'],
                        'num1': [120, 101, 104],
                        'num2': [110, 102, 121],
                        'num3': [105, 120, 113]})
    df2 = pd.DataFrame({'id': ['001', '002', '003'],
                        'num4': [80, 86, 79]})
    print(df1)
    print("=======================================")
    print(df2)
    print("=======================================")
    df_merge = pd.merge(df1, df2, on='id')
    print(df_merge)

    ②Method 2

    The same merge can also be done on the index instead of a column: set both left_index and right_index to True. (Both default to False. left_index=True uses the index of the left table as its key, and right_index=True uses the index of the right table as its key.)

    import pandas as pd
    
    df1 = pd.DataFrame({'id': ['001', '002', '003'],
                        'num1': [120, 101, 104],
                        'num2': [110, 102, 121],
                        'num3': [105, 120, 113]})
    df2 = pd.DataFrame({'id': ['001', '002', '003'],
                        'num4': [80, 86, 79]})
    print(df1)
    print("=======================================")
    print(df2)
    print("=======================================")
    
    df_merge = pd.merge(df1, df2, left_index=True, right_index=True)
    print(df_merge)

    Compared with method ①, the data merged by method ② contains duplicate columns: both tables still carry an ordinary id column, so the result keeps both of them, renamed id_x and id_y.
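The duplicate columns can be reproduced and avoided with a few lines. The following is a minimal sketch, not part of the original article: it assumes a trimmed-down version of the two tables above and moves id into the index with set_index before merging, which is only one of several possible ways to avoid the duplicates.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['001', '002', '003'],
                    'num1': [120, 101, 104]})
df2 = pd.DataFrame({'id': ['001', '002', '003'],
                    'num4': [80, 86, 79]})

# Merging on the default positional index keeps both 'id' columns,
# renamed with the default suffixes _x and _y.
dup = pd.merge(df1, df2, left_index=True, right_index=True)
print(list(dup.columns))

# Moving 'id' into the index first leaves a single key and no duplicates.
clean = pd.merge(df1.set_index('id'), df2.set_index('id'),
                 left_index=True, right_index=True)
print(list(clean.columns))
```

When the goal is simply to join on id, method ① (on='id') is usually the shorter route; the index-based form is most natural when the key already lives in the index.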

    Important parameters

    pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False)

    Parameter      Description
    left           Left table to merge; a DataFrame or Series
    right          Right table to merge; a DataFrame or Series
    how            Merge method: left (left merge), right (right merge), outer (outer merge) or inner (inner merge)
    on             Name of the key column shared by both tables
    left_on        Name of the key column in the left table
    right_on       Name of the key column in the right table
    left_index     Whether to use the index of the left table as its key; default False
    right_index    Whether to use the index of the right table as its key; default False

    Note that left_index and right_index cannot be specified together with on.
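As an illustration of left_on and right_on, and of the restriction on combining on with the index flags, here is a small sketch. The tables and the student_id column name are hypothetical and chosen only for this example.

```python
import pandas as pd

# Hypothetical tables whose key columns have different names.
left = pd.DataFrame({'id': ['001', '002', '003'], 'num1': [120, 101, 104]})
right = pd.DataFrame({'student_id': ['001', '002', '003'], 'num4': [80, 86, 79]})

# left_on / right_on name the key column on each side.
merged = pd.merge(left, right, left_on='id', right_on='student_id')
print(merged)

# Passing on together with left_index/right_index is rejected.
same_name = right.rename(columns={'student_id': 'id'})
conflict_rejected = False
try:
    pd.merge(left, same_name, on='id', left_index=True, right_index=True)
except ValueError as exc:  # pandas raises MergeError, a ValueError subclass
    conflict_rejected = True
    print('rejected:', exc)
```

With left_on and right_on both key columns are kept in the result, so one of them can be dropped afterwards if it is not needed.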

    Merge methods: left, right, outer, inner

    Prepare data

    Prepare a new set of data:

    import pandas as pd
    
    df1 = pd.DataFrame({'id': ['001', '002', '003'],
                        'num1': [120, 101, 104],
                        'num2': [110, 102, 121],
                        'num3': [105, 120, 113]})
    df2 = pd.DataFrame({'id': ['001', '004', '003'],
                        'num4': [80, 86, 79]})
    print(df1)
    print("=======================================")
    print(df2)
    print("=======================================")

    inner (default)

    Uses the intersection of keys from both datasets

    df_merge = pd.merge(df1, df2, on='id')
    print(df_merge)

    outer

    Uses the union of keys from both datasets

    df_merge = pd.merge(df1, df2, on='id', how="outer")
    print(df_merge)

    left

    Uses only the keys from the left dataset

    df_merge = pd.merge(df1, df2, on='id', how='left')
    print(df_merge)

    right

    Uses only the keys from the right dataset

    df_merge = pd.merge(df1, df2, on='id', how='right')
    print(df_merge)
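To compare the four methods side by side, the sketch below counts the rows each one produces and uses the indicator=True option of pd.merge to label where each row of an outer merge came from. It is an illustrative addition that assumes a reduced version of the data prepared above.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['001', '002', '003'], 'num1': [120, 101, 104]})
df2 = pd.DataFrame({'id': ['001', '004', '003'], 'num4': [80, 86, 79]})

# Number of result rows for each merge method.
row_counts = {how: len(pd.merge(df1, df2, on='id', how=how))
              for how in ['inner', 'outer', 'left', 'right']}
print(row_counts)

# indicator=True adds a _merge column: both, left_only or right_only.
tagged = pd.merge(df1, df2, on='id', how='outer', indicator=True)
print(tagged[['id', '_merge']])
```

Rows marked left_only or right_only are the ones an inner merge would drop, and they carry NaN in the columns of the table that had no match.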

    2. Many-to-one merge

    import pandas as pd
    
    df1 = pd.DataFrame({'id': ['001', '002', '003'],
                        'num1': [120, 101, 104],
                        'num2': [110, 102, 121],
                        'num3': [105, 120, 113]})
    df2 = pd.DataFrame({'id': ['001', '001', '003'],
                        'num4': [80, 86, 79]})
    print(df1)
    print("=======================================")
    print(df2)
    print("=======================================")

    Note that df2 contains the id 001 twice.

    Merge

    df_merge = pd.merge(df1, df2, on='id')
    print(df_merge)

    The merged result has three rows: two for id 001 (num4 = 80 and 86) and one for id 003 (num4 = 79).

    The merge still uses the default inner method, that is, the intersection of the keys from the two datasets. Each row with a duplicate key appears as its own row in the merged result.
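If duplicate keys are expected on only one side, the validate option of pd.merge can check that assumption at merge time. The sketch below is an illustrative addition using a reduced version of the tables above. Because the keys are unique in the left table (df1) and repeated in the right table (df2), the matching check in pandas terms is 'one_to_many'.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['001', '002', '003'], 'num1': [120, 101, 104]})
df2 = pd.DataFrame({'id': ['001', '001', '003'], 'num4': [80, 86, 79]})

# Keys are unique in df1 but repeated in df2, so 'one_to_many' passes.
ok = pd.merge(df1, df2, on='id', validate='one_to_many')
print(len(ok))

# 'one_to_one' fails because id 001 appears twice in df2.
try:
    pd.merge(df1, df2, on='id', validate='one_to_one')
    one_to_one_ok = True
except pd.errors.MergeError as exc:
    one_to_one_ok = False
    print('validation failed:', exc)
```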

    3. Many-to-many merge

    For example, both table 1 (df1) and table 2 (df2) contain multiple rows with duplicate ids.

    import pandas as pd
    df1 = pd.DataFrame({'id': ['001', '002', '002', '002', '003'],
                        'num1': [120, 101, 104, 114, 123],
                        'num2': [110, 102, 121, 113, 126],
                        'num3': [105, 120, 113, 124, 128]})
    df2 = pd.DataFrame({'id': ['001', '001', '002', '003', '001'],
                        'num4': [80, 86, 79, 88, 93]})
    print(df1)
    print("=======================================")
    print(df2)
    print("=======================================")

    df_merge = pd.merge(df1, df2, on='id')
    print(df_merge)
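In a many-to-many merge every left row with a given key is paired with every right row with the same key, so the number of result rows is the sum over keys of (left count × right count). The sketch below checks this on a reduced version of the data above; the variable names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['001', '002', '002', '002', '003'],
                    'num1': [120, 101, 104, 114, 123]})
df2 = pd.DataFrame({'id': ['001', '001', '002', '003', '001'],
                    'num4': [80, 86, 79, 88, 93]})

left_counts = df1['id'].value_counts()
right_counts = df2['id'].value_counts()

# Multiply the per-key counts; a key missing on one side contributes 0 rows
# to an inner merge.
expected_rows = int(left_counts.mul(right_counts, fill_value=0).sum())

df_merge = pd.merge(df1, df2, on='id')
print(expected_rows, len(df_merge))
```

Here id 001 contributes 1 × 3 rows, id 002 contributes 3 × 1 rows and id 003 contributes 1 × 1 row, for 7 rows in total.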

    concat()

    pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)

    Parameter       Description
    objs            A sequence or mapping of Series or DataFrame objects
    axis            Default is 0: concatenate along the index, stacking the tables vertically. 1 concatenates along the columns, placing the tables side by side.
    join            Default is "outer"; can also be "inner"
    ignore_index    Default is False, which keeps the original index labels. Set to True to discard them and renumber the result from 0.
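The effect of ignore_index is easiest to see on two tiny tables. This is a minimal sketch with made-up data, added for illustration.

```python
import pandas as pd

a = pd.DataFrame({'num1': [1, 2]})
b = pd.DataFrame({'num1': [3, 4]})

kept = pd.concat([a, b])                      # original labels are repeated
reset = pd.concat([a, b], ignore_index=True)  # fresh labels starting at 0

print(list(kept.index))
print(list(reset.index))
```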

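    The other important parameters are explained through examples.

    1. Joining tables with the same fields end to end

    First, prepare three DataFrames: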

    import pandas as pd
    df1 = pd.DataFrame({'id': ['001', '002', '003'],
                        'num1': [120, 114, 123],
                        'num2': [110, 102, 121],
                        'num3': [113, 124, 128]})
    df2 = pd.DataFrame({'id': ['004', '005'],
                        'num1': [120, 101],
                        'num2': [113, 126],
                        'num3': [105, 128]})
    df3 = pd.DataFrame({'id': ['007', '008', '009'],
                        'num1': [120, 101, 125],
                        'num2': [113, 126, 163],
                        'num3': [105, 128, 114]})
    
    
    print(df1)
    print("=======================================")
    print(df2)
    print("=======================================")
    print(df3)

    Merge

    dfs = [df1, df2, df3]
    result = pd.concat(dfs)
    print(result)

    To mark which table (or which category) each row came from after merging, pass the keys parameter to concat:

    result = pd.concat(dfs, keys=['table1', 'table2', 'table3'])
    print(result)

    The added keys are now combined with the original index labels into tuples, which together form the new index (a MultiIndex).

    print(result.index)
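Once the keys are part of the index they can be used to pull one source table back out, or be turned into an ordinary column. The sketch below is an illustrative addition on two reduced tables; the column name source is an assumption chosen for this example.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['001', '002'], 'num1': [120, 114]})
df2 = pd.DataFrame({'id': ['004', '005'], 'num1': [120, 101]})

result = pd.concat([df1, df2], keys=['table1', 'table2'])
print(result.index.nlevels)  # the index now has two levels

# Select the rows that came from the second table via the outer level.
from_table2 = result.loc['table2']
print(from_table2)

# Alternatively, turn the outer level into an ordinary column.
# An unnamed level becomes a column called 'level_0' after reset_index.
flat = result.reset_index(level=0).rename(columns={'level_0': 'source'})
print(flat)
```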

    2. Horizontal merge (rows aligned)

    Prepare two DataFrames:

    import pandas as pd
    df1 = pd.DataFrame({'num1': [120, 114, 123],
                        'num2': [110, 102, 121],
                        'num3': [113, 124, 128]}, index=['001', '002', '003'])
    df2 = pd.DataFrame({'num3': [117, 120, 101, 126],
                        'num5': [113, 125, 126, 133],
                        'num6': [105, 130, 128, 128]}, index=['002', '003', '004', '005'])
    
    print(df1)
    print("=======================================")
    print(df2)

    When axis is the default value 0:

    result = pd.concat([df1, df2])
    print(result)

    A horizontal merge requires setting axis to 1:

    result = pd.concat([df1, df2], axis=1)
    print(result)

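    Compare the two outputs above.

    • With axis=0 (the default, vertical merge), rows with duplicate index labels all appear in the result.

    • With axis=1 (horizontal merge), columns with duplicate names all appear in the result.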
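The two points above can be checked programmatically. The following sketch, added for illustration on a reduced version of the two tables, lists the index labels and column names that end up duplicated in each direction.

```python
import pandas as pd

df1 = pd.DataFrame({'num1': [120, 114, 123], 'num3': [113, 124, 128]},
                   index=['001', '002', '003'])
df2 = pd.DataFrame({'num3': [117, 120, 101, 126], 'num5': [113, 125, 126, 133]},
                   index=['002', '003', '004', '005'])

vertical = pd.concat([df1, df2])            # axis=0: rows are stacked
horizontal = pd.concat([df1, df2], axis=1)  # axis=1: columns are placed side by side

# Index labels that occur more than once after the vertical merge.
dup_rows = sorted(set(vertical.index[vertical.index.duplicated()]))
# Column names that occur more than once after the horizontal merge.
dup_cols = sorted(set(horizontal.columns[horizontal.columns.duplicated()]))

print(vertical.shape, dup_rows)
print(horizontal.shape, dup_cols)
```

If duplicates should be treated as an error rather than kept, the verify_integrity=True option of pd.concat raises an exception when the concatenation axis contains duplicate labels.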

    3. Intersection merge

    result = pd.concat([df1, df2], axis=1, join='inner')
    print(result)
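With join='inner' only the index labels present in every table survive, whereas the default join='outer' keeps all labels and fills the gaps with NaN. A short sketch on a reduced version of the data above, added for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'num1': [120, 114, 123], 'num2': [110, 102, 121]},
                   index=['001', '002', '003'])
df2 = pd.DataFrame({'num5': [113, 125, 126, 133], 'num6': [105, 130, 128, 128]},
                   index=['002', '003', '004', '005'])

inner = pd.concat([df1, df2], axis=1, join='inner')
outer = pd.concat([df1, df2], axis=1)  # join='outer' is the default

print(list(inner.index))          # labels shared by both tables
print(sorted(outer.index))        # all labels from either table
print(outer.isna().any().any())   # the outer result contains missing values
```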

    source:yisu.com