How to Efficiently Merge Pandas DataFrames with Date Range Conditions?

Barbara Streisand
Release: 2024-11-02 13:49:02

Merging Pandas Dataframes with Date Range Conditions

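To merge two pandas dataframes on a range condition, where a value in one dataframe must fall between two values in the other, a common approach is to merge the dataframes unconditionally on an identifier and then filter the result by the date condition. The drawback is memory: the intermediate dataframe holds every pairing of rows that share the identifier, which can be far larger than the filtered result.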
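For comparison, the merge-then-filter approach can be sketched as follows. This is a minimal illustration with made-up term dates, not code from the original answer. Because these two frames share no identifier column, the unconditional merge is assumed to be a cross join (`how="cross"`, available in pandas 1.2 and later); with a shared key it would be `on="<key>"` instead.

```python
from datetime import datetime

import pandas as pd

# Illustrative data (assumed for this sketch).
terms = pd.DataFrame({
    "president_id": [43, 43, 44],
    "start_date": pd.to_datetime(["2001-01-20", "2005-01-20", "2009-01-20"]),
    "end_date": pd.to_datetime(["2005-01-20", "2009-01-20", "2013-01-20"]),
})
wars = pd.DataFrame({
    "date": [datetime(2001, 9, 14), datetime(2003, 3, 3)],
    "name": ["War in Afghanistan", "Iraq War"],
})

# Step 1: unconditional merge. Every term is paired with every war,
# so the intermediate frame has len(terms) * len(wars) rows.
merged = terms.merge(wars, how="cross")

# Step 2: keep only the pairs whose date falls inside the term (inclusive).
in_range = merged["date"].between(merged["start_date"], merged["end_date"])
result = merged[in_range].reset_index(drop=True)

print(len(merged), "intermediate rows,", len(result), "rows kept")
```

The intermediate frame grows with the product of the row counts, while the filtered result usually stays small. That gap is the memory cost the SQL approach aims to avoid.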

An alternative is to perform the merge and the filtering together inside an SQL query, here run against an in-memory SQLite database:

<code class="python">import pandas as pd
import sqlite3
from datetime import datetime

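# Create dataframes. Note: freq='48M' snaps each date to a month end, so the
# first term runs 2001-01-31 to 2005-01-31 (pandas 2.2+ spells this alias '48ME').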
presidents = pd.DataFrame({"name": ["Bush", "Obama", "Trump"],
                           "president_id":[43, 44, 45]})
terms = pd.DataFrame({'start_date': pd.date_range('2001-01-20', periods=5, freq='48M'),
                      'end_date': pd.date_range('2005-01-21', periods=5, freq='48M'),
                      'president_id': [43, 43, 44, 44, 45]})
war_declarations = pd.DataFrame({"date": [datetime(2001, 9, 14), datetime(2003, 3, 3)],
                                 "name": ["War in Afghanistan", "Iraq War"]})

# Create an in-memory database
conn = sqlite3.connect(':memory:')

# Write dataframes to database
terms.to_sql('terms', conn, index=False)
presidents.to_sql('presidents', conn, index=False)
war_declarations.to_sql('wars', conn, index=False)

# Execute SQL query to merge and filter dataframes
qry = '''
    select  
        start_date PresTermStart,
        end_date PresTermEnd,
        wars.date WarStart,
        presidents.name Pres
    from
        terms join wars on
        date between start_date and end_date join presidents on
        terms.president_id = presidents.president_id
    '''
df = pd.read_sql_query(qry, conn)

print(df)</code>

Output:

         PresTermStart          PresTermEnd             WarStart  Pres
0  2001-01-31 00:00:00  2005-01-31 00:00:00  2001-09-14 00:00:00  Bush
1  2001-01-31 00:00:00  2005-01-31 00:00:00  2003-03-03 00:00:00  Bush

Because the range condition is part of the join itself, SQLite returns only the matching rows, and pandas never materializes the potentially large intermediate dataframe that merge-then-filter would create.
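The same pattern can be written more compactly. The sketch below is an assumption-laden variant rather than the original author's code: the term dates are made up, the column alias `war_start` is hypothetical, and `parse_dates` is used so that the date column comes back as datetimes instead of the strings SQLite stores.

```python
import sqlite3

import pandas as pd

# Illustrative data (assumed for this sketch).
terms = pd.DataFrame({
    "president_id": [43, 44],
    "start_date": pd.to_datetime(["2001-01-20", "2009-01-20"]),
    "end_date": pd.to_datetime(["2009-01-20", "2017-01-20"]),
})
wars = pd.DataFrame({
    "date": pd.to_datetime(["2001-09-14", "2003-03-03"]),
    "name": ["War in Afghanistan", "Iraq War"],
})

conn = sqlite3.connect(":memory:")
try:
    terms.to_sql("terms", conn, index=False)
    wars.to_sql("wars", conn, index=False)

    # The range condition lives in the join, so only matching rows are returned.
    qry = """
        select terms.president_id, wars.name, wars.date as war_start
        from terms
        join wars on wars.date between terms.start_date and terms.end_date
        order by wars.date
    """
    # parse_dates converts the stored text timestamps back to datetime64.
    result = pd.read_sql_query(qry, conn, parse_dates=["war_start"])
finally:
    conn.close()

print(result)
```

Qualifying every column with its table name (`wars.date`, `terms.start_date`) keeps the query unambiguous if the tables later gain columns with the same names.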
