Selecting DataFrame Rows Within a Date Range
In Python, DataFrames are a powerful tool for storing and manipulating tabular data. One commonly encountered task is filtering rows based on a specified date range. To accomplish this, we have the following options:
Method 1: Using a Boolean Mask
If your DataFrame contains a date column with datetime64[ns] data type, you can create a boolean mask to select rows that fall within the desired range:
# Ensure date column is a datetime64 series df['date'] = pd.to_datetime(df['date']) # Create a boolean mask mask = (df['date'] > start_date) & (df['date'] <= end_date) # Select the sub-DataFrame sub_df = df.loc[mask]
Method 2: Setting a DatetimeIndex
Another efficient approach is to set the date column as the index of the DataFrame, creating a DatetimeIndex:
df = df.set_index(['date']) # Select rows using index slicing sub_df = df.loc[start_date:end_date]
This method is particularly useful for frequent date-based selections, as it provides faster performance compared to using a boolean mask.
Example:
Consider the following DataFrame:
>>> df value date 0 0.2 2021-06-01 1 0.3 2021-06-05 2 0.4 2021-06-10 3 0.5 2021-06-15
To select rows for the month of June 2021, we can use either method:
Method 1: Boolean Mask
mask = (df['date'] > '2021-06-01') & (df['date'] <= '2021-06-30') sub_df = df.loc[mask]
Method 2: DatetimeIndex
df = df.set_index(['date']) sub_df = df.loc['2021-06-01':'2021-06-30']
Both methods will return the following sub-DataFrame:
>>> sub_df value date 0 0.2 2021-06-01 1 0.3 2021-06-05 2 0.4 2021-06-10 3 0.5 2021-06-15
The above is the detailed content of How to Efficiently Select DataFrame Rows Within a Specific Date Range in Python?. For more information, please follow other related articles on the PHP Chinese website!