Quantile regression meets this need by providing prediction intervals with quantified probabilities. It is a statistical technique for modeling the relationship between predictor variables and a response variable, and it is especially useful when the conditional distribution of the response variable is of interest. Unlike traditional regression methods, quantile regression estimates the conditional quantiles of the response variable rather than its conditional mean.
##Figure (A): Quantile Regression
Concept of Quantile Regression

Quantile regression is a modeling method that estimates the linear relationship between a set of regressors X and the quantiles of the explained variable Y. Conventional regression models study the relationship between the explanatory variables and the conditional mean of the explained variable, together with the distribution of the errors. Median regression and quantile regression are two common alternatives; quantile regression was introduced by Koenker and Bassett (1978). The ordinary least squares estimator is computed by minimizing the sum of squared residuals. The quantile regression estimator is computed by minimizing an asymmetrically weighted sum of absolute residuals. Median regression is the special case with symmetric weights, known as the least absolute deviations (LAD) estimator.

Advantages of Quantile Regression

Quantile regression describes the full conditional distribution of the explained variable: it analyzes not only the conditional expectation (mean) but also how the explanatory variables affect the median and other quantiles of the explained variable. The coefficient estimates at different quantiles often differ, which means the explanatory variables affect different quantiles of the explained variable differently. Compared with least squares, median regression is more robust to outliers, and quantile regression does not require strong assumptions on the error term, so for non-normal distributions the quantile regression coefficient estimates are more robust.
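The asymmetrically weighted absolute loss behind quantile regression is often called the pinball loss. The sketch below is a minimal illustration, not NeuralProphet's implementation: it fits only a constant (no regressors) to a small made-up sample, and the helper names `pinball_loss` and `best_constant` are hypothetical. It shows that minimizing the loss at level tau recovers the corresponding sample quantile, and that the median fit is far less affected by an outlier than the mean.

```python
def pinball_loss(y_true, y_pred, tau):
    """Average pinball (quantile) loss of predictions y_pred at quantile level tau."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        residual = y - q
        # Under-predictions are weighted by tau, over-predictions by (1 - tau).
        total += tau * residual if residual >= 0 else (tau - 1) * residual
    return total / len(y_true)


def best_constant(y, tau):
    """Return the sample value that minimizes the pinball loss as a constant prediction."""
    return min(sorted(set(y)), key=lambda c: pinball_loss(y, [c] * len(y), tau))


# Made-up data with one large outlier (not taken from the bike-sharing data).
y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100]

median_fit = best_constant(y, 0.5)  # tau = 0.5 corresponds to median (LAD) regression
upper_fit = best_constant(y, 0.9)   # tau = 0.9 targets the 90th percentile
mean_fit = sum(y) / len(y)          # the least-squares constant fit is the mean

print(median_fit, upper_fit, round(mean_fit, 2))
```

On this sample the median fit is 6 and the 0.9-level fit is 10, while the outlier pulls the mean up to about 14.09.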
What are the advantages of quantile regression over Monte Carlo simulation? First, quantile regression directly estimates the conditional quantiles of the response variable given the predictors. Rather than producing a large number of possible outcomes, as a Monte Carlo simulation does, it provides an estimate of a specific quantile of the distribution of the response variable. This is particularly useful for understanding different levels of forecast uncertainty, such as quintiles, quartiles, or extreme quantiles. Second, quantile regression is a model-based way to estimate prediction uncertainty: it uses observed data to estimate the relationship between variables and makes predictions based on that relationship. In contrast, Monte Carlo simulation relies on specifying probability distributions for the input variables and generating results by random sampling.

NeuralProphet provides two statistical techniques: (1) quantile regression and (2) conformal quantile regression. The conformal quantile prediction technique adds a calibration process on top of quantile regression. In this article we use NeuralProphet's quantile regression module.

Environment requirements

Install NeuralProphet.

    !pip install neuralprophet
    !pip uninstall -y numpy
    !pip install git+https://github.com/ourownstory/neural_prophet.git numpy==1.23.5
    %matplotlib inline
    from matplotlib import pyplot as plt
    import pandas as pd
    import numpy as np
    import logging
    import warnings
    logging.getLogger('prophet').setLevel(logging.ERROR)
    warnings.filterwarnings("ignore")

    data = pd.read_csv('/bike_sharing_daily.csv')
    data.tail()
Figure (B): Shared bicycles

Plot the daily number of shared bicycles. Demand increased in the second year and follows a seasonal pattern.

    # convert string to datetime64
    data["ds"] = pd.to_datetime(data["dteday"])
    # create line plot of daily rental counts
    plt.plot(data['ds'], data["cnt"])
    plt.xlabel("date")
    plt.ylabel("Count")
    plt.show()
Figure (C): Daily demand for bicycle rental
Do the basic data preparation for modeling. NeuralProphet requires the column names ds and y, the same as Prophet.

    # keep the date and count columns, renamed to the names NeuralProphet expects
    df = data[['ds', 'cnt']].rename(columns={'cnt': 'y'})
Quantile regression can be built directly in NeuralProphet. Suppose we need the 5th, 10th, 50th, 90th and 95th percentiles. We specify quantile_list = [0.05, 0.1, 0.5, 0.9, 0.95] and pass it through the parameter quantiles = quantile_list.

    from neuralprophet import NeuralProphet, set_log_level

    quantile_list = [0.05, 0.1, 0.5, 0.9, 0.95]

    # Model and prediction
    m = NeuralProphet(
        quantiles=quantile_list,
        yearly_seasonality=True,
        weekly_seasonality=True,
        daily_seasonality=False,
    )
    m = m.add_country_holidays("US")
    m.set_plotting_backend("matplotlib")  # Use matplotlib
    df_train, df_test = m.split_df(df, valid_p=0.2)
    metrics = m.fit(df_train, validation_df=df_test, progress="bar")
    metrics.tail()
We use .make_future_dataframe() to create a new data frame for forecasting, as in Prophet, on which NeuralProphet is based. Setting the parameter n_historic_predictions to 100 includes only the last 100 historical data points; setting it to True includes the entire history. We set periods=50 to forecast 50 data points into the future.

    future = m.make_future_dataframe(df, periods=50, n_historic_predictions=100)
    # Perform prediction with the trained model
    forecast = m.predict(df=future)
    forecast.tail(60)
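The prediction results are stored in the data frame forecast.

Figure (D): Forecasts

The data frame above contains all the data elements needed for plotting.

    m.plot(forecast, plotting_backend="plotly-static")  # or plotting_backend="matplotlib"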
The prediction intervals are given by the quantile values!

Figure (E): Quantile forecasts
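One way to sanity-check an interval built from two quantiles is to measure its empirical coverage: the 5% and 95% quantiles bound a nominal 90% interval, so roughly 90% of held-out observations should fall inside it. The sketch below uses plain Python lists with made-up numbers standing in for the actual values and the two forecast quantiles; in practice these would come from the forecast data frame, whose quantile column names are not shown in this article. The helper name `empirical_coverage` is hypothetical.

```python
def empirical_coverage(actual, lower, upper):
    """Fraction of actual values that fall inside the interval [lower, upper]."""
    inside = sum(1 for y, lo, hi in zip(actual, lower, upper) if lo <= y <= hi)
    return inside / len(actual)


# Made-up values standing in for held-out actuals and the 5% / 95% forecast quantiles.
actual = [120, 135, 150, 160, 170, 90, 200, 140, 155, 165]
q05 = [100, 110, 120, 130, 140, 100, 150, 120, 130, 140]
q95 = [160, 170, 180, 190, 200, 160, 190, 180, 190, 200]

coverage = empirical_coverage(actual, q05, q95)
nominal = 0.95 - 0.05  # nominal coverage implied by the two quantile levels

print(f"empirical coverage: {coverage:.2f}, nominal: {nominal:.2f}")
```

Here 8 of the 10 made-up observations fall inside the band, so the empirical coverage is 0.80 against a nominal 0.90. A persistent gap of this kind on real data is what a calibration step such as conformal quantile regression is meant to address.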
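Prediction intervals and confidence intervals are both useful because they quantify uncertainty, but their goals, calculation methods, and applications differ. Below I use regression to explain the difference between the two. In Figure (F), linear regression is plotted on the left and quantile regression on the right.

Figure (F): The difference between confidence intervals and prediction intervals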
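First, their goals differ: a confidence interval quantifies the uncertainty in an estimated parameter, such as the conditional mean in a linear regression, while a prediction interval quantifies the uncertainty in an individual future observation.

Second, their calculation methods differ: a confidence interval is derived from the standard error of the estimate, while a prediction interval must also account for the variability of individual observations; in quantile regression it is read directly from the estimated quantiles.

Third, their applications differ: confidence intervals are used for inference about model parameters, while prediction intervals are used to give a plausible range for future values.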
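The distinction between the two kinds of interval can be made concrete with a small calculation. The sketch below is an illustration under simplifying assumptions, not the method behind Figure (F): it uses a made-up normal sample, estimates only a mean (no regressors), and uses the normal quantile z instead of a t quantile. The confidence interval describes uncertainty about the mean and shrinks as the sample grows; the prediction interval describes where one new observation is likely to fall and stays wide.

```python
import math
import random
from statistics import NormalDist, mean, stdev

random.seed(0)
# Made-up sample: 200 draws from a normal distribution with mean 50 and sd 10.
sample = [random.gauss(50, 10) for _ in range(200)]

n = len(sample)
sample_mean = mean(sample)
s = stdev(sample)
z = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval

# Confidence interval for the mean: half-width z * s / sqrt(n).
ci_half = z * s / math.sqrt(n)
ci = (sample_mean - ci_half, sample_mean + ci_half)

# Prediction interval for one new observation: half-width z * s * sqrt(1 + 1/n).
pi_half = z * s * math.sqrt(1 + 1 / n)
pi = (sample_mean - pi_half, sample_mean + pi_half)

print("95% confidence interval:", ci)
print("95% prediction interval:", pi)
```

With 200 observations the prediction interval is roughly fourteen times wider than the confidence interval, because its width is dominated by the spread of individual observations rather than by the uncertainty in the estimated mean.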
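This article introduced the concept of prediction intervals from quantile regression and showed how to generate them with NeuralProphet. It also highlighted the difference between prediction intervals and confidence intervals, which often causes confusion in business applications. A follow-up will explore another important technique for forecast uncertainty, conformalized quantile regression (CQR).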