Python is one of the most popular programming languages, and its flexibility has made it a common choice in data science and machine learning. In data analysis, time series are an important concept because they describe time-ordered data such as stock prices or weather measurements.
In this article, we will explore how to classify time series data using Python.
First, we need to prepare the data for classification. In this example, we will use a dataset from the UCI Machine Learning Repository that contains 1000 daily time series, each consisting of 24 hourly meteorological readings. The task is to predict whether the next day's minimum temperature will fall below a certain threshold.
We will use the pandas library to load the dataset.
import pandas as pd

# Load the dataset
data = pd.read_csv("weather.csv")

# Show the first few rows
print(data.head())
Output:
      Date  R1  R2  R3  R4  R5  R6  R7  R8  R9  ...  R15  R16  R17  R18  R19  R20  R21  R22  R23  R24  Tmin
0  1/01/14  58  41  67  63  44  50  46  52  64  ...   82   83   62   49   67   73   65   52   39   23    42
1  2/01/14  46  45  36  63  72  75  80  65  68  ...   74   73   52   43   36   47   19   16   13   15    26
2  3/01/14  48  37  39  45  74  75  76  66  45  ...   76   62   49   50   38   50   29   15   13   15    30
3  4/01/14  46  43  47  76  48  68  77  61  61  ...   24   28   39   33   26    3    4    6    0   10    50
4  5/01/14  49  42  58  74  70  47  68  59  43  ...   55   37   36   42   30   29   35   31   25   22    32
As we can see, the dataset contains the date, 24 hourly weather readings (R1 to R24), and the minimum temperature (Tmin).
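A quick sanity check after loading can catch malformed rows early. The sketch below is only an illustration: it builds a small stand-in frame instead of reading weather.csv, and it assumes the Date column uses a day/month/two-digit-year format, which the article does not state. The sample values are made up.

```python
import pandas as pd

# Small stand-in for weather.csv (illustrative values, not from the dataset).
sample = pd.DataFrame({
    "Date": ["1/01/14", "2/01/14", "3/01/14"],
    **{f"R{i}": [50 + i, 40 + i, 45 + i] for i in range(1, 25)},
    "Tmin": [42, 26, 30],
})

# Assumption: dates are written as day/month/two-digit year.
sample["Date"] = pd.to_datetime(sample["Date"], format="%d/%m/%y")

# Check that all 24 hourly columns are present and contain no missing values.
hourly_cols = [f"R{i}" for i in range(1, 25)]
missing = [c for c in hourly_cols if c not in sample.columns]
n_missing_values = int(sample[hourly_cols].isna().sum().sum())

print("Missing columns:", missing)
print("Missing values:", n_missing_values)
print("Dates in order:", sample["Date"].is_monotonic_increasing)
```

If the dates turn out not to be in order, sorting by Date before any further step keeps the rows chronological.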
Before classification, we need to preprocess the data. One of the steps is feature engineering, where we need to extract new features from the original data to improve the performance of the model.
We can extract the following summary features from each 24-hour series: mean, standard deviation, minimum, maximum, median, and variance.
We can use pandas to quickly extract these features.
# Names of the 24 hourly columns, R1 to R24
features = ["R" + str(i) for i in range(1, 25)]

# Extract row-wise summary features
data['Mean'] = data[features].mean(axis=1)
data['Std'] = data[features].std(axis=1)
data['Min'] = data[features].min(axis=1)
data['Max'] = data[features].max(axis=1)
data['Median'] = data[features].median(axis=1)
data['Var'] = data[features].var(axis=1)

# Show the updated dataset
print(data.head())
Output:
      Date  R1  R2  R3  R4  R5  R6  R7  R8  R9  ...  R18  R19  R20  R21  R22  R23  R24  Tmin       Mean        Std  Min  Max  Median         Var
0  1/01/14  58  41  67  63  44  50  46  52  64  ...   49   67   73   65   52   39   23    42  55.166667  15.181057   23   83    54.5  230.456140
1  2/01/14  46  45  36  63  72  75  80  65  68  ...   43   36   47   19   16   13   15    26  47.125000  20.236742   13   80    45.5  410.114035
2  3/01/14  48  37  39  45  74  75  76  66  45  ...   50   38   50   29   15   13   15    30  47.208333  19.541905   13   76    44.5  382.149123
3  4/01/14  46  43  47  76  48  68  77  61  61  ...   33   26    3    4    6    0   10    50  36.750000  19.767969    0   77    42.5  390.350877
4  5/01/14  49  42  58  74  70  47  68  59  43  ...   42   30   29   35   31   25   22    32  45.666667  16.013175   22   74    43.5  256.508772
Now, we have successfully extracted some new features from the time series, which will provide more information for our classifier.
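The six statistics above ignore the order of the readings: shuffling the 24 hours of a day leaves all of them unchanged. As a minimal sketch, the helper below wraps the same statistics in a function and adds one order-aware feature, a least-squares slope over the hour index. The function name `add_summary_features`, the `Slope` column, and the toy data are all hypothetical and not part of the original article.

```python
import numpy as np
import pandas as pd

def add_summary_features(df, cols):
    """Return a copy of df with row-wise summary statistics of `cols` appended."""
    out = df.copy()
    values = out[cols]
    out["Mean"] = values.mean(axis=1)
    out["Std"] = values.std(axis=1)        # sample standard deviation (ddof=1)
    out["Min"] = values.min(axis=1)
    out["Max"] = values.max(axis=1)
    out["Median"] = values.median(axis=1)
    out["Var"] = values.var(axis=1)        # sample variance (ddof=1)
    # Hypothetical extra feature: slope of a straight-line fit against the
    # hour index. Unlike the statistics above, it depends on reading order.
    hours = np.arange(len(cols))
    out["Slope"] = np.polyfit(hours, values.to_numpy(dtype=float).T, 1)[0]
    return out

# Two toy series with identical values in opposite order.
toy = pd.DataFrame({"R1": [1, 4], "R2": [2, 3], "R3": [3, 2], "R4": [4, 1]})
result = add_summary_features(toy, ["R1", "R2", "R3", "R4"])
print(result[["Mean", "Std", "Slope"]])
```

Both toy rows share the same mean and standard deviation, and only the slope tells the rising series from the falling one.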
Next, we need to split the dataset into a training set and a test set. We will use the scikit-learn library for this.
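from sklearn.model_selection import train_test_split

# Assumption: the article does not state the threshold; 30 is only a placeholder
TMIN_THRESHOLD = 30

# Features: the 24 hourly readings plus the engineered statistics
X = data.drop(['Date', 'Tmin'], axis=1)
# Binary label: 1 if the minimum temperature falls below the threshold, else 0
y = (data['Tmin'] < TMIN_THRESHOLD).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)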
Here we split the dataset into an 80% training set and a 20% test set.
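By default, train_test_split shuffles the rows, so days from the test period can end up mixed in with the training days. For time-ordered data, a chronological split is often a safer check. The sketch below uses a synthetic 10-day frame (not the weather data) to show two scikit-learn options: `shuffle=False` for a single holdout, and `TimeSeriesSplit` for several forward-looking folds.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit, train_test_split

# Synthetic stand-in: 10 days in chronological order.
X = pd.DataFrame({"Mean": np.arange(10, dtype=float)})
y = pd.Series([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# shuffle=False keeps row order, so the last 20% of days form the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)
print("Holdout test days:", list(X_test.index))

# TimeSeriesSplit: every fold trains on earlier days and tests on later ones.
folds = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X):
    folds.append((train_idx.tolist(), test_idx.tolist()))
    print("train:", train_idx.tolist(), "test:", test_idx.tolist())
```

Whether a chronological split is needed depends on how strongly consecutive days are related, so treat this as an option to compare against the random split rather than a required change.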
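Now we are ready to classify the data. In this example we will use the LightGBM model. LightGBM is a general-purpose gradient boosting library rather than a dedicated time series classifier; it works here because each series has been converted into a fixed-length feature vector.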
import lightgbm as lgb

# Create the LightGBM classifier
clf = lgb.LGBMClassifier()

# Train the model
clf.fit(X_train, y_train)

# Predict on the test set
y_pred = clf.predict(X_test)

# Compute the accuracy
accuracy = sum(y_pred == y_test) / len(y_test)
print("Accuracy: {:.2f}%".format(accuracy * 100))
Output:
Accuracy: 94.50%
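The model reaches 94.5% accuracy on the test set, meaning it correctly predicted for most test days whether the minimum temperature fell below the predefined threshold. Accuracy alone can be misleading when one class is much more common than the other, so it is worth comparing it with a simple baseline.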
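To put an accuracy figure in context, it can be compared with a majority-class baseline and broken down in a confusion matrix. The sketch below uses small made-up label arrays, not the article's predictions, and relies only on scikit-learn's metrics module.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for 10 test days: 1 = below threshold, 0 = not below.
y_test = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])

accuracy = accuracy_score(y_test, y_pred)

# Baseline: always predict the most frequent class in y_test.
majority = np.bincount(y_test).argmax()
baseline = accuracy_score(y_test, np.full_like(y_test, majority))

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)

print("Accuracy: {:.2f}%".format(accuracy * 100))
print("Baseline: {:.2f}%".format(baseline * 100))
print(cm)
```

In this toy case the classifier and the baseline score the same, while the confusion matrix shows that only one of the two below-threshold days was caught. That kind of gap is what a single accuracy number can hide.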
In Python, classifying time series data is straightforward once each series has been summarized as a set of features. In this article, we used the pandas library to preprocess the data and extract features, and the LightGBM model to classify the resulting feature vectors.
Whether you are working on stock price forecasting, weather prediction, or other time series tasks, these tools and techniques can support your data analysis and forecasting.