如何使用Python Scikit-learn實現線性分類？-Python教學-PHP中文網

如何使用Python Scikit-learn實現線性分類？

PHPz

發布： 2023-08-20 18:57:02

轉載

928 人瀏覽過

線性分類是最簡單的機器學習問題之一。為了實現線性分類，我們將使用sklearn的SGD（隨機梯度下降）分類器來預測鳶尾花的品種。

步驟

您可以按照下面給出的步驟使用Python Scikit-learn實現線性分類：

步驟 1 − 先導入必要的套件 scikit-learn，NumPy 和 matplotlib

步驟 2 − 載入資料集並建立訓練和測試資料集。

步驟 3 − 使用matplotlib繪製訓練實例。雖然這一步驟是可選的，但為了更清晰地展示實例，這是一個很好的實踐。

步驟 4 − 建立SGD分類器的對象，初始化其參數並使用fit()方法訓練模型。

步驟 5 − 使用Python Scikit-learn函式庫的度量套件評估結果。

Example

的翻譯為：

範例

讓我們來看下面的範例，我們將使用鳶尾花的兩個特徵，即花萼寬度和花萼長度，來預測鳶尾花的物種。

# Import required libraries
import sklearn
import numpy as np
import matplotlib.pyplot as plt
# %matplotlib inline

# Loading Iris flower dataset
from sklearn import datasets
iris = datasets.load_iris()
X_data, y_data = iris.data, iris.target

# Print iris data shape
print ("Original Dataset Shape:",X_data.shape, y_data.shape)

# Dividing dataset into training and testing dataset and standarized the features
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Getting the Iris dataset with only the first two attributes
X, y = X_data[:,:2], y_data

# Split the dataset into a training and a testing set(20 percent)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=1)
print ("\nTesting Dataset Shape:", X_train.shape, y_train.shape)

# Standarize the features
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# Plot the dataset
# Set the figure size
plt.figure(figsize=(7.16, 3.50))
plt.subplots_adjust(bottom=0.05, top=0.9, left=0.05, right=0.95)
plt.title('Training instances', size ='18')
colors = ['orange', 'green', 'cyan']
for i in range(len(colors)):
   px = X_train[:, 0][y_train == i]
   py = X_train[:, 1][y_train == i]
   plt.scatter(px, py, c=colors[i])
   
plt.legend(iris.target_names)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.show()

# create the linear model SGDclassifier
from sklearn.linear_model import SGDClassifier
linear_clf = SGDClassifier()

# Train the classifier using fit() function
linear_clf.fit(X_train, y_train)

# Print the learned coeficients
print ("\nThe coefficients of the linear boundary are:", linear_clf.coef_)
print ("\nThe point of intersection of the line are:",linear_clf.intercept_)

# Evaluate the result
from sklearn import metrics
y_train_pred = linear_clf.predict(X_train)
print ("\nThe Accuracy of our classifier is:", metrics.accuracy_score(y_train, y_train_pred)*100)

登入後複製

輸出

它將產生以下輸出

Original Dataset Shape: (150, 4) (150,)

Testing Dataset Shape: (120, 2) (120,)

The coefficients of the linear boundary are: [[-28.85486061 13.42772422]
[ 2.54806641 -5.04803702]
[ 7.03088805 -0.73391906]]

The point of intersection of the line are: [-19.61738307 -3.54055412 -0.35387805]

登入後複製

我們分類器的準確率為：76.66666666666667

如何使用Python Scikit-learn实现线性分类？