The Support Vector Machine (SVM) is a powerful supervised learning algorithm that can be used for both classification and regression. SVMs handle high-dimensional data well and, with kernel functions, can model non-linear decision boundaries. They are widely used in data mining, image classification, text classification, bioinformatics and other fields.
In this article, we will introduce an example of using SVM for classification in Python. We will use the SVM model from the scikit-learn library, which provides many powerful machine learning algorithms.
First, we need to install the scikit-learn library, which can be installed using the following command in the terminal:
pip install scikit-learn
Next, we will use the classic Iris data set to demonstrate classification with SVM. The Iris data set contains 150 samples divided into three classes of 50 samples each. Each sample has 4 features: sepal length, sepal width, petal length and petal width. We will use SVM to classify these samples.
We start by importing the required libraries:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score
Next, we load the Iris data set:
iris = datasets.load_iris()
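Before splitting the data, it can help to confirm that the loaded object matches the description above. The following is a minimal sketch; the use of NumPy's bincount to count samples per class and the variable name class_counts are additions for illustration and are not part of the original example.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# Feature matrix: one row per sample, one column per measurement
print(iris.data.shape)

# Names of the four features and the three classes
print(iris.feature_names)
print(iris.target_names)

# Number of samples in each class (illustrative helper variable)
class_counts = np.bincount(iris.target)
print(class_counts)
```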
Then, we divide the data into a training set and a test set:
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=0)
Here, the train_test_split function randomly divides the data set into a training set and a test set. The test_size=0.3 parameter reserves 30% of the samples for the test set, and random_state=0 fixes the random shuffle so that the split is reproducible.
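To see what the split produces, the sketch below prints the shapes of the resulting arrays. It also shows the optional stratify argument of train_test_split, which keeps the class proportions equal in both sets. The stratified split is not used in the rest of this article, and the names X_tr_s, X_te_s, y_tr_s and y_te_s are hypothetical.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

# Same split as in the article: 70% training, 30% test
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0
)
print(X_train.shape, X_test.shape)

# Optional variant (an assumption, not the article's setup):
# stratify on the labels so each class is equally represented
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)
print(np.bincount(y_te_s))
```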
Next, we will use the SVM model to fit the training set:
clf = svm.SVC(kernel='linear', C=1)
clf.fit(X_train, y_train)
Here, we use a linear kernel and set the regularization parameter C=1. C controls the trade-off between a wide margin and correctly classifying every training sample; in scikit-learn, the strength of the regularization is inversely proportional to C. A smaller C gives a simpler model with a wider margin, which may under-fit; a larger C penalizes training errors more heavily and gives a more complex model, which may over-fit. An appropriate value of C is usually chosen through cross-validation.
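One way to choose C by cross-validation is scikit-learn's GridSearchCV. The sketch below is only an illustration: the candidate values in param_grid and the choice of 5 folds are assumptions, not values taken from this article.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV, train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0
)

# Candidate values for C (illustrative, not prescribed by the article)
param_grid = {"C": [0.01, 0.1, 1, 10, 100]}

# 5-fold cross-validation on the training set only
search = GridSearchCV(svm.SVC(kernel="linear"), param_grid, cv=5)
search.fit(X_train, y_train)

print("Best C:", search.best_params_["C"])
print("Mean cross-validated accuracy:", search.best_score_)
```

The search uses only the training set, so the test set remains available for a final, unbiased evaluation.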
Next, we use the trained model to predict the test set:
y_pred = clf.predict(X_test)
Finally, we can use the accuracy_score function to calculate the classification accuracy:
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)
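Accuracy is a single number and does not show which classes are confused with each other. As a possible complement, the sketch below uses confusion_matrix and classification_report from sklearn.metrics; these two functions are not part of the original example and are shown here only as an optional extra.

```python
from sklearn import datasets, svm
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0
)

clf = svm.SVC(kernel="linear", C=1)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Per-class precision, recall and F1 score
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```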
The complete code is as follows:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score

# Load iris dataset
iris = datasets.load_iris()

# Split data into train and test
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=0)

# Fit SVM model on training data
clf = svm.SVC(kernel='linear', C=1)
clf.fit(X_train, y_train)

# Predict on test data
y_pred = clf.predict(X_test)

# Compute accuracy score
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)
In this example, we used an SVM model to classify a very common data set, the Iris data set. The strengths of SVM are its strong classification performance and its suitability for high-dimensional data and non-linear problems. To get the best results in practice, hyperparameters such as the kernel and C usually need to be tuned.
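As a rough illustration of how the kernel choice can be compared, the sketch below evaluates three of the kernels supported by svm.SVC using cross_val_score. The list of kernels, the fixed C=1 and the 5 folds are assumptions chosen for illustration, and the scores dictionary is a hypothetical helper.

```python
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()

# Mean 5-fold cross-validated accuracy for each kernel (illustrative setup)
scores = {}
for kernel in ("linear", "rbf", "poly"):
    clf = svm.SVC(kernel=kernel, C=1)
    scores[kernel] = cross_val_score(clf, iris.data, iris.target, cv=5).mean()

for kernel, score in scores.items():
    print(kernel, score)
```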