Python is an easy-to-learn programming language with a rich set of scientific computing and data processing libraries. Naive Bayes, a classic machine learning method, is well supported in that ecosystem. This article uses an example to introduce how Naive Bayes is used in Python and the steps involved.
The Naive Bayes algorithm is a classification algorithm based on Bayes' theorem. Its core idea is to use the features of a known training data set to infer the class of new data. In practice, Naive Bayes is often used for text classification, spam filtering, and sentiment analysis.
What characterizes Naive Bayes is its assumption that the features are independent of one another given the class. This assumption rarely holds in real data, which is why the algorithm is called "naive". Even so, Naive Bayes still performs well on problems such as short text classification.
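For reference, the independence assumption can be written out as below. This is the standard textbook formulation rather than anything specific to this article; the notation ($y$ for the class, $x_1, \dots, x_n$ for the features) is assumed here for illustration.

```latex
\begin{aligned}
P(y \mid x_1, \dots, x_n)
  &= \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}
  && \text{(Bayes' theorem)} \\
P(x_1, \dots, x_n \mid y)
  &= \prod_{i=1}^{n} P(x_i \mid y)
  && \text{(naive independence assumption)} \\
\hat{y}
  &= \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
  && \text{(the denominator does not depend on } y\text{)}
\end{aligned}
```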
In Python, the steps for using Naive Bayes Classifier can be summarized as follows:
2.1 Prepare data
First, prepare the training data and the test data to be classified. The data can be text, images, audio, and so on, but it must be converted into numeric features that the computer can process. In text classification problems, this usually means converting each text into a vector representation.
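As a rough illustration of what "converting text into a vector" means, the sketch below counts words with sklearn's CountVectorizer. The two sentences are invented for this example and are not the article's data.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus invented for illustration; not the article's data set.
docs = [
    "the team won the match",
    "the new phone has a fast chip",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)  # sparse matrix, one row per document

# Column indices follow the alphabetical order of the vocabulary.
vocab = sorted(vectorizer.vocabulary_)
print(vocab)
print(X.toarray())  # each entry is how often a word occurs in a document
```

With the default tokenizer, single-character tokens such as "a" are dropped, so they do not appear in the vocabulary.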
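2.2 Train the model
Next, use the training data set to build the Naive Bayes classifier. Three Naive Bayes classifiers are commonly used in Python, all provided by sklearn.naive_bayes:
GaussianNB: for continuous features, assumed to follow a normal distribution within each class.
MultinomialNB: for discrete counts such as word frequencies.
BernoulliNB: for binary features (a feature is either present or absent).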
Taking text classification as an example, you can use the TfidfVectorizer class provided by the sklearn library to convert the text into a vector representation, and use the MultinomialNB classifier for training.
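A minimal sketch of this combination follows, assuming a tiny hand-written corpus. The texts and the "sports"/"tech" labels are invented for illustration; only TfidfVectorizer and MultinomialNB come from the article.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Invented training data, only to show the call sequence.
train_texts = [
    "the team won the football match",
    "the striker scored two goals",
    "the new phone has a faster chip",
    "the laptop ships with a new processor",
]
train_labels = ["sports", "sports", "tech", "tech"]

# Learn the vocabulary and IDF weights, then turn the texts into TF-IDF vectors.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# Fit the multinomial Naive Bayes classifier on the vectors.
clf = MultinomialNB()
clf.fit(X_train, train_labels)

# New text must go through the same fitted vectorizer (transform, not fit_transform).
X_new = vectorizer.transform(["a new chip for the phone"])
prediction = clf.predict(X_new)[0]
print(prediction)
```

Note that transform expects a list of documents, even when classifying a single text.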
2.3 Test the model
After training, evaluate the performance of the model on the test data set. The test set must be kept separate from the training set: evaluating on samples the model has already seen overstates its performance. You can use the accuracy_score function provided by the sklearn library to calculate the accuracy of the model.
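As a quick illustration of accuracy_score, the sketch below compares two short label lists. Both lists are made up for the example.

```python
from sklearn.metrics import accuracy_score

# Made-up true labels and predictions for five test samples.
y_true = ["sports", "tech", "tech", "sports", "tech"]
y_pred = ["sports", "tech", "sports", "sports", "tech"]

# Accuracy is the fraction of predictions that match the true labels.
acc = accuracy_score(y_true, y_pred)
print(acc)  # 4 of the 5 labels match, so this prints 0.8
```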
To demonstrate the practical application of the Naive Bayes classifier, this article takes text classification based on Naive Bayes as an example.
3.1 Prepare data
First, collect two text data sets from the Internet, "Sports News" and "Science and Technology News", each containing 1,000 texts. Put the two data sets into separate folders and label the texts "Sports" and "Technology" respectively. The code in 3.2 infers the label from the file path, so the folder names must contain 体育 (Sports) or 科技 (Technology).
3.2 Use the sklearn library for classification
Next, use the naive Bayes classifier provided by the sklearn library for classification.
(1) Import related libraries
```python
import os

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
```
(2) Read text data and its annotations
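```python
def read_files(path):
    """Walk `path` and return the texts plus a label inferred from each file path."""
    text_list = []
    label_list = []
    for root, dirs, files in os.walk(path):
        for file in files:
            file_path = os.path.join(root, file)
            # The label comes from the folder name: '体育' (Sports) or '科技' (Technology).
            if '体育' in file_path:
                label = '体育'
            elif '科技' in file_path:
                label = '科技'
            else:
                # Skip files without a recognizable label so texts and labels stay aligned.
                continue
            with open(file_path, 'r', encoding='utf-8') as f:
                text_list.append(f.read())
            label_list.append(label)
    return text_list, label_list
```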
(3) Convert text into vector representation
```python
def text_vectorizer(text_list):
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(text_list)
    return X, vectorizer
```
(4) Train the model and return the accuracy
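```python
def train(text_list, label_list):
    # Note: the vectorizer is fitted on all texts before the split,
    # so its IDF statistics also reflect the test documents.
    X, vectorizer = text_vectorizer(text_list)
    y = label_list
    # Hold out 20% of the samples for testing; random_state makes the split reproducible.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    clf = MultinomialNB()
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    return clf, vectorizer, acc
```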
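The train function above fits the TfidfVectorizer on all texts before splitting, so the test documents influence the IDF weights. One possible variant, sketched here under assumptions, splits the raw texts first and wraps both steps in an sklearn Pipeline. The name train_with_pipeline and the small corpus are hypothetical and not part of the original article, and the accuracy this variant reports may differ from the figure in 3.3.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def train_with_pipeline(text_list, label_list):
    """Hypothetical variant of train(): split first, then fit on the training part only."""
    texts_train, texts_test, y_train, y_test = train_test_split(
        text_list, label_list, test_size=0.2, random_state=42
    )
    # The pipeline fits the vectorizer and the classifier on training texts only.
    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    model.fit(texts_train, y_train)
    acc = accuracy_score(y_test, model.predict(texts_test))
    return model, acc


# Tiny invented corpus, only to show that the function runs end to end.
texts = [
    "the team won the football match",
    "the striker scored two goals",
    "the coach praised the goalkeeper",
    "fans cheered at the stadium",
    "the runner broke the marathon record",
    "the new phone has a faster chip",
    "the laptop ships with a new processor",
    "the update fixes a software bug",
    "engineers released a new browser",
    "the startup built a cloud service",
]
labels = ["sports"] * 5 + ["tech"] * 5

model, acc = train_with_pipeline(texts, labels)
print(acc)
print(model.predict(["the goalkeeper saved a penalty"])[0])
```

Because the pipeline accepts raw strings, no separate vectorizer has to be passed around when predicting.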
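(5) Predict the category of a new text
```python
def predict(clf, vectorizer, text):
    # transform() expects an iterable of documents, so wrap the single string in a list.
    X = vectorizer.transform([text])
    y_pred = clf.predict(X)
    return y_pred[0]
```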
3.3 Result analysis
Running the code above gives a classifier accuracy of 0.955. To classify a new text, pass it to the predict function, which returns the category it belongs to. For example, the text "iPhone 12 is finally released!" is assigned to the "Technology" (科技) category.
As a simple and effective classification algorithm, Naive Bayes is widely used in Python. This article introduced the steps for using a Naive Bayes classifier and demonstrated them with a text classification example. In practice, data preprocessing, feature selection, and similar operations are usually also needed to improve the accuracy of the classifier.
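One way such preprocessing and feature selection could look is sketched below. Stop-word removal and chi-squared selection via SelectKBest are assumptions chosen for illustration, not steps the article prescribes, and the corpus and the value k=5 are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented corpus, only to make the sketch runnable.
texts = [
    "the team won the football match",
    "the striker scored two goals",
    "the new phone has a faster chip",
    "the laptop ships with a new processor",
]
labels = ["sports", "sports", "tech", "tech"]

model = make_pipeline(
    TfidfVectorizer(stop_words="english"),  # preprocessing: drop common English words
    SelectKBest(chi2, k=5),                 # feature selection: keep the 5 highest-scoring terms
    MultinomialNB(),
)
model.fit(texts, labels)

# Number of terms that survived the selection step.
n_selected = int(model.named_steps["selectkbest"].get_support().sum())
print(n_selected)
print(model.predict(["a faster processor"])[0])
```

On a real data set, k would be tuned, for example with cross-validation, rather than fixed by hand.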