Naive Bayes examples in Python
Python is an easy-to-learn programming language with a rich ecosystem of scientific computing and data processing libraries. Naive Bayes, a classic machine learning method, is well supported in this ecosystem, most notably through the scikit-learn (sklearn) library. This article uses examples to walk through how to apply Naive Bayes in Python, step by step.
1. Introduction to Naive Bayes
The Naive Bayes algorithm is a classification algorithm based on Bayes' theorem. Its core idea is to use the features of a labeled training set to infer the class of new data. In practice, Naive Bayes is often used for text classification, spam filtering, and sentiment analysis.
The defining characteristic of Naive Bayes is that it assumes the features are independent of one another given the class. This assumption rarely holds in real data, which is why the algorithm is called "naive". Despite this simplification, Naive Bayes still performs well on problems such as short-text classification.
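For reference, combining Bayes' theorem with the independence assumption gives the standard naive Bayes decision rule, where $y$ is a class and $x_1, \dots, x_n$ are the features of one sample. The denominator is the same for every class, so it can be dropped when picking the most probable class:

$$P(y \mid x_1, \dots, x_n) = \frac{P(y)\prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)} \propto P(y)\prod_{i=1}^{n} P(x_i \mid y)$$

$$\hat{y} = \arg\max_{y} \; P(y)\prod_{i=1}^{n} P(x_i \mid y)$$

The product term is exactly where the "naive" assumption enters: without it, the joint likelihood $P(x_1, \dots, x_n \mid y)$ would have to be estimated as a whole.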
2. Using a Naive Bayes Classifier
In Python, using a Naive Bayes classifier can be summarized in the following steps:
2.1 Prepare data
First, prepare the training data and the test data to be classified. The data can be text, images, audio, and so on, but it must be converted into a numerical form that the computer can process. In text classification, this usually means converting each text into a vector.
2.2 Train the model
Next, use the training set to build the Naive Bayes classifier. The sklearn library provides three commonly used Naive Bayes classifiers:
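- GaussianNB: suitable for continuous features; it assumes each feature follows a normal distribution within each class.
- BernoulliNB: suitable for binary features, such as whether a word occurs in a document.
- MultinomialNB: suitable for discrete count features, such as how many times each word occurs in a document.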
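To make the difference between the three variants concrete, here is a small sketch that fits each one on a made-up four-sample dataset of the kind of features it expects (continuous values, 0/1 flags, and counts). The data is purely illustrative:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, BernoulliNB, MultinomialNB

y = np.array([0, 0, 1, 1])  # two classes, two samples each

# Continuous features (e.g. measurements) -> GaussianNB
X_cont = np.array([[1.0, 2.1], [1.2, 1.9], [5.0, 6.2], [5.3, 5.8]])
gnb = GaussianNB().fit(X_cont, y)

# Binary features (e.g. word present / absent) -> BernoulliNB
X_bin = np.array([[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]])
bnb = BernoulliNB().fit(X_bin, y)

# Count features (e.g. word counts) -> MultinomialNB
X_cnt = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
mnb = MultinomialNB().fit(X_cnt, y)

# Each new sample resembles class 0 of its respective dataset
print(gnb.predict([[1.1, 2.0]]))
print(bnb.predict([[1, 0, 1]]))
print(mnb.predict([[2, 0, 1]]))
```

All three share the same `fit`/`predict` interface; only the assumed per-feature distribution changes.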
Taking text classification as an example, you can use the TfidfVectorizer class from sklearn to convert texts into vectors and then train a MultinomialNB classifier on them. MultinomialNB is formally defined for integer counts, but fractional values such as TF-IDF weights also tend to work well in practice.
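A compact way to combine the two is sklearn's `make_pipeline`, which the article does not use but which guarantees that the vectorizer is fitted only on whatever data is passed to `fit`. The sketch below assumes a tiny made-up corpus with the article's two labels:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up training corpus with two labels (illustrative only)
texts = [
    "the team won the football match",
    "the striker scored two goals",
    "the new phone has a faster chip",
    "the company released a new laptop",
]
labels = ["Sports", "Sports", "Technology", "Technology"]

# The pipeline fits TF-IDF on the training texts, then trains MultinomialNB on the vectors
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["a new phone chip"]))  # → ['Technology']
```

With a corpus this small the predictions only show the mechanics; real accuracy depends on having enough representative training texts.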
2.3 Test the model
After training, evaluate the model's performance on the test set. The test set must be independent of the training set: no test data may be used during training (including when fitting the vectorizer), otherwise the accuracy estimate will be overly optimistic. You can use the accuracy_score function from sklearn to compute the model's accuracy.
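The evaluation step can be sketched on sklearn's bundled Iris dataset, which has continuous features and therefore suits GaussianNB. This is an illustrative stand-in, not the article's text data: hold out 20% of the samples, train on the rest, and score the held-out part with `accuracy_score`:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)  # 150 samples, 4 continuous features

# Hold out 20% as an independent test set; stratify keeps the class proportions equal
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf = GaussianNB().fit(X_train, y_train)  # fit on training data only
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"accuracy: {acc:.3f}")
```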
3. Example: Text classification based on Naive Bayes
To demonstrate the practical application of the Naive Bayes classifier, this article uses Naive Bayes-based text classification as an example.
3.1 Prepare data
First, collect two text datasets from the Internet, "Sports News" and "Science and Technology News", each containing 1,000 texts. Put the two datasets into separate folders and label their texts "Sports" and "Technology" respectively.
3.2 Use the sklearn library for classification
Next, classify the texts with the Naive Bayes classifier provided by sklearn.
(1) Import the required libraries
```python
import os

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
```
(2) Read the texts and their labels
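```python
def read_files(path):
    # Walk `path` and return parallel lists of texts and labels.
    # The label comes from the folder name: paths containing "Sports"
    # or "Technology" get that label.
    text_list = []
    label_list = []
    for root, dirs, files in os.walk(path):
        for file in files:
            file_path = os.path.join(root, file)
            if 'Sports' in file_path:
                label = 'Sports'
            elif 'Technology' in file_path:
                label = 'Technology'
            else:
                continue  # skip unlabeled files so both lists stay aligned
            with open(file_path, 'r', encoding='utf-8') as f:
                text_list.append(f.read())
            label_list.append(label)
    return text_list, label_list
```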
(3) Convert text into vector representation
```python
def text_vectorizer(text_list):
    # Learn the vocabulary and IDF weights from text_list, then vectorize it
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(text_list)
    return X, vectorizer
```
(4) Train the model and return the accuracy
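```python
def train(text_list, label_list):
    # Split the raw texts first so the vectorizer never sees the test set
    train_texts, test_texts, y_train, y_test = train_test_split(
        text_list, label_list, test_size=0.2, random_state=42)
    # Fit TF-IDF on the training texts only, then reuse it for the test texts
    X_train, vectorizer = text_vectorizer(train_texts)
    X_test = vectorizer.transform(test_texts)
    clf = MultinomialNB()
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    return clf, vectorizer, acc
```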
(5) Predict the category of new text
```python
def predict(clf, vectorizer, text):
    # transform() expects an iterable of documents, so wrap the single text in a list
    X = vectorizer.transform([text])
    y_pred = clf.predict(X)
    return y_pred[0]
```
3.3 Result analysis
Running the code above, the author obtained a classifier accuracy of 0.955 (the exact figure depends on the dataset and the random split). To classify new text, pass it to the predict function, which returns its category. For example, the text "iPhone 12 is finally released!" is classified as "Technology".
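Besides a single label, MultinomialNB can also report how confident it is through `predict_proba`. The sketch below uses a tiny made-up corpus as a stand-in for the article's 2,000 news texts, so the probabilities it prints are illustrative only:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up stand-in corpus (the article's real datasets are not reproduced here)
texts = [
    "the team won the football match",
    "the striker scored two goals",
    "the new phone has a faster chip",
    "the company released a new laptop",
]
labels = ["Sports", "Sports", "Technology", "Technology"]

vectorizer = TfidfVectorizer()
clf = MultinomialNB().fit(vectorizer.fit_transform(texts), labels)

# Wrap the single text in a list, as in the predict function
X_new = vectorizer.transform(["iPhone 12 is finally released!"])
probs = clf.predict_proba(X_new)[0]  # one probability per class, in clf.classes_ order
for label, p in zip(clf.classes_, probs):
    print(f"{label}: {p:.3f}")
```

Words that never appeared in training (here "iphone", "12", "finally") are simply ignored by the vectorizer, so only "released" drives this prediction; a text with no known words at all would fall back to the class priors.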
4. Summary
As a simple and effective classification algorithm, Naive Bayes is widely used in Python. This article introduced the steps for using a Naive Bayes classifier and demonstrated its practical application with a Naive Bayes-based text classification example. In real applications, data preprocessing, feature selection, and similar steps are also needed to improve the classifier's accuracy.
The above is the detailed content of Naive Bayes examples in Python. For more information, please follow other related articles on the PHP Chinese website!