How to implement the Naive Bayes algorithm for machine learning in PHP-PHP Tutorial-php.cn

Home

Backend Development

PHP Tutorial

How to implement the Naive Bayes algorithm for machine learning in PHP

小云云

Dec 12, 2017 am 09:26 AM

php

This article mainly introduces the Naive Bayes algorithm for machine learning in PHP. It analyzes the concepts, principles and PHP implementation techniques of the Naive Bayes algorithm in detail in the form of examples. Friends who need it can refer to it. I hope it can help everyone. .

The example in this article describes the implementation of the Naive Bayes algorithm for machine learning in PHP. Share it with everyone for your reference, the details are as follows:

Machine learning has become ubiquitous in our lives. Everything from the thermostat working when you're at home to smart cars and the smartphones in our pockets. Machine learning seems to be everywhere and is an area well worth exploring. But what is machine learning? Generally speaking, machine learning is about allowing the system to continuously learn and predict new problems. From simple predictions of shopping items to complex digital assistant predictions.

In this article I will introduce the Naive Bayes algorithm Clasifier as a class. This is a simple algorithm that is easy to implement and gives satisfactory results. But this algorithm requires a little statistical knowledge to understand. In the last part of the article you can see some example code and even try to do your own machine learning.

Getting Started

So, what function is this Classifier used to achieve? In fact, it is mainly used to determine whether a given statement is positive or negative. For example, "Symfony is the best" is a positive statement, and "No Symfony is bad" is a negative statement. So after giving a statement, I want this Classifier to return a statement type without giving a new rule.

I named Classifier a class with the same name and contains a guess method. This method accepts a statement as input and returns whether the statement is positive or negative. The class looks like this:

class Classifier
{
 public function guess($statement)
 {}
}

Copy after login

I prefer to use enum typed classes instead of strings for my return values. I named the class of this enumeration type Type, and it contains two constants: one POSITIVE and one NEGATIVE. These two constants will be used as the return value of the guess method.

class Type
{
 const POSITIVE = &#39;positive&#39;;
 const NEGATIVE = &#39;negative&#39;;
}

Copy after login

The initialization work has been completed, and the next step is to write our algorithm for prediction.

Naive Bayes

The Naive Bayes algorithm works based on a training set, and makes corresponding predictions based on this training set. This algorithm uses simple statistics and a bit of mathematics to calculate the results. For example, the training set consists of the following four texts:

语句	类型
Symfony is the best	Positive
PhpStorm is great	Positive
Iltar complains a lot	Negative
No Symfony is bad	Negative

如果给定语句是“Symfony is the best”，那么你可以说这个语句是积极地。你平常也会根据之前学习到的相应知识做出对应的决定，朴素贝叶斯算法也是同样的道理：它根据之前的训练集来决定哪一个类型更加相近。

学习

在这个算法正式工作之前，它需要大量的历史信息作为训练集。它需要知道两件事：每一个类型对应的词产生了多少次和每一个语句对应的类型是什么。我们在实施的时候会将这两种信息存储在两个数组当中。一个数组包含每一类型的词语统计，另一个数组包含每一个类型的语句统计。所有的其他信息都可以从这两个数组中聚合。代码就像下面的一样：

function learn($statement, $type)
{
 $words = $this->getWords($statement);
 foreach ($words as $word) {
 if (!isset($this->words[$type][$word])) {
  $this->words[$type][$word] = 0;
 }
 $this->words[$type][$word]++; // 增加类型的词语统计
 }
 $this->documents[$type]++; // 增加类型的语句统计
}

Copy after login

有了这个集合以后，现在这个算法就可以根据历史数据接受预测训练了。

定义

为了解释这个算法是如何工作的，几个定义是必要的。首先，让我们定义一下输入的语句是给定类型中的一个的概率。这个将会表示为P（Type）。它是以已知类型的数据的类型作为分子，还有整个训练集的数据数量作为分母来得出的。一个数据就是整个训练集中的一个。到现在为止，这个方法可以将会命名为totalP，像下面这样：

function totalP($type)
{
 return ($this->documents[$type] + 1) / (array_sum($this->documents) + 1);
}

Copy after login

请注意，在这里分子和分母都加了1。这是为了避免分子和分母都为0的情况。

根据上面的训练集的例子，积极和消极的类型都会得出0.6的概率。每中类型的数据都是2个，一共是4个数据所以就是（2+1）/（4+1）。

第二个要定义的是对于给定的一个词是属于哪个确定类型的概率。这个我们定义成P(word,Type)。首先我们要得到一个词在训练集中给出确定类型出现的次数，然后用这个结果来除以整个给定类型数据的词数。这个方法我们定义为p：

function p($word, $type)
{
 $count = isset($this->words[$type][$word]) ? $this->words[$type][$word] : 0;
 return ($count + 1) / (array_sum($this->words[$type]) + 1);
}

Copy after login

在本次的训练集中，“is”的是积极类型的概率为0.375。这个词在整个积极的数据中的7个词中占了两次，所以结果就是（2+1）/（7+1）。

最后，这个算法应该只关心关键词而忽略其他的因素。一个简单的方法就是将给定的字符串中的单词分离出来：

function getWords($string)
{
 return preg_split(&#39;/\s+/&#39;, preg_replace(&#39;/[^A-Za-z0-9\s]/&#39;, &#39;&#39;, strtolower($string)));
}

Copy after login

准备工作都做好了，开始真正实施我们的计划吧！

预测

为了预测语句的类型，这个算法应该计算所给定语句的两个类型的概率。像上面一样，我们定义一个P（Type,sentence）。得出概率高的类型将会是Classifier类中算法返回的结果。

为了计算P（Type,sentence）,算法当中将用到贝叶斯定理。算法像这样被定义：P（Type,sentence）= P（Type）* P（sentence,Type）/ P（sentence）。这意味着给定语句的类型概率和给定类型语句概率除以语句的概率的结果是相同的。

那么算法在计算每一个相同语句的P（Tyoe,sentence），P（sentence）是保持一样的。这意味着算法就可以省略其他因素，我们只需要关心最高的概率而不是实际的值。计算就像这样：P（Type,sentence） = P（Type）* P（sentence,Type）。

最后，为了计算P（sentence,Type），我们可以为语句中的每个词添加一条链式规则。所以在一条语句中如果有n个词的话，它将会和P（word_1,Type）* P（word_2,Type）* P（word_3,Type）* .....*P（word_n,Type）是一样的。每一个词计算结果的概率使用了我们前面看到的定义。

好了，所有的都说完了，是时候在php中实际操作一下了：

function guess($statement)
{
 $words = $this->getWords($statement); // 得到单词
 $best_likelihood = 0;
 $best_type = null;
 foreach ($this->types as $type) {
 $likelihood = $this->pTotal($type); //计算 P(Type)
 foreach ($words as $word) {
  $likelihood *= $this->p($word, $type); // 计算 P(word, Type)
 }
 if ($likelihood > $best_likelihood) {
  $best_likelihood = $likelihood;
  $best_type = $type;
 }
 }
 return $best_type;
}

Copy after login

这就是所有的工作，现在算法可以预测语句的类型了。你要做的就是让你的算法开始学习：

$classifier = new Classifier();
$classifier->learn(&#39;Symfony is the best&#39;, Type::POSITIVE);
$classifier->learn(&#39;PhpStorm is great&#39;, Type::POSITIVE);
$classifier->learn(&#39;Iltar complains a lot&#39;, Type::NEGATIVE);
$classifier->learn(&#39;No Symfony is bad&#39;, Type::NEGATIVE);
var_dump($classifier->guess(&#39;Symfony is great&#39;)); // string(8) "positive"
var_dump($classifier->guess(&#39;I complain a lot&#39;)); // string(8) "negative"

Copy after login

所有的代码我已经上传到了GIT上，https://github.com/yannickl88/blog-articles/blob/master/src/machine-learning-naive-bayes/Classifier.php

github上完整php代码如下：

 [], Type::NEGATIVE => []];
 private $documents = [Type::POSITIVE => 0, Type::NEGATIVE => 0];
 public function guess($statement)
 {
 $words  = $this->getWords($statement); // get the words
 $best_likelihood = 0;
 $best_type = null;
 foreach ($this->types as $type) {
  $likelihood = $this->pTotal($type); // calculate P(Type)
  foreach ($words as $word) {
  $likelihood *= $this->p($word, $type); // calculate P(word, Type)
  }
  if ($likelihood > $best_likelihood) {
  $best_likelihood = $likelihood;
  $best_type = $type;
  }
 }
 return $best_type;
 }
 public function learn($statement, $type)
 {
 $words = $this->getWords($statement);
 foreach ($words as $word) {
  if (!isset($this->words[$type][$word])) {
  $this->words[$type][$word] = 0;
  }
  $this->words[$type][$word]++; // increment the word count for the type
 }
 $this->documents[$type]++; // increment the document count for the type
 }
 public function p($word, $type)
 {
 $count = 0;
 if (isset($this->words[$type][$word])) {
  $count = $this->words[$type][$word];
 }
 return ($count + 1) / (array_sum($this->words[$type]) + 1);
 }
 public function pTotal($type)
 {
 return ($this->documents[$type] + 1) / (array_sum($this->documents) + 1);
 }
 public function getWords($string)
 {
 return preg_split('/\s+/', preg_replace('/[^A-Za-z0-9\s]/', '', strtolower($string)));
 }
}
$classifier = new Classifier();
$classifier->learn(&#39;Symfony is the best&#39;, Type::POSITIVE);
$classifier->learn(&#39;PhpStorm is great&#39;, Type::POSITIVE);
$classifier->learn(&#39;Iltar complains a lot&#39;, Type::NEGATIVE);
$classifier->learn(&#39;No Symfony is bad&#39;, Type::NEGATIVE);
var_dump($classifier->guess(&#39;Symfony is great&#39;)); // string(8) "positive"
var_dump($classifier->guess(&#39;I complain a lot&#39;)); // string(8) "negative"

Copy after login

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

How to fix KB5055523 fails to install in Windows 11?

3 weeks ago By DDD

How to fix KB5055518 fails to install in Windows 10?

3 weeks ago By DDD

Roblox: Dead Rails - How To Tame Wolves

4 weeks ago By DDD

Roblox: Grow A Garden - Complete Mutation Guide

2 weeks ago By DDD

Strength Levels for Every Enemy & Monster in R.E.P.O.

4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Java Tutorial

1657

CakePHP Tutorial

1415

Laravel Tutorial

1309

PHP Tutorial

1257

C# Tutorial

1230

Related knowledge

PHP 8.4 Installation and Upgrade guide for Ubuntu and Debian Dec 24, 2024 pm 04:42 PM

PHP 8.4 brings several new features, security improvements, and performance improvements with healthy amounts of feature deprecations and removals. This guide explains how to install PHP 8.4 or upgrade to PHP 8.4 on Ubuntu, Debian, or their derivati

How do you parse and process HTML/XML in PHP? Feb 07, 2025 am 11:57 AM

This tutorial demonstrates how to efficiently process XML documents using PHP. XML (eXtensible Markup Language) is a versatile text-based markup language designed for both human readability and machine parsing. It's commonly used for data storage an

Explain JSON Web Tokens (JWT) and their use case in PHP APIs. Apr 05, 2025 am 12:04 AM

JWT is an open standard based on JSON, used to securely transmit information between parties, mainly for identity authentication and information exchange. 1. JWT consists of three parts: Header, Payload and Signature. 2. The working principle of JWT includes three steps: generating JWT, verifying JWT and parsing Payload. 3. When using JWT for authentication in PHP, JWT can be generated and verified, and user role and permission information can be included in advanced usage. 4. Common errors include signature verification failure, token expiration, and payload oversized. Debugging skills include using debugging tools and logging. 5. Performance optimization and best practices include using appropriate signature algorithms, setting validity periods reasonably,

Explain late static binding in PHP (static::). Apr 03, 2025 am 12:04 AM

Static binding (static::) implements late static binding (LSB) in PHP, allowing calling classes to be referenced in static contexts rather than defining classes. 1) The parsing process is performed at runtime, 2) Look up the call class in the inheritance relationship, 3) It may bring performance overhead.

PHP Program to Count Vowels in a String Feb 07, 2025 pm 12:12 PM

A string is a sequence of characters, including letters, numbers, and symbols. This tutorial will learn how to calculate the number of vowels in a given string in PHP using different methods. The vowels in English are a, e, i, o, u, and they can be uppercase or lowercase. What is a vowel? Vowels are alphabetic characters that represent a specific pronunciation. There are five vowels in English, including uppercase and lowercase: a, e, i, o, u Example 1 Input: String = "Tutorialspoint" Output: 6 explain The vowels in the string "Tutorialspoint" are u, o, i, a, o, i. There are 6 yuan in total

What are PHP magic methods (__construct, __destruct, __call, __get, __set, etc.) and provide use cases? Apr 03, 2025 am 12:03 AM

What are the magic methods of PHP? PHP's magic methods include: 1.\_\_construct, used to initialize objects; 2.\_\_destruct, used to clean up resources; 3.\_\_call, handle non-existent method calls; 4.\_\_get, implement dynamic attribute access; 5.\_\_set, implement dynamic attribute settings. These methods are automatically called in certain situations, improving code flexibility and efficiency.

PHP and Python: Comparing Two Popular Programming Languages Apr 14, 2025 am 12:13 AM

PHP and Python each have their own advantages, and choose according to project requirements. 1.PHP is suitable for web development, especially for rapid development and maintenance of websites. 2. Python is suitable for data science, machine learning and artificial intelligence, with concise syntax and suitable for beginners.

PHP: A Key Language for Web Development Apr 13, 2025 am 12:08 AM

PHP is a scripting language widely used on the server side, especially suitable for web development. 1.PHP can embed HTML, process HTTP requests and responses, and supports a variety of databases. 2.PHP is used to generate dynamic web content, process form data, access databases, etc., with strong community support and open source resources. 3. PHP is an interpreted language, and the execution process includes lexical analysis, grammatical analysis, compilation and execution. 4.PHP can be combined with MySQL for advanced applications such as user registration systems. 5. When debugging PHP, you can use functions such as error_reporting() and var_dump(). 6. Optimize PHP code to use caching mechanisms, optimize database queries and use built-in functions. 7

See all articles