Detailed explanation of the K-means algorithm in PHP
The K-means algorithm is a commonly used clustering analysis algorithm and is widely used in the fields of data mining and machine learning. This article will introduce in detail the process of implementing the K-means algorithm using PHP and provide code examples.
The K-means algorithm divides the sample points in the data set into multiple clusters to make the distance between sample points in the cluster as small as possible, and The distance between sample points between clusters should be as large as possible. The specific implementation process is as follows:
1.1 Initialization
First, the number of clusters K needs to be determined. Then K sample points are randomly selected from the data set as the initial center points.
1.2 Assignment
For each sample point in the data set, calculate the distance between it and all center points and assign it to the closest cluster.
1.3 Update
For each cluster, calculate the mean of the sample points in the cluster as the new center point.
1.4 Repeat iteration
Repeat the allocation and update process until the sample points in the cluster no longer change, or the predetermined number of iterations is reached.
The following is a code example using PHP to implement the K-means algorithm:
<?php function kMeans($data, $k, $iterations) { // 初始化簇中心点 $centers = []; for ($i = 0; $i < $k; $i++) { $centers[] = $data[array_rand($data)]; } // 迭代分配和更新过程 for ($iteration = 0; $iteration < $iterations; $iteration++) { $clusters = array_fill(0, count($centers), []); foreach ($data as $point) { // 计算样本点与各个中心点的距离 $distances = []; foreach ($centers as $center) { $distance = calculateDistance($point, $center); $distances[] = $distance; } // 将样本点分配到最近的簇 $clusterIndex = array_search(min($distances), $distances); $clusters[$clusterIndex][] = $point; } // 更新中心点 $newCenters = []; foreach ($clusters as $cluster) { $newCenter = calculateMean($cluster); $newCenters[] = $newCenter; } // 判断是否达到终止条件 if ($centers == $newCenters) { break; } $centers = $newCenters; } return $clusters; } // 计算两个样本点之间的欧氏距离 function calculateDistance($point1, $point2) { $distance = 0; for ($i = 0; $i < count($point1); $i++) { $distance += pow($point1[$i] - $point2[$i], 2); } return sqrt($distance); } // 计算簇内样本点的均值 function calculateMean($cluster) { $mean = []; $dimension = count($cluster[0]); for ($i = 0; $i < $dimension; $i++) { $sum = 0; foreach ($cluster as $point) { $sum += $point[$i]; } $mean[] = $sum / count($cluster); } return $mean; } // 测试代码 $data = [ [2, 10], [2, 5], [8, 4], [5, 8], [7, 5], [6, 4], [1, 2], [4, 9], ]; $k = 2; $iterations = 100; $clusters = kMeans($data, $k, $iterations); print_r($clusters); ?>
In the above code, we first define A kMeans function used to perform the K-means algorithm. Then the calculateDistance function is implemented to calculate the Euclidean distance between two sample points. Finally, the calculateMean function is implemented, which is used to calculate the mean of sample points within the cluster.
According to the above code, we perform cluster analysis on a simple two-dimensional data set and print out the results. The output will show the cluster allocation.
Array ( [0] => Array ( [0] => Array ( [0] => 2 [1] => 10 ) [1] => Array ( [0] => 2 [1] => 5 ) [2] => Array ( [0] => 1 [1] => 2 ) ) [1] => Array ( [0] => Array ( [0] => 8 [1] => 4 ) [1] => Array ( [0] => 5 [1] => 8 ) [2] => Array ( [0] => 7 [1] => 5 ) [3] => Array ( [0] => 6 [1] => 4 ) [4] => Array ( [0] => 4 [1] => 9 ) ) )
The above results show that the K-means algorithm divides the sample points into two clusters. The first cluster contains three sample points [2, 10], [2, 5] and [1, 2] , the second cluster contains the other five sample points.
Through the above code and sample data, we can see that the process of using PHP to implement the K-means algorithm is very simple, and at the same time, effective clustering results can be obtained.
Summary
K-means algorithm is a commonly used cluster analysis algorithm. By dividing the sample points in the data set into multiple clusters, it can minimize the distance within the cluster and the distance between clusters. maximizing goal. This article provides detailed procedures and code examples for implementing the K-means algorithm using PHP and demonstrates it with a simple two-dimensional data set. Readers can adjust relevant parameters according to actual needs to apply them to their own data analysis tasks.
The above is the detailed content of Detailed explanation of K-means algorithm in PHP. For more information, please follow other related articles on the PHP Chinese website!