Cluster analysis examples in Python-Python Tutorial-php.cn

Cluster analysis examples in Python

王林

Release： 2023-06-10 12:30:07


Cluster analysis is a common data analysis method that divides a data set into groups so that points in the same group are more similar to each other than to points in other groups. Python libraries such as scikit-learn provide a variety of clustering algorithms, and we can choose among them according to the data and the task. This article introduces three commonly used clustering algorithms and gives an example application of each.

1. K-Means algorithm

The K-Means algorithm is a commonly used clustering algorithm that groups data based on Euclidean distance. This algorithm divides the data set into k clusters, where the center point of each cluster is the mean of all members of the cluster. The specific steps of the algorithm are as follows:

1. Randomly select k points as the initial cluster centers.
2. Calculate the distance between every data point and every cluster center, and assign each data point to the closest cluster.
3. Recalculate the center point of each cluster based on the new assignments.
4. Repeat steps 2 and 3 until the clusters no longer change or the specified number of iterations is reached.
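
The four steps above can be sketched directly in NumPy. This is only a minimal illustration, not how scikit-learn implements K-Means: the function name kmeans_sketch, its parameters, and the tiny data set are made up for this example, and an empty cluster simply keeps its previous center.

```python
import numpy as np

def kmeans_sketch(X, k, n_iters=100, seed=0):
    """Plain K-Means iteration (hypothetical helper); returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each point to its nearest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step 4: stop once the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two well-separated groups of three points each (illustrative data).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centers = kmeans_sketch(X, k=2)
print(labels)
print(centers)
```

Which group receives label 0 depends on the random initial centers, so only the grouping itself is meaningful, not the label numbers.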

The following is a Python example using the K-Means algorithm for cluster analysis:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Generate random data
X, y = make_blobs(n_samples=300, centers=4, random_state=42)

# Run the K-Means algorithm
kmeans = KMeans(n_clusters=4, random_state=42)
y_pred = kmeans.fit_predict(X)

# Plot the clustering results
plt.scatter(X[:, 0], X[:, 1], c=y_pred)
plt.title("K-Means Clustering")
plt.show()


In the above code, the make_blobs function generates a data set of 300 sample points grouped around 4 centers. The KMeans class then performs the clustering with the number of clusters set to 4, and the fit_predict method returns the cluster label of each data point. Finally, Matplotlib plots the clustering results.
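
Beyond the labels returned by fit_predict, a fitted KMeans object also exposes the cluster centers and can assign new points to clusters. The short sketch below reuses the same data and settings as the example above; slicing off the first five points as "new" data is only for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)
kmeans = KMeans(n_clusters=4, random_state=42).fit(X)

# One row per cluster center, one column per feature.
print(kmeans.cluster_centers_.shape)

# Sum of squared distances from each point to its assigned center.
print(kmeans.inertia_)

# Assign points to the nearest learned center
# (here the first five training points stand in for new data).
new_labels = kmeans.predict(X[:5])
print(new_labels)
```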

2. Hierarchical clustering algorithm

Hierarchical clustering builds a hierarchy of nested clusters. The agglomerative variant used here works bottom-up: it starts with every data point in its own cluster and repeatedly merges the most similar clusters into larger ones. The specific steps of the algorithm are as follows:

1. Treat each data point as a separate cluster.
2. Calculate the distance between every pair of clusters and find the two closest clusters.
3. Merge the two closest clusters into a new cluster.
4. Repeat steps 2 and 3 until all clusters are merged into one cluster or the specified number of clusters is reached.
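
The merge loop described above can be sketched naively as follows. This is an illustration under stated assumptions rather than a practical implementation: the function name agglomerative_sketch is made up, the distance between two clusters is taken to be the smallest point-to-point distance (single linkage), and the search over cluster pairs is deliberately brute-force.

```python
import numpy as np

def agglomerative_sketch(X, n_clusters):
    """Naive single-linkage merging (hypothetical helper); returns one label per point."""
    # Step 1: every point starts as its own cluster (lists of point indices).
    clusters = [[i] for i in range(len(X))]
    # Pairwise Euclidean distances between points, computed once.
    point_dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    while len(clusters) > n_clusters:
        # Step 2: find the closest pair of clusters. Assumption: cluster
        # distance is the smallest point-to-point distance (single linkage).
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = point_dists[np.ix_(clusters[a], clusters[b])].min()
                if d < best[0]:
                    best = (d, a, b)
        # Step 3: merge that pair into one cluster.
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    # Step 4 is the while loop: it stops at the requested number of clusters.
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

# Two well-separated groups of three points each (illustrative data).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels = agglomerative_sketch(X, n_clusters=2)
print(labels)  # [0 0 0 1 1 1]
```

Other definitions of cluster distance (for example the largest or the average point-to-point distance) fit into the same loop by changing the line that computes d.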

The following is a Python example of cluster analysis using hierarchical clustering algorithm:

from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_moons
import matplotlib.pyplot as plt

# Generate random data
X, y = make_moons(n_samples=200, noise=0.05, random_state=42)

# Run the hierarchical clustering algorithm
agglomerative = AgglomerativeClustering(n_clusters=2)
y_pred = agglomerative.fit_predict(X)

# Plot the clustering results
plt.scatter(X[:, 0], X[:, 1], c=y_pred)
plt.title("Agglomerative Clustering")
plt.show()


In the above code, the make_moons function generates a data set of 200 sample points arranged in two interleaving half-circles. The AgglomerativeClustering class then performs the clustering with the number of clusters set to 2. Because no linkage is specified, scikit-learn uses its default, Ward linkage. Finally, Matplotlib plots the clustering results.
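
How AgglomerativeClustering measures the distance between two clusters is controlled by its linkage parameter, and on non-convex data such as the two moons the choice can matter. The sketch below is one way to compare the four linkage options on the same data; using adjusted_rand_score against the generated labels as the yardstick is an assumption of this example, and the scores it prints apply to this data set only.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y_true = make_moons(n_samples=200, noise=0.05, random_state=42)

scores = {}
for linkage in ("ward", "complete", "average", "single"):
    model = AgglomerativeClustering(n_clusters=2, linkage=linkage)
    # Adjusted Rand index: 1.0 means the clusters match the generated labels.
    scores[linkage] = adjusted_rand_score(y_true, model.fit_predict(X))

for linkage, score in scores.items():
    print(f"{linkage:>8}: ARI = {score:.2f}")
```

Single linkage tends to follow elongated, chain-like shapes, which is why it is often a better fit for data like this than Ward linkage, although it is also more sensitive to noise.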

3. DBSCAN algorithm

The DBSCAN algorithm is a density-based clustering algorithm: it groups points that lie in dense regions into clusters and treats points in sparse regions as noise. Unlike K-Means, it does not need the number of clusters in advance. The specific steps of the algorithm are as follows:

1. Select an unvisited data point.
2. Find all points whose distance from it does not exceed the given radius. If this neighborhood contains at least the minimum number of samples, the point is a core point and starts a new cluster; otherwise it is provisionally treated as noise.
3. Expand the cluster: add every point in the core point's neighborhood, and whenever an added point is itself a core point, add its neighborhood as well.
4. Repeat steps 1 to 3 until every point has been visited. Points that belong to no cluster remain noise.
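
The procedure can be sketched in a few lines of NumPy. This is a simplified illustration, not scikit-learn's implementation: the function name dbscan_sketch and the toy data are made up, all pairwise distances are computed up front, and, following scikit-learn's convention, a point counts as its own neighbor and noise is labeled -1.

```python
import numpy as np

def dbscan_sketch(X, eps, min_samples):
    """Minimal DBSCAN (hypothetical helper); returns labels where -1 marks noise."""
    n = len(X)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Neighborhood of each point within radius eps (the point itself is included).
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])

    labels = np.full(n, -1)  # -1 means noise or not yet assigned
    cluster_id = 0
    for i in range(n):
        # Only an unassigned core point can start a new cluster.
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i] = cluster_id
        frontier = [i]
        while frontier:
            p = frontier.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster_id
                    # Only core points keep expanding the cluster.
                    if is_core[q]:
                        frontier.append(q)
        cluster_id += 1
    return labels

# Two dense squares of points plus one isolated outlier (illustrative data).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0],
              [5.0, 50.0]])
labels = dbscan_sketch(X, eps=1.5, min_samples=3)
print(labels)  # [ 0  0  0  0  1  1  1  1 -1]
```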

The following is a Python example using the DBSCAN algorithm for cluster analysis:

from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
import matplotlib.pyplot as plt

# Generate random data
X, y = make_moons(n_samples=200, noise=0.05, random_state=42)

# Run the DBSCAN algorithm
dbscan = DBSCAN(eps=0.2, min_samples=5)
y_pred = dbscan.fit_predict(X)

# Plot the clustering results
plt.scatter(X[:, 0], X[:, 1], c=y_pred)
plt.title("DBSCAN Clustering")
plt.show()


In the above code, the make_moons function generates a data set of 200 sample points, and the DBSCAN class performs the clustering with a neighborhood radius (eps) of 0.2 and a minimum of 5 samples per neighborhood (min_samples). Points that DBSCAN treats as noise receive the label -1. Finally, Matplotlib plots the clustering results.
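
Since DBSCAN decides the number of clusters itself, it is often useful to check how many clusters it found and how many points it left as noise. A small sketch using the same data and parameters as above:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=200, noise=0.05, random_state=42)
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Noise points carry the label -1 and do not count as a cluster.
n_noise = int(np.sum(labels == -1))
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"clusters: {n_clusters}, noise points: {n_noise}")
```

If the result shows too many noise points or too many small clusters, increasing eps or lowering min_samples is the usual first adjustment to try.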

Summary

This article introduced three commonly used clustering algorithms in Python and gave an example application of each. Clustering is a useful data analysis technique that can help us discover hidden patterns and relationships in data. In practical applications, we can choose an algorithm based on the characteristics of the data and the needs of the task.
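
As a rough illustration of how the choice of algorithm interacts with the shape of the data, the sketch below runs all three algorithms on the same two-moons data set and scores each result against the generated labels. Using adjusted_rand_score for the comparison is an assumption of this example, and the outcome says nothing about other data sets.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y_true = make_moons(n_samples=200, noise=0.05, random_state=42)

# Same settings as in the examples of this article, with 2 clusters where needed.
models = {
    "KMeans": KMeans(n_clusters=2, random_state=42),
    "Agglomerative": AgglomerativeClustering(n_clusters=2),
    "DBSCAN": DBSCAN(eps=0.2, min_samples=5),
}

scores = {
    name: adjusted_rand_score(y_true, model.fit_predict(X))
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name:>13}: ARI = {score:.2f}")
```

K-Means favors compact, roughly round clusters, so it typically splits each moon in half, while a density-based method can follow the curved shapes.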
