Cluster analysis is a common data analysis method that can divide data sets into different groups or categories. Python provides a variety of clustering algorithms, and we can choose different algorithms for analysis according to different needs. This article will introduce some commonly used clustering algorithms in Python and give example applications.
1. K-Means algorithm
The K-Means algorithm is a commonly used clustering algorithm that groups data based on Euclidean distance. This algorithm divides the data set into k clusters, where the center point of each cluster is the mean of all members of the cluster. The specific steps of the algorithm are as follows:
The following is a Python example using the K-Means algorithm for cluster analysis:
import numpy as np from sklearn.cluster import KMeans from sklearn.datasets import make_blobs import matplotlib.pyplot as plt # 生成随机数据 X, y = make_blobs(n_samples=300, centers=4, random_state=42) # 运行 K-Means 算法 kmeans = KMeans(n_clusters=4, random_state=42) y_pred = kmeans.fit_predict(X) # 绘制聚类结果 plt.scatter(X[:, 0], X[:, 1], c=y_pred) plt.title("K-Means Clustering") plt.show()
In the above code, the make_blobs function is used to generate a data set containing 300 sample points. , including a total of 4 clusters. Then use the KMeans function to perform clustering, specify the number of clusters as 4, and obtain the classification results of each data point through the fit_predict method. Finally, use Matplotlib to plot the clustering results.
2. Hierarchical clustering algorithm
The hierarchical clustering algorithm is a bottom-up clustering algorithm that gradually merges data into larger clusters based on the similarity of the data. The specific steps of the algorithm are as follows:
The following is a Python example of cluster analysis using hierarchical clustering algorithm:
from sklearn.cluster import AgglomerativeClustering from sklearn.datasets import make_moons import matplotlib.pyplot as plt # 生成随机数据 X, y = make_moons(n_samples=200, noise=0.05, random_state=42) # 运行层次聚类算法 agglomerative = AgglomerativeClustering(n_clusters=2) y_pred = agglomerative.fit_predict(X) # 绘制聚类结果 plt.scatter(X[:, 0], X[:, 1], c=y_pred) plt.title("Agglomerative Clustering") plt.show()
In the above code, the make_moons function is used to generate a data set containing 200 sample points. , and use the AgglomerativeClustering function for clustering, specifying the number of clusters as 2. Finally, use Matplotlib to plot the clustering results.
3. DBSCAN algorithm
The DBSCAN algorithm is a density-based clustering algorithm that can divide data points into different clusters based on the density of the data set. The specific steps of the algorithm are as follows:
The following is a Python example using the DBSCAN algorithm for cluster analysis:
from sklearn.cluster import DBSCAN from sklearn.datasets import make_moons import matplotlib.pyplot as plt # 生成随机数据 X, y = make_moons(n_samples=200, noise=0.05, random_state=42) # 运行 DBSCAN 算法 dbscan = DBSCAN(eps=0.2, min_samples=5) y_pred = dbscan.fit_predict(X) # 绘制聚类结果 plt.scatter(X[:, 0], X[:, 1], c=y_pred) plt.title("DBSCAN Clustering") plt.show()
In the above code, the make_moons function is used to generate a data set containing 200 sample points, and Clustering was performed using the DBSCAN function, specifying thresholds for radius and minimum number of samples. Finally, use Matplotlib to plot the clustering results.
Summary
This article introduces three commonly used clustering algorithms in Python and gives corresponding example applications. Clustering algorithms are a very useful data analysis method that can help us discover hidden patterns and relationships in data. In practical applications, we can choose different algorithms for analysis based on the characteristics and needs of the data.
The above is the detailed content of Cluster analysis examples in Python. For more information, please follow other related articles on the PHP Chinese website!