How to write the PCA (principal component analysis) algorithm in Python?
PCA (Principal Component Analysis) is a commonly used unsupervised learning algorithm that reduces the dimensionality of data so that it is easier to understand and analyze. In this article, we will learn how to implement PCA in Python and walk through a concrete code example.
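The steps of PCA are as follows:
1. Standardize the data.
2. Compute the covariance matrix.
3. Compute the eigenvalues and eigenvectors of the covariance matrix.
4. Select the principal components (the eigenvectors with the k largest eigenvalues).
5. Transform the data by projecting it onto the selected components.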
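In matrix notation, the PCA steps can be summarized roughly as below. The symbols are assumptions introduced here for illustration and do not appear in the code: X is the n-by-d data matrix, mu_j and sigma_j are the mean and standard deviation of column j, and W_k collects the k selected eigenvectors.

```latex
\begin{aligned}
z_{ij} &= \frac{x_{ij} - \mu_j}{\sigma_j} && \text{(standardize each column)} \\
C &= \frac{1}{n-1}\, Z^{\top} Z && \text{(covariance matrix of the standardized data)} \\
C\, v_i &= \lambda_i\, v_i, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d && \text{(eigendecomposition)} \\
W_k &= \begin{bmatrix} v_1 & v_2 & \cdots & v_k \end{bmatrix} && \text{(top } k \text{ eigenvectors)} \\
Y &= Z\, W_k && \text{(projected data, } n \times k\text{)}
\end{aligned}
```

The factor 1/(n-1) matches the default behavior of np.cov, and the formula for C relies on the columns of Z having zero mean after standardization.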
Code example:
import numpy as np

def pca(X, k):
    # 1. Standardize the data
    X_normalized = (X - np.mean(X, axis=0)) / np.std(X, axis=0)
    # 2. Compute the covariance matrix
    covariance_matrix = np.cov(X_normalized.T)
    # 3. Compute the eigenvalues and eigenvectors
    eigenvalues, eigenvectors = np.linalg.eig(covariance_matrix)
    # 4. Select the principal components
    eig_indices = np.argsort(eigenvalues)[::-1]  # indices sorted by eigenvalue, largest first
    top_k_eig_indices = eig_indices[:k]  # indices of the k largest eigenvalues
    top_k_eigenvectors = eigenvectors[:, top_k_eig_indices]
    # 5. Transform the data
    transformed_data = np.dot(X_normalized, top_k_eigenvectors)
    return transformed_data

# Sample data
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
# Use PCA to reduce the dimensionality to 1
k = 1
transformed_data = pca(X, k)
print(transformed_data)
In the above code, we first standardize the data using np.mean and np.std. Then we use np.cov to calculate the covariance matrix. Next, we use np.linalg.eig to perform an eigenvalue decomposition of the covariance matrix, which yields the eigenvalues and eigenvectors. We sort the eigenvalues in descending order and select the eigenvectors corresponding to the k largest eigenvalues. Finally, we multiply the standardized data by the selected eigenvectors to get the transformed data.
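Because a covariance matrix is symmetric, np.linalg.eigh is a possible alternative to np.linalg.eig: it always returns real eigenvalues, in ascending order. The sketch below is not the function from this article but a hypothetical variant (the name pca_with_variance_ratio is an assumption made here) that also reports the share of total variance carried by each selected component, which may help when choosing k.

```python
import numpy as np

def pca_with_variance_ratio(X, k):
    """Hypothetical variant of pca() that also returns explained-variance ratios."""
    X = np.asarray(X, dtype=float)
    # Standardize each column (assumes no column is constant, i.e. std > 0).
    X_normalized = (X - X.mean(axis=0)) / X.std(axis=0)
    # Covariance matrix of the standardized features (symmetric, d x d).
    covariance_matrix = np.cov(X_normalized, rowvar=False)
    # eigh is intended for symmetric matrices; eigenvalues come back ascending.
    eigenvalues, eigenvectors = np.linalg.eigh(covariance_matrix)
    # Reorder to descending and keep the indices of the k largest eigenvalues.
    order = np.argsort(eigenvalues)[::-1][:k]
    components = eigenvectors[:, order]
    # Share of the total variance carried by each kept component.
    ratio = eigenvalues[order] / eigenvalues.sum()
    return X_normalized @ components, ratio

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
transformed, ratio = pca_with_variance_ratio(X, 1)
print(transformed.shape)
print(ratio)
```

For the sample data the two columns are perfectly correlated, so the first component carries essentially all of the variance and the ratio is very close to 1.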
The sample data is a simple 2-dimensional dataset with four points. We reduce it to 1 dimension and print the transformed data.
Running the above code produces the following output (the overall sign may be flipped, because an eigenvector is only determined up to sign):
[[-1.8973666 ]
 [-0.63245553]
 [ 0.63245553]
 [ 1.8973666 ]]
This result shows that the data has been successfully converted to 1-dimensional space.
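One way to sanity-check a result like this is to compute the same projection by a different route. The sketch below compares the eigendecomposition approach with one based on the singular value decomposition (np.linalg.svd) of the standardized data. The SVD route is a swapped-in technique rather than the method used in this article, and the function names pca_eig and pca_svd are assumptions made for illustration.

```python
import numpy as np

def pca_eig(X, k):
    """Eigendecomposition-based PCA, following the steps described in the article."""
    X_normalized = (X - np.mean(X, axis=0)) / np.std(X, axis=0)
    eigenvalues, eigenvectors = np.linalg.eig(np.cov(X_normalized.T))
    top_k = np.argsort(eigenvalues)[::-1][:k]
    return np.dot(X_normalized, eigenvectors[:, top_k])

def pca_svd(X, k):
    """The same projection computed from the SVD of the standardized data."""
    X_normalized = (X - np.mean(X, axis=0)) / np.std(X, axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_normalized, full_matrices=False)
    return X_normalized @ Vt[:k].T

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]], dtype=float)
a = pca_eig(X, 1)
b = pca_svd(X, 1)
# Compare magnitudes, because each method may flip the sign of a component.
agree = np.allclose(np.abs(a), np.abs(b))
print(agree)
```

Comparing absolute values is a deliberately loose check that suits the single-component case here. With several components, a column-by-column sign alignment would be a more careful comparison.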
Through this example, you can see how to write the PCA algorithm in Python using NumPy functions such as np.mean, np.std, np.cov and np.linalg.eig. I hope this article helps you better understand the principles and implementation of PCA and apply it in your own data analysis and machine learning tasks.