Principal Component Analysis Example in Python
Principal Component Analysis (PCA) is a method commonly used for dimensionality reduction. It projects high-dimensional data onto a lower-dimensional space while retaining as much of the data's variance as possible. Python provides many libraries and tools for implementing PCA. This article walks through an example of how to implement PCA with the sklearn library in Python.
First, we need to prepare a data set. This article uses the Iris data set, which contains 150 samples. Each sample has 4 feature values (sepal length, sepal width, petal length, petal width) and a label (the species of iris). Our goal is to reduce these four features to a smaller number of dimensions and find the most important principal components.
We start by importing the necessary libraries and loading the data set.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Load the Iris data set: X holds the features, y holds the species labels.
iris = load_iris()
X = iris.data
y = iris.target
Now we can create a PCA object and apply it.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
The PCA object here sets n_components=2, which means we want to keep only two principal components, so the reduced data can be displayed on a two-dimensional plane. We apply fit_transform to the original data X and obtain the reduced data set X_pca.
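As a quick sanity check (a minimal sketch that assumes the code above has already been run), we can compare the shapes of the original and transformed arrays: the 150 samples now have two columns instead of four.

# Quick shape check on the arrays defined above.
print(X.shape)      # (150, 4): 150 samples, 4 original features
print(X_pca.shape)  # (150, 2): 150 samples, 2 principal components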
Now we can plot the results.
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y)
plt.xlabel('Component 1')
plt.ylabel('Component 2')
plt.show()
In this figure, we can see the distribution of the Iris data set in two-dimensional space after dimensionality reduction. Each dot represents one iris sample, and the color indicates its species.
Now let's look at the principal components themselves.
print(pca.components_)
This outputs two vectors, one for each of the two principal components.
[[ 0.36158968 -0.08226889 0.85657211 0.35884393]
[-0.65653988 -0.72971237 0.1757674 0.07470647]]
Each element represents the weight of a feature in the original data. In other words, we can think of principal components as vectors used to linearly combine the original features. Each vector in the result is a unit vector.
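As a small check (a sketch that assumes the pca object fitted above), we can confirm that each component really is a unit vector and that the transformed data is simply the centered data projected onto these vectors:

import numpy as np

# Each principal component has length 1 (a unit vector).
print(np.linalg.norm(pca.components_, axis=1))  # approximately [1. 1.]

# PCA's transform is equivalent to centering X and projecting it onto the components.
X_manual = (X - pca.mean_) @ pca.components_.T
print(np.allclose(X_manual, X_pca))  # True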
We can also look at the amount of variance in the data explained by each component.
print(pca.explained_variance_ratio_)
This output will show the proportion of the variance in the data explained by each component.
[0.92461621 0.05301557]
We can see that these two components together explain about 98% of the variance in the data. This means the two-dimensional representation captures the structure of the data very well.
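If we are not sure how many components to keep, PCA in sklearn also accepts a variance threshold instead of a fixed component count. The sketch below (reusing the same X as above) keeps as many components as are needed to explain 95% of the variance:

import numpy as np

# Cumulative variance explained by the first k components.
print(np.cumsum(pca.explained_variance_ratio_))  # roughly [0.92 0.98]

# Passing a float between 0 and 1 tells PCA to keep enough components
# to reach that fraction of explained variance.
pca_95 = PCA(n_components=0.95)
X_95 = pca_95.fit_transform(X)
print(pca_95.n_components_)  # number of components actually kept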
One thing to note is that PCA replaces the original features with linear combinations of all of them, so individual features are no longer directly present after the transform. If certain original features must be kept unchanged, exclude them from the data before applying PCA and add them back afterwards, as sketched below.
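A minimal sketch of that idea (the split of columns here is arbitrary and chosen only for illustration): apply PCA to the columns you want to compress, then stack the untouched columns back on.

import numpy as np

# Hypothetical split: compress the first three features, keep the fourth unchanged.
X_to_reduce = X[:, :3]
X_keep = X[:, 3:]

pca_partial = PCA(n_components=1)
X_reduced = pca_partial.fit_transform(X_to_reduce)

# Combine the compressed column with the feature we preserved.
X_combined = np.hstack([X_reduced, X_keep])
print(X_combined.shape)  # (150, 2)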
This is an example of how to implement PCA using the sklearn library in Python. PCA can be applied to many kinds of numeric data and helps us discover the most important components in high-dimensional data. If you understand the code in this article, you will be able to apply PCA to your own data sets.
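When applying PCA to your own data, the features often have very different scales, so it is common to standardize them first. One possible sketch of such a workflow using sklearn's Pipeline (the variable names are illustrative):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Standardize each feature to zero mean and unit variance before PCA,
# so that features measured on larger scales do not dominate the components.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=2)),
])

X_scaled_pca = pipeline.fit_transform(X)
print(pipeline.named_steps["pca"].explained_variance_ratio_)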