
Principal component analysis example in Python

Jun 10, 2023 am 08:19 AM
data analysis python programming Principal component analysis (pca)


Principal Component Analysis (PCA) is a method commonly used for dimensionality reduction. It projects high-dimensional data onto a lower-dimensional space while retaining as much of the variance in the data as possible. Python provides many libraries and tools for implementing PCA. This article uses an example to introduce how to implement PCA with the sklearn library in Python.

First, we need to prepare a data set. This article uses the Iris data set, which contains 150 samples. Each sample has 4 feature values (the length and width of the sepal, and the length and width of the petal) and a label (the species of iris). Our goal is to reduce the dimensionality of these four features and find the most important principal components.

Next, we import the necessary libraries and load the data set.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

iris = load_iris()
X = iris.data
y = iris.target

Now we can create a PCA object and apply it.

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

The PCA object here sets n_components=2, which means that only the first two principal components are kept, so the processed data can be displayed on a two-dimensional plane. Calling fit_transform fits the PCA model on the original data X and returns the projected data set X_pca.
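To make the fit_transform step more concrete, the following minimal sketch reproduces the projection by hand. It assumes the default PCA settings (no whitening) and uses a helper variable, X_manual, that is not part of the original example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Fitting learns the per-feature mean (mean_) and the principal axes (components_).
# Transforming centers the data and projects it onto those axes.
X_centered = X - pca.mean_
X_manual = X_centered @ pca.components_.T

print(X.shape, X_pca.shape)           # (150, 4) (150, 2)
print(np.allclose(X_pca, X_manual))   # the two projections should agree
```

In other words, with default settings the transformed data is the centered data multiplied by the transposed component matrix.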

Now we can plot the results.

plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y)
plt.xlabel('Component 1')
plt.ylabel('Component 2')
plt.show()

In this figure, we can see the distribution of the Iris data set in two-dimensional space after dimensionality reduction. Each dot represents one iris sample, and its color indicates the species.

Now let’s see what the principal components are.

print(pca.components_)

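This prints two row vectors, one for each principal component ("Component 1" and "Component 2" in the plot). The exact values and signs may differ slightly between scikit-learn versions.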

[[ 0.36158968 -0.08226889 0.85657211 0.35884393]
[-0.65653988 -0.72971237 0.1757674 0.07470647]]

Each element is the weight of one original feature in that component. In other words, each principal component is a vector of coefficients used to linearly combine the original (centered) features. Each row in the result is a unit vector.
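The unit-vector property can be checked directly. The sketch below is illustrative only: it refits PCA on the Iris data, computes the length of each component, confirms that the two components are orthogonal, and pairs the weights of the first component with the feature names provided by load_iris.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
pca = PCA(n_components=2).fit(iris.data)

# Each row of components_ is one principal axis; its Euclidean length should be 1.
norms = np.linalg.norm(pca.components_, axis=1)
print(norms)

# Distinct principal axes are orthogonal, so their dot product is close to 0.
dot = pca.components_[0] @ pca.components_[1]
print(np.isclose(dot, 0.0, atol=1e-8))

# Pair each weight of the first component with the feature it applies to.
for name, weight in zip(iris.feature_names, pca.components_[0]):
    print(f"{name}: {weight:+.3f}")
```

A weight with a large absolute value means the corresponding feature contributes strongly to that component.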

We can also look at the amount of variance in the data explained by each component.

print(pca.explained_variance_ratio_)

This output will show the proportion of the variance in the data explained by each component.

[0.92461621 0.05301557]

We can see that these two components together explain about 97.8% of the variance in the data (0.9246 + 0.0530 ≈ 0.978). This means the two-dimensional projection preserves most of the variation in the original four features.
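If the number of components is not known in advance, one option is to look at the cumulative explained variance. The sketch below is an illustration under assumptions: the 0.95 threshold is an arbitrary choice, and it relies on scikit-learn accepting a float between 0 and 1 for n_components to select the number of components automatically.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Fit with all components to see how the explained variance accumulates.
pca_full = PCA().fit(X)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
print(cumulative)

# A float in (0, 1) keeps the smallest number of components whose
# cumulative explained variance exceeds that fraction (0.95 is an assumed threshold).
pca_95 = PCA(n_components=0.95).fit(X)
print(pca_95.n_components_)
```

For the Iris data the first component alone explains less than 95% of the variance, so this selection keeps two components.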

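One thing to note is that PCA replaces the original features with new ones that are linear combinations of them, so the individual original features are no longer directly available in the transformed data. Therefore, if we need to retain certain features unchanged, we need to set them aside manually before applying PCA.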
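One possible way to keep a feature unchanged is to apply PCA only to the remaining columns and then join the results. This is a minimal sketch, not a prescribed workflow; the choice of column 0 (sepal length) and the names keep_idx, reduce_idx, and X_final are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Hypothetical choice: keep column 0 (sepal length) exactly as it is.
keep_idx = [0]
reduce_idx = [1, 2, 3]

X_keep = X[:, keep_idx]
X_reduced = PCA(n_components=2).fit_transform(X[:, reduce_idx])

# Final matrix: the preserved feature followed by two principal components.
X_final = np.hstack([X_keep, X_reduced])
print(X_final.shape)  # (150, 3)
```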

This is an example of how to implement PCA using the sklearn library in Python. PCA can be applied to many kinds of numeric data and helps us discover the most important components of high-dimensional data. If you understand the code in this article, you will also be able to apply PCA to your own data sets.
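When applying PCA to your own data, keep in mind that it is sensitive to the scale of the features: a feature measured in large units can dominate the components. The example above uses the raw Iris measurements; the sketch below shows one common alternative, standardizing the features first with StandardScaler in a pipeline. Whether to standardize is a modeling choice and an assumption here, not part of the original example.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = load_iris().data

# Scale each feature to zero mean and unit variance, then project onto 2 components.
pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
X_scaled_pca = pipeline.fit_transform(X)

# make_pipeline names each step after its lowercased class name.
ratio = pipeline.named_steps["pca"].explained_variance_ratio_
print(X_scaled_pca.shape)
print(ratio)
```

The explained variance ratios from this pipeline will generally differ from those of the unscaled example, because every feature now contributes on the same scale.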
