


How to use the scikit-learn machine learning library in Python
Preface
scikit-learn is one of the most popular machine learning libraries in Python. It provides a variety of machine learning algorithms and tools, covering classification, regression, clustering, dimensionality reduction, and more.
The advantages of scikit-learn are:
Easy to use: The interface of scikit-learn is simple and easy to understand, so users can get started with machine learning quickly.
Unified API: The API of scikit-learn is highly consistent. The various algorithms are used in basically the same way, which makes learning and using them more convenient.
Implements a large number of machine learning algorithms: scikit-learn implements a wide range of classic machine learning algorithms and provides a wealth of tools and functions that make debugging and tuning them easier.
Open source and free: scikit-learn is completely open source and free, and anyone can use and modify its code.
Efficient and stable: scikit-learn provides efficient implementations of its algorithms, can handle large data sets, and performs well in terms of stability and reliability.
scikit-learn is well suited to beginners in machine learning because the API is consistent and the models are relatively simple. My recommendation is to study it alongside the official documentation, which describes the scope of application of each model and also provides code samples.
Linear Regression Model-LinearRegression
LinearRegression is scikit-learn's linear regression model and is suited to predicting continuous variables. The basic idea is to model the relationship between the independent variable and the dependent variable as a straight line, fit that line to the training data to find the coefficients of the linear equation, and then use the equation to make predictions for test data.
The LinearRegression model is suitable for problems where the relationship between the independent variables and the dependent variable is linear, such as housing price prediction, sales prediction, and user behavior prediction. When that relationship is nonlinear, LinearRegression performs poorly. In that case, polynomial regression (optionally combined with ridge or Lasso regularization) or other methods can be used instead.
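The difference between a straight-line fit and a polynomial fit on nonlinear data can be seen in a minimal sketch. The data here is synthetic (a noise-free quadratic chosen only for illustration), and combining PolynomialFeatures with LinearRegression through make_pipeline is one common way to do polynomial regression in scikit-learn, not the only one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, noise-free quadratic data (illustrative only): y = 2x^2 + 1
x = np.linspace(-3, 3, 30).reshape(-1, 1)
y = 2 * x.ravel() ** 2 + 1

# A plain straight-line fit
linear = LinearRegression().fit(x, y)

# Polynomial regression: expand x into [1, x, x^2], then fit a linear model
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(x, y)

# score() returns R2 for regressors
linear_r2 = linear.score(x, y)
poly_r2 = poly.score(x, y)
print("linear R2:", linear_r2)
print("polynomial R2:", poly_r2)
```

On this data the straight line explains almost none of the variation, while the degree-2 pipeline recovers the curve.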
Prepare the data set
Other factors aside, there is a roughly linear relationship between study time and grades: as study time increases, grades tend to increase as well. Study time here means effective study time. So we prepare a data set of study time and grades. Part of the data set is shown below:
Learning time, score
0.5,15
0.75,23
1.0,14
1.25,42
1.5,21
1.75,28
1.75,35
2.0,51
2.25,61
2.5,49
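For readers who do not have the full CSV file, the ten sample rows above can be loaded directly from a string. This is only a sketch: the column names study_time and score are hypothetical English stand-ins, not the headers of the actual file.

```python
import io

import pandas as pd

# The ten sample rows shown above; the column names are hypothetical stand-ins
sample_csv = """study_time,score
0.5,15
0.75,23
1.0,14
1.25,42
1.5,21
1.75,28
1.75,35
2.0,51
2.25,61
2.5,49
"""

data = pd.read_csv(io.StringIO(sample_csv))
print(data.shape)

# Pearson correlation between study time and score; a clearly positive
# value supports trying a linear model on this data
correlation = data["study_time"].corr(data["score"])
print("correlation:", correlation)
```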
Use LinearRegression
Determine the features and the target
Of study time and grades, study time is the feature (the independent variable) and the grade is the label (the dependent variable), so we need to extract the features and labels from the prepared data set.
import pandas as pd
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Read the study time and score CSV data file
data = pd.read_csv('data/study_time_score.csv')
# Extract the feature: study time (the CSV column header is '学习时间')
X = data['学习时间']
# Extract the target (label): score (the CSV column header is '分数')
Y = data['分数']
Divide the training set and the test set
After the feature and label data are prepared, use scikit-learn's train_test_split to divide the data set into a training set and a test set.
# Split the feature data and the target data into a training set and a test set.
# test_size=0.25 puts 25% of the data into the test set.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)
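How test_size and random_state behave can be checked with a small sketch. The 20 placeholder samples below are made up for illustration and are not the article's data set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 20 samples where the target is simply twice the feature
X = np.arange(20)
Y = X * 2

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)
print(len(X_train), len(X_test))  # 15 5

# Features and targets stay paired after shuffling
pairs_aligned = np.array_equal(Y_test, X_test * 2)

# Reusing the same random_state reproduces the same split
_, X_test_again, _, _ = train_test_split(X, Y, test_size=0.25, random_state=0)
same_split = np.array_equal(X_test, X_test_again)
print(pairs_aligned, same_split)
```

Fixing random_state makes the split reproducible, which keeps results comparable from run to run.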
Select the model and fit the data
After preparing the test set and the training set, we can choose a suitable model and fit it to the training set, so that it can predict the target for other feature values.
# Choose the model: LinearRegression
model = LinearRegression()
# scikit-learn models expect the feature input to be a two-dimensional array, so the one-dimensional array must be converted first.
x_train = X_train.values.reshape(-1, 1)
# Fit the model
model.fit(x_train, Y_train)
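The effect of reshape(-1, 1) is easiest to see from the array shapes. The sketch below uses the first five rows of the sample data; the variable names are illustrative.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# First five rows of the sample data shown earlier
study_time = pd.Series([0.5, 0.75, 1.0, 1.25, 1.5])
scores = pd.Series([15, 23, 14, 42, 21])

one_d = study_time.values        # shape (5,): a flat array of 5 values
two_d = one_d.reshape(-1, 1)     # shape (5, 1): 5 samples, 1 feature each
print(one_d.shape, two_d.shape)

# Fitting on the one-dimensional array is rejected with a ValueError
try:
    LinearRegression().fit(one_d, scores)
    raised = False
except ValueError:
    raised = True
print("1-D input rejected:", raised)

# The two-dimensional array is accepted
model = LinearRegression().fit(two_d, scores)
```

The -1 tells NumPy to infer the number of rows, so the same call works for any number of samples.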
Get the model parameters
Since the data set contains only two variables, study time and grades, this is a very simple linear model. The mathematical formula behind it is y = ax + b, where the dependent variable y is the grade and the independent variable x is the study time.
# Print the key model parameters
# Intercept: the intercept, i.e. b
# Coefficients: the feature weights, i.e. a
print('Intercept:', model.intercept_)
print('Coefficients:', model.coef_)
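To confirm that the fitted model really is y = ax + b, a prediction can be recomputed by hand from coef_ and intercept_. This sketch fits on the ten sample rows only, so its a and b will differ from those of the full data set; the value new_x = 3.0 is a hypothetical study time chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# The ten sample rows shown earlier (study time, score)
x = np.array([0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 1.75, 2.0, 2.25, 2.5]).reshape(-1, 1)
y = np.array([15, 23, 14, 42, 21, 28, 35, 51, 61, 49])

model = LinearRegression().fit(x, y)
a = model.coef_[0]      # slope: one weight per feature, here a single feature
b = model.intercept_    # intercept
print("a:", a, "b:", b)

# Hypothetical study time to predict for
new_x = 3.0
manual = a * new_x + b
predicted = model.predict([[new_x]])[0]
print(manual, predicted)
```

Both values agree up to floating-point rounding, because predict evaluates the same linear equation.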
Backtest
The fitting above used only the training set data. Next, we use the test set data to backtest the fitted model. After fitting on the training set, we predict the targets for the test set features; comparing these predictions with the actual target values tells us how well the model fits.
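# Convert to a two-dimensional array with n rows and 1 column
x_test = X_test.values.reshape(-1, 1)
# Predict on the test set
Y_pred = model.predict(x_test)
# Print the test feature data
print(x_test)
# Print the predictions for those features
print(Y_pred)
# Compare the predictions with the actual target values to measure how well the model fits
# R2 (R-squared): goodness of fit. The best possible value is 1, and the closer to 1, the better the fit. It can be negative when the model fits worse than always predicting the mean.
print("R2:", r2_score(Y_test, Y_pred))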
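What r2_score computes can be reproduced by hand as R2 = 1 - SS_res / SS_tot, and mean_squared_error (imported earlier) is the mean of the squared residuals. The actual and predicted values in this sketch are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up actual and predicted values (illustrative only)
y_true = np.array([15.0, 23.0, 14.0, 42.0])
y_pred = np.array([18.0, 20.0, 17.0, 38.0])

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2_manual = 1 - ss_res / ss_tot
mse_manual = ss_res / len(y_true)

print("R2:", r2_manual, r2_score(y_true, y_pred))
print("MSE:", mse_manual, mean_squared_error(y_true, y_pred))

# Predictions worse than always guessing the mean give a negative R2
y_bad = np.full_like(y_true, 60.0)
r2_bad = r2_score(y_true, y_bad)
print("R2 of a poor model:", r2_bad)
```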
Program running results
The code above measures how well the LinearRegression model fits, that is, whether a linear model is suitable for this data. The program output is as follows:
##Prediction results:[47.43726068 33.05457106 49.83437561 63.41802692 41.84399249 37.84880093
23.46611131 37.84880093 26.66226456 71.40841004 18.67188144 88.9872529
63.41802692 42.6430308 21.86803469 69.81033341 66.61418017 33.05457106
58.62379705 50.63341392 18.67188144 41.044954]
R2: 0.8935675710322939