Machine Learning Model Deployment as a Web App using Streamlit

Introduction

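A machine learning model is essentially a set of rules or mechanisms used to make predictions or find patterns in data. Put very simply (at the risk of oversimplifying), a trendline calculated with the least squares method in Excel is also a model. Models used in real applications are not that simple, however; they usually involve far more complex equations and algorithms.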
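To make the Excel analogy concrete, here is a minimal sketch that fits a least-squares trendline with NumPy's np.polyfit. The data points are made up for illustration. The fitted slope and intercept are the entire "model", and making a prediction just means plugging a new x into the line.

```python
import numpy as np

# Illustrative data lying exactly on y = 2x + 1 (made up for this example)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Fit a degree-1 polynomial (a straight line) by least squares;
# polyfit returns coefficients from the highest degree down: [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)


def predict(new_x: float) -> float:
    """Use the fitted line as a 'model' to predict y for a new x."""
    return slope * new_x + intercept


print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(predict(10.0))  # approximately 21.0
```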

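In this article, I will first build a very simple machine learning model and then publish it as a very simple web application, to experience the whole process end to end.

Here I will focus only on the process, not on the ML model itself. Also, I will use Streamlit and Streamlit Community Cloud to publish the Python web application easily.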

TL;DR:

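With scikit-learn, a popular Python machine learning library, you can train a model on your data and create a model in just a few lines of code for simple tasks. The model can then be saved as a reusable file with joblib. A web application can load this saved model much like it imports a regular Python library, and then use the trained model to make predictions!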

App URL: https://yh-machine-learning.streamlit.app/
GitHub：https://github.com/yoshan0921/yh-machine-learning.git

Tech Stack

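Python
Streamlit: for building the web application interface.
scikit-learn: for training the Random Forest model and using the pre-trained model for predictions.
NumPy and Pandas: for data manipulation and processing.
Matplotlib and Seaborn: for generating visualizations.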

What I Made

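This application lets you examine the predictions made by a Random Forest model trained on the Palmer Penguins dataset. (See the end of this article for more details on the training data.)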

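Specifically, the model predicts the penguin species from several features: island, bill length, bill depth, flipper length, body mass, and sex. Users can navigate the app to see how different feature values affect the model's predictions.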

Prediction screen
Training data / visualization screen

Development Step1 - Creating the Model

Step1.1 Import Libraries

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import joblib


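pandas is a Python library specialized for data manipulation and analysis. It supports loading, preprocessing, and structuring data with DataFrames, preparing the data for machine learning models.
sklearn (scikit-learn) is a comprehensive Python library for machine learning that provides tools for training and evaluation. In this article, I build a model with a learning method called Random Forest.
joblib is a Python library that helps save and load Python objects, such as machine learning models, efficiently.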

Step1.2 Load the Data

df = pd.read_csv("./dataset/penguins_cleaned.csv")
X_raw = df.drop("species", axis=1)
y_raw = df.species


Load the dataset (training data) and split it into the features (X) and the target variable (y).
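Because penguins_cleaned.csv is not reproduced in this article, the sketch below builds a tiny inline DataFrame with the same column names the app uses later (the values are illustrative, not rows from the real file) to show what the feature/target split produces.

```python
import pandas as pd

# Hypothetical stand-in for penguins_cleaned.csv (values are illustrative only)
df = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Chinstrap"],
    "island": ["Torgersen", "Biscoe", "Dream"],
    "bill_length_mm": [39.1, 47.5, 49.0],
    "bill_depth_mm": [18.7, 15.0, 19.5],
    "flipper_length_mm": [181.0, 218.0, 210.0],
    "body_mass_g": [3750.0, 4950.0, 3950.0],
    "sex": ["male", "female", "male"],
})

# Features: every column except the target
X_raw = df.drop("species", axis=1)
# Target: the species label the model should predict
y_raw = df["species"]

print(X_raw.columns.tolist())
print(y_raw.tolist())
```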

Step1.3 Encode Categorical Variables

encode = ["island", "sex"]
X_encoded = pd.get_dummies(X_raw, columns=encode)

target_mapper = {"Adelie": 0, "Chinstrap": 1, "Gentoo": 2}
y_encoded = y_raw.apply(lambda x: target_mapper[x])


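One-hot encoding converts the categorical variables into numeric form (X_encoded). For example, if "island" contains the categories "Biscoe", "Dream", and "Torgersen", a new column is created for each category (island_Biscoe, island_Dream, island_Torgersen). The same applies to sex. If a row's original value is "Biscoe", its island_Biscoe column is set to 1 and the other island columns are set to 0 (recent pandas versions store these values as True/False).
The target variable, species, is mapped to numeric values (y_encoded).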
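A small sketch of both encodings on an illustrative three-row table (not real dataset rows). It passes dtype=int to get_dummies to get 0/1 columns instead of True/False, and uses Series.map with the dictionary, which is an equivalent shorthand for the apply(lambda ...) call above.

```python
import pandas as pd

# Small illustrative feature table (not real dataset rows)
X_raw = pd.DataFrame({
    "island": ["Biscoe", "Dream", "Torgersen"],
    "bill_length_mm": [45.0, 48.0, 39.0],
    "sex": ["female", "male", "male"],
})

# One-hot encode the categorical columns; dtype=int gives 0/1 instead of True/False.
# Non-encoded columns come first, then one column per category (sorted) for each encoded column.
X_encoded = pd.get_dummies(X_raw, columns=["island", "sex"], dtype=int)
print(X_encoded.columns.tolist())
# ['bill_length_mm', 'island_Biscoe', 'island_Dream', 'island_Torgersen', 'sex_female', 'sex_male']

# Map species names to integer class labels, using the same mapping as the article
target_mapper = {"Adelie": 0, "Chinstrap": 1, "Gentoo": 2}
y_encoded = pd.Series(["Gentoo", "Adelie"]).map(target_mapper)
print(y_encoded.tolist())  # [2, 0]
```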

Step1.4 Split the Dataset

x_train, x_test, y_train, y_test = train_test_split(
    X_encoded, y_encoded, test_size=0.3, random_state=1
)


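To evaluate the model, we need to measure its performance on data that was not used for training. Here test_size=0.3 holds out 30% of the rows for testing; a 7:3 train/test split is a widely used rule of thumb in machine learning. random_state=1 makes the shuffle reproducible.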
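A quick way to see what test_size=0.3 does is to split ten illustrative samples (a NumPy array stands in for the penguin features here) and check the resulting sizes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 10 illustrative samples with 2 features each, plus alternating labels
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# test_size=0.3 holds out 30% of the rows; random_state fixes the shuffle
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

print(len(x_train), len(x_test))  # 7 3
```

If the classes were imbalanced, passing stratify=y would keep the class proportions similar in both splits.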

Step1.5 Train the Random Forest Model

clf = RandomForestClassifier()
clf.fit(x_train, y_train)


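The fit method trains the model.
x_train holds the training data for the explanatory variables (the features), and y_train holds the target variable.
After this call, clf holds the model trained on the training data.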
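Step1.1 imports accuracy_score, but the steps above do not show it being used. The sketch below scores a model on the held-out split. Since the penguin CSV is not included here, it uses scikit-learn's bundled Iris dataset as a stand-in, so the accuracy it prints says nothing about the penguin model itself.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in dataset (Iris) in place of the penguin data
X, y = load_iris(return_X_y=True, as_frame=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Train on the 70% split only
clf = RandomForestClassifier(random_state=1)
clf.fit(x_train, y_train)

# Score on the 30% the model never saw during training
y_pred = clf.predict(x_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```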

Step1.6 Save the Model

joblib.dump(clf, "penguin_classifier_model.pkl")


joblib.dump() saves a Python object to a file in a binary format. Once the model is saved this way, it can be loaded from the file and used as-is, without retraining.
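The save/load round trip can be checked with a small sketch. It trains a stand-in model on the Iris dataset and writes to a temporary directory rather than to penguin_classifier_model.pkl, then confirms that the reloaded model makes the same predictions as the original.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Stand-in model trained on Iris (the article trains on the penguin data)
X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(random_state=1).fit(X, y)

# Save to and load from a temporary directory instead of the working directory
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")
    joblib.dump(clf, path)
    loaded_clf = joblib.load(path)

# The reloaded model predicts exactly like the original, with no retraining
same = (loaded_clf.predict(X) == clf.predict(X)).all()
print(same)  # True
```

Because the file is pickle-based, it is safest to load it only from a trusted source and with the same scikit-learn version that saved it.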

Sample Code

Development Step2 - Building the Web App and Integrating the Model

Step2.1 Import Libraries

import streamlit as st
import numpy as np
import pandas as pd
import joblib


streamlit is a Python library that makes it easy to create and share custom web applications for machine learning and data science projects.
numpy is a fundamental Python library for numerical computing. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently.

Step2.2 Retrieve and encode input data

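# Input form widgets that define the values used below
# (labels, options, and slider ranges are illustrative assumptions, not the original app's)
island = st.selectbox("Island", ("Biscoe", "Dream", "Torgersen"))
bill_length_mm = st.slider("Bill length (mm)", 30.0, 60.0, 44.0)
bill_depth_mm = st.slider("Bill depth (mm)", 13.0, 22.0, 17.0)
flipper_length_mm = st.slider("Flipper length (mm)", 170.0, 235.0, 200.0)
body_mass_g = st.slider("Body mass (g)", 2700.0, 6300.0, 4200.0)
sex = st.selectbox("Sex", ("female", "male"))

# Collect the inputs into a one-row DataFrame with the training feature names
data = {
    "island": island,
    "bill_length_mm": bill_length_mm,
    "bill_depth_mm": bill_depth_mm,
    "flipper_length_mm": flipper_length_mm,
    "body_mass_g": body_mass_g,
    "sex": sex,
}
input_df = pd.DataFrame(data, index=[0])

# One-hot encode the same categorical columns as in Step1.3
encode = ["island", "sex"]
input_encoded_df = pd.get_dummies(input_df, prefix=encode)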


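Input values are retrieved from the input form created with Streamlit, and the categorical variables are encoded using the same rules as when the model was created. Note that the columns must also be in the same order as when the model was created, and every dummy column seen during training must be present: a single input row contains only one island and one sex, so get_dummies creates only the matching dummy columns. If the columns differ, scikit-learn raises an error when you run a prediction with the model.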
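One way to guarantee matching columns is sketched below: encode the single input row, then reindex it to the training columns, filling any dummy column that did not appear with 0. The training_columns list is written out by hand for illustration; it assumes the training data had the columns shown in Step2.2. In the app, a model fitted on a DataFrame exposes the fitted column names as clf.feature_names_in_, which could be used instead.

```python
import pandas as pd

# Hypothetical training column order; in the app this could come from clf.feature_names_in_
training_columns = [
    "bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g",
    "island_Biscoe", "island_Dream", "island_Torgersen",
    "sex_female", "sex_male",
]

# A single row from the input form (illustrative values)
input_df = pd.DataFrame({
    "island": ["Dream"], "bill_length_mm": [45.0], "bill_depth_mm": [17.0],
    "flipper_length_mm": [200.0], "body_mass_g": [4000.0], "sex": ["male"],
}, index=[0])

# Encoding one row creates dummy columns only for the values present
input_encoded_df = pd.get_dummies(input_df, columns=["island", "sex"], dtype=int)
print(input_encoded_df.columns.tolist())  # the only dummy columns are island_Dream and sex_male

# Add the missing dummy columns (as 0) and put every column in training order
input_aligned_df = input_encoded_df.reindex(columns=training_columns, fill_value=0)
print(input_aligned_df.columns.tolist() == training_columns)  # True
```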

Step2.3 Load the Model

clf = joblib.load("penguin_classifier_model.pkl")


"penguin_classifier_model.pkl" is the file where the previously saved model is stored. This file contains a trained RandomForestClassifier in binary format. Running this code loads the model into clf, allowing you to use it for predictions and evaluations on new data.

Step2.4 Perform prediction

prediction = clf.predict(input_encoded_df)
prediction_proba = clf.predict_proba(input_encoded_df)


clf.predict(input_encoded_df): Uses the trained model to predict the class for the new encoded input data, storing the result in prediction.
clf.predict_proba(input_encoded_df): Calculates the probability for each class, storing the results in prediction_proba.
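prediction holds integer class labels, so the app still has to turn them back into species names for display. The sketch below assumes the mapping from Step1.3 (Adelie=0, Chinstrap=1, Gentoo=2); the columns of predict_proba follow clf.classes_, which for the integer labels 0, 1, 2 is in ascending order, so the names line up. The two arrays are illustrative placeholders shaped like the outputs for one input row, not real model output.

```python
import numpy as np
import pandas as pd

# Class order matches target_mapper from Step1.3: Adelie=0, Chinstrap=1, Gentoo=2
species_names = np.array(["Adelie", "Chinstrap", "Gentoo"])

# Illustrative outputs shaped like clf.predict / clf.predict_proba for one input row
prediction = np.array([2])
prediction_proba = np.array([[0.05, 0.10, 0.85]])

# Convert the integer class back to a species name
predicted_species = species_names[prediction][0]

# Label each probability column with its species, ready to display in the app
proba_df = pd.DataFrame(prediction_proba, columns=species_names)

print(predicted_species)  # Gentoo
print(proba_df)
```

In the Streamlit app, the resulting DataFrame could be shown with st.dataframe(proba_df).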

Sample Code

Step3. Deploy


You can publish the application you developed on the Internet by signing in to Streamlit Community Cloud (https://streamlit.io/cloud) and specifying the URL of your GitHub repository.

About the Dataset


Artwork by @allison_horst (https://github.com/allisonhorst)

The model is trained using the Palmer Penguins dataset, a widely recognized dataset for practicing machine learning techniques. This dataset provides information on three penguin species (Adelie, Chinstrap, and Gentoo) from the Palmer Archipelago in Antarctica. Key features include:

Species: The species of the penguin (Adelie, Chinstrap, Gentoo).
Island: The specific island where the penguin was observed (Biscoe, Dream, Torgersen).
Bill Length: The length of the penguin's bill (mm).
Bill Depth: The depth of the penguin's bill (mm).
Flipper Length: The length of the penguin's flipper (mm).
Body Mass: The mass of the penguin (g).
Sex: The sex of the penguin (male or female).

This dataset is sourced from Kaggle. The diversity of its features makes it an excellent choice for building a classification model and understanding how much each feature contributes to species prediction.
