Eight Python libraries that can boost your data science productivity and save valuable time-Python Tutorial-php.cn

Table of Contents

1. Optuna

2. ITMO_FS

3. shap-hypetune

4. PyCaret

5. floWeaver

6、Gradio

7、Terality

8、torch-handle

Home

Backend Development

Python Tutorial

Eight Python libraries that can boost your data science productivity and save valuable time

PHPz

Apr 12, 2023 pm 05:01 PM

python develop data science

When doing data science, you can waste a lot of time coding and waiting for your computer to run something. So I have chosen some Python libraries that can help you save your precious time.

1. Optuna

Optuna is an open source hyperparameter optimization framework that can automatically find the best hyperparameters for machine learning models.

The most basic (and probably well-known) alternative is sklearn's GridSearchCV, which will try multiple hyperparameter combinations and choose the best one based on cross-validation.

GridSearchCV will attempt combinations within the previously defined space. For example, for a random forest classifier, you might want to test the maximum depth of several different trees. GridSearchCV provides all possible values for each hyperparameter and looks at all combinations.

Optuna will use its own history of attempts in the defined search space to determine the values to try next. The method it uses is a Bayesian optimization algorithm called "Tree-structured Parzen Estimator".

This different approach means that instead of pointlessly trying every value, it looks for the best candidate before trying it, saving time that would otherwise be spent trying hopelessly alternatives (and may also yield better results).

Finally, it is framework agnostic, which means you can use it with TensorFlow, Keras, PyTorch, or any other ML framework.

2. ITMO_FS

ITMO_FS is a feature selection library that can perform feature selection for ML models. The fewer observations you have, the more careful you need to be with too many features to avoid overfitting. By "prudent" I mean you should standardize your model. Usually a simpler model (fewer features) is easier to understand and interpret.

ITMO_FS algorithms are divided into 6 different categories: supervised filters, unsupervised filters, wrappers, hybrids, embedded, ensembles (although it mainly focuses on supervised filters).

A simple example of a "supervised filter" algorithm is to select features based on their correlation with a target variable. With "backward selection", you can try to remove features one by one and confirm how these features affect the model's predictive ability.

Here's a trivial example of how to use ITMO_FS and its impact on model scores:

>>> from sklearn.linear_model import SGDClassifier
>>> from ITMO_FS.embedded import MOS
>>> X, y = make_classification(n_samples=300, n_features=10, random_state=0, n_informative=2)
>>> sel = MOS()
>>> trX = sel.fit_transform(X, y, smote=False)
>>> cl1 = SGDClassifier()
>>> cl1.fit(X, y)
>>> cl1.score(X, y)
0.9033333333333333
>>> cl2 = SGDClassifier()
>>> cl2.fit(trX, y)
>>> cl2.score(trX, y)
0.9433333333333334

Copy after login

ITMO_FS is a relatively new library, so it's still a bit unstable, but I still It is recommended to try it.

3. shap-hypetune

So far we have seen libraries for feature selection and hyperparameter tuning, but why not use both at the same time? This is shap-hypetune role.

Let’s start by understanding what “SHAP” is:

“SHAP (SHapley Additive exPlanations) is a game theory method for explaining the output of any machine learning model.”

SHAP is one of the most widely used libraries for interpreting models, and it works by generating the importance of each feature to the model's final prediction.

On the other hand, shap-hypertune benefits from this approach to select the best features, but also the best hyperparameters. Why do you want to combine them together? Selecting features and tuning hyperparameters independently may lead to suboptimal choices because without taking into account their interactions. Doing both at the same time not only takes this into account, but also saves some coding time (although it may increase runtime due to the increased search space).

Search can be done in 3 ways: grid search, random search or Bayesian search (plus, it can be parallelized).

However, shap-hypertune only works with gradient boosting models!

4. PyCaret

PyCaret is an open source, low-code machine learning library that can automatically perform machine learning Workflow. It covers exploratory data analysis, preprocessing, modeling (including interpretability), and MLOps.

Let’s take a look at some practical examples on their website to see how it works:

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')
# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable')
# compare models
best = compare_models()

Copy after login

Eight Python libraries that can boost your data science productivity and save valuable time

With just a few lines of code, you’re good to go Multiple models were tried and compared across major classification metrics.

It also allows the creation of a basic application to interact with the model:

from pycaret.datasets import get_data
juice = get_data('juice')
from pycaret.classification import *
exp_name = setup(data = juice, target = 'Purchase')
lr = create_model('lr')
create_app(lr)

Copy after login

Finally, API and Docker files can be easily created for the model:

from pycaret.datasets import get_data
juice = get_data('juice')
from pycaret.classification import *
exp_name = setup(data = juice, target = 'Purchase')
lr = create_model('lr')
create_api(lr, 'lr_api')
create_docker('lr_api')

Copy after login

Nothing like It's easier, right?

PyCaret is a very complete library and it's difficult to cover everything here, I recommend you download it now and start using it to understand some of its capabilities in practice.

5. floWeaver

FloWeaver can generate Sankey diagrams from streaming data sets. If you don’t know what a Sankey diagram is, here’s an example:

Eight Python libraries that can boost your data science productivity and save valuable time

They are very useful when showing data for conversion funnels, marketing journeys, or budget allocations (example above ). The portal data should be in the following format: "source x target x value" It only takes one line of code to create such a plot (very specific, but also very intuitive).

6、Gradio

如果你阅读过敏捷数据科学，就会知道拥有一个让最终用户从项目开始就与数据进行交互的前端界面是多么有帮助。一般情况下在Python中最常用是 Flask，但它对初学者不太友好，它需要多个文件和一些 html、css 等知识。

Gradio 允许您通过设置输入类型(文本、复选框等)、功能和输出来创建简单的界面。尽管它似乎不如 Flask 可定制，但它更直观。

由于 Gradio 现在已经加入 Huggingface，可以在互联网上永久托管 Gradio 模型，而且是免费的!

7、Terality

理解 Terality 的最佳方式是将其视为“Pandas ，但速度更快”。这并不意味着完全替换 pandas 并且必须重新学习如何使用df：Terality 与 Pandas 具有完全相同的语法。实际上，他们甚至建议“import Terality as pd”，并继续按照以前的习惯的方式进行编码。

它快多少?他们的网站有时会说它快 30 倍，有时快 10 到 100 倍。

另一个重要是 Terality 允许并行化并且它不在本地运行，这意味着您的 8GB RAM 笔记本电脑将不会再出现 MemoryErrors!

但它在背后是如何运作的呢?理解 Terality 的一个很好的比喻是可以认为他们在本地使用的 Pandas 兼容的语法并编译成 Spark 的计算操作，使用Spark进行后端的计算。所以计算不是在本地运行，而是将计算任务提交到了他们的平台上。

那有什么问题呢?每月最多只能免费处理 1TB 的数据。如果需要更多则必须每月至少支付 49 美元。 1TB/月对于测试工具和个人项目可能绰绰有余，但如果你需要它来实际公司使用，肯定是要付费的。

8、torch-handle

如果你是Pytorch的使用者，可以试试这个库。

torchhandle是一个PyTorch的辅助框架。它将PyTorch繁琐和重复的训练代码抽象出来，使得数据科学家们能够将精力放在数据处理、创建模型和参数优化，而不是编写重复的训练循环代码。使用torchhandle，可以让你的代码更加简洁易读，让你的开发任务更加高效。

torchhandle将Pytorch的训练和推理过程进行了抽象整理和提取，只要使用几行代码就可以实现PyTorch的深度学习管道。并可以生成完整训练报告，还可以集成tensorboard进行可视化。

from collections import OrderedDict
import torch
from torchhandle.workflow import BaseContext
class Net(torch.nn.Module):
def __init__(self, ):
super().__init__()
self.layer = torch.nn.Sequential(OrderedDict([
('l1', torch.nn.Linear(10, 20)),
('a1', torch.nn.ReLU()),
('l2', torch.nn.Linear(20, 10)),
('a2', torch.nn.ReLU()),
('l3', torch.nn.Linear(10, 1))
]))
def forward(self, x):
x = self.layer(x)
return x

num_samples, num_features = int(1e4), int(1e1)
X, Y = torch.rand(num_samples, num_features), torch.rand(num_samples)
dataset = torch.utils.data.TensorDataset(X, Y)
trn_loader = torch.utils.data.DataLoader(dataset, batch_size=64, num_workers=0, shuffle=True)
loaders = {"train": trn_loader, "valid": trn_loader}
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = {"fn": Net}
criterion = {"fn": torch.nn.MSELoss}
optimizer = {"fn": torch.optim.Adam,
"args": {"lr": 0.1},
"params": {"layer.l1.weight": {"lr": 0.01},
"layer.l1.bias": {"lr": 0.02}}
}
scheduler = {"fn": torch.optim.lr_scheduler.StepLR,
"args": {"step_size": 2, "gamma": 0.9}
}
c = BaseContext(model=model,
criterion=criterion,
optimizer=optimizer,
scheduler=scheduler,
context_tag="ex01")
train = c.make_train_session(device, dataloader=loaders)
train.train(epochs=10)

Copy after login

定义一个模型，设置数据集，配置优化器、损失函数就可以自动训练了，是不是和TF差不多了。

The above is the detailed content of Eight Python libraries that can boost your data science productivity and save valuable time. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)

2 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Repo: How To Revive Teammates

1 months ago By 尊渡假赌尊渡假赌尊渡假赌

Hello Kitty Island Adventure: How To Get Giant Seeds

1 months ago By 尊渡假赌尊渡假赌尊渡假赌

How Long Does It Take To Beat Split Fiction?

4 weeks ago By DDD

R.E.P.O. Save File Location: Where Is It & How to Protect It?

4 weeks ago By DDD

Hot Tools

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Where is the login entrance for gmail email?

7360

Java Tutorial

1628

CakePHP Tutorial

1353

Laravel Tutorial

1265

PHP Tutorial

1214

Related knowledge

How to beautify the XML format Apr 02, 2025 pm 09:57 PM

XML beautification is essentially improving its readability, including reasonable indentation, line breaks and tag organization. The principle is to traverse the XML tree, add indentation according to the level, and handle empty tags and tags containing text. Python's xml.etree.ElementTree library provides a convenient pretty_xml() function that can implement the above beautification process.

How to open xml format Apr 02, 2025 pm 09:00 PM

Use most text editors to open XML files; if you need a more intuitive tree display, you can use an XML editor, such as Oxygen XML Editor or XMLSpy; if you process XML data in a program, you need to use a programming language (such as Python) and XML libraries (such as xml.etree.ElementTree) to parse.

Does XML modification require programming? Apr 02, 2025 pm 06:51 PM

Modifying XML content requires programming, because it requires accurate finding of the target nodes to add, delete, modify and check. The programming language has corresponding libraries to process XML and provides APIs to perform safe, efficient and controllable operations like operating databases.

Is there any mobile app that can convert XML into PDF? Apr 02, 2025 pm 08:54 PM

An application that converts XML directly to PDF cannot be found because they are two fundamentally different formats. XML is used to store data, while PDF is used to display documents. To complete the transformation, you can use programming languages and libraries such as Python and ReportLab to parse XML data and generate PDF documents.

Is there a free XML to PDF tool for mobile phones? Apr 02, 2025 pm 09:12 PM

There is no simple and direct free XML to PDF tool on mobile. The required data visualization process involves complex data understanding and rendering, and most of the so-called "free" tools on the market have poor experience. It is recommended to use computer-side tools or use cloud services, or develop apps yourself to obtain more reliable conversion effects.

Is the conversion speed fast when converting XML to PDF on mobile phone? Apr 02, 2025 pm 10:09 PM

The speed of mobile XML to PDF depends on the following factors: the complexity of XML structure. Mobile hardware configuration conversion method (library, algorithm) code quality optimization methods (select efficient libraries, optimize algorithms, cache data, and utilize multi-threading). Overall, there is no absolute answer and it needs to be optimized according to the specific situation.

How to convert XML files to PDF on your phone? Apr 02, 2025 pm 10:12 PM

It is impossible to complete XML to PDF conversion directly on your phone with a single application. It is necessary to use cloud services, which can be achieved through two steps: 1. Convert XML to PDF in the cloud, 2. Access or download the converted PDF file on the mobile phone.

How to convert XML to PDF on your phone? Apr 02, 2025 pm 10:18 PM

It is not easy to convert XML to PDF directly on your phone, but it can be achieved with the help of cloud services. It is recommended to use a lightweight mobile app to upload XML files and receive generated PDFs, and convert them with cloud APIs. Cloud APIs use serverless computing services, and choosing the right platform is crucial. Complexity, error handling, security, and optimization strategies need to be considered when handling XML parsing and PDF generation. The entire process requires the front-end app and the back-end API to work together, and it requires some understanding of a variety of technologies.

See all articles