Nine super useful Python libraries for data science-Python Tutorial-php.cn

Table of Contents

1. Wget

2. Pendulum

3. imbalanced-learn

4. FlashText

5. fuzzywuzzy

6. PyFlux

7. Ipyvolume

8. Dash

九、Gym

总结

Home

Backend Development

Python Tutorial

Nine super useful Python libraries for data science

PHPz

Apr 17, 2023 am 09:25 AM

python programming language develop

In this article, we will look at some Python libraries for data science tasks, instead of the more common libraries such as panda, scikit-learn, and matplotlib. Although libraries like panda and scikit-learn are commonly used in machine learning tasks, it is always beneficial to understand other Python products in this field.

1. Wget

Extracting data from the Internet is one of the important tasks of a data scientist. Wget is a free utility that can be used to download non-interactive files from the Internet. It supports HTTP, HTTPS, and FTP protocols, as well as file retrieval through HTTP's proxy. Since it is non-interactive, it can work in the background even if the user is not logged in. So next time you want to download all the images on a website or a page, wget can help you.

Installation:

$ pip install wget

Copy after login

Example:

import wget
url = 'http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3'

filename = wget.download(url)
100% [................................................] 3841532 / 3841532

filename
'razorback.mp3'

Copy after login

2. Pendulum

For those who get frustrated when dealing with date and time in python, Pendulum is for you. It is a Python package that simplifies datetime operations. It is a simple replacement for Python's native classes. See the documentation for deeper learning.

Installation:

$ pip install pendulum

Copy after login

Example:

import pendulum

dt_toronto = pendulum.datetime(2012, 1, 1, tz='America/Toronto')
dt_vancouver = pendulum.datetime(2012, 1, 1, tz='America/Vancouver')

print(dt_vancouver.diff(dt_toronto).in_hours())

3

Copy after login

3. imbalanced-learn

It can be seen that when the number of samples in each class is basically the same, Most classification algorithms work best when the data needs to be balanced. However, most of the real-life cases are imbalanced data sets, which have a great impact on the learning phase and subsequent predictions of the machine learning algorithm. Fortunately, this library is designed to solve this problem. It is compatible with scikit-learn and is part of the scikit-lear-contrib project. Try using this the next time you encounter an unbalanced data set.

Installation:

$ pip install -U imbalanced-learn

# 或者

$ conda install -c conda-forge imbalanced-learn

Copy after login

Example:

Please refer to the document for usage methods and examples.

4. FlashText

In NLP tasks, cleaning text data often requires replacing keywords in sentences or extracting keywords from sentences. Typically this can be done using regular expressions, but this can become cumbersome if the number of terms being searched runs into the thousands. Python's FlashText module is based on the FlashText algorithm and provides a suitable alternative for this situation. The great thing about FlashText is that the run time is the same regardless of the number of search terms. You can learn more here.

Installation:

$ pip install flashtext

Copy after login

Example:

Extract keywords

from flashtext import KeywordProcessor
keyword_processor = KeywordProcessor()

# keyword_processor.add_keyword(<unclean name>, <standardised name>)

keyword_processor.add_keyword('Big Apple', 'New York')
keyword_processor.add_keyword('Bay Area')
keywords_found = keyword_processor.extract_keywords('I love Big Apple and Bay Area.')

keywords_found
['New York', 'Bay Area']

Copy after login

Replace keywords

keyword_processor.add_keyword('New Delhi', 'NCR region')

new_sentence = keyword_processor.replace_keywords('I love Big Apple and new delhi.')

new_sentence
'I love New York and NCR region.'
Fuzzywuzzy

Copy after login

5. fuzzywuzzy

The name of this library sounds strange, but fuzzywuzzy is a very useful library when it comes to string matching. Operations such as calculating string matching degree and token matching degree can be easily implemented, and records stored in different databases can also be matched easily.

Installation:

$ pip install fuzzywuzzy

Copy after login

Examples:

from fuzzywuzzy import fuzz
from fuzzywuzzy import process

# 简单匹配度

fuzz.ratio("this is a test", "this is a test!")
97

# 模糊匹配度
fuzz.partial_ratio("this is a test", "this is a test!")
 100

Copy after login

More interesting examples can be found in the GitHub repository.

6. PyFlux

Time series analysis is one of the most common problems in the field of machine learning. PyFlux is an open source library in Python built for working with time series problems. The library has an excellent collection of modern time series models, including but not limited to ARIMA, GARCH, and VAR models. In short, PyFlux provides a probabilistic approach to time series modeling. Worth trying.

Installation

pip install pyflux

Copy after login

Example

Please refer to the official documentation for detailed usage and examples.

7. Ipyvolume

Result display is also an important aspect in data science. Being able to visualize the results will be a great advantage. IPyvolume is a Python library that can visualize 3D volumes and graphics (such as 3D scatter plots, etc.) in Jupyter notebooks and requires only a small amount of configuration. But it is still in the pre-1.0 version stage. A more appropriate metaphor to explain is: IPyvolume's volshow is as useful for three-dimensional arrays as matplotlib's imshow is for two-dimensional arrays. More available here.

Using pip

$ pip install ipyvolume

Copy after login

Using Conda/Anaconda

$ conda install -c conda-forge ipyvolume

Copy after login

Example

Animation

Nine super useful Python libraries for data science

body Draw

Nine super useful Python libraries for data science

8. Dash

Dash is an efficient Python framework for building web applications. It is designed based on Flask, Plotly.js and React.js, and is bound to many modern UI elements such as drop-down boxes, sliders and charts. You can directly use Python code to write relevant analysis without having to Use javascript. Dash is great for building data visualization applications. These applications can then be rendered in a web browser. The user guide is available here.

Installation

pip install dash==0.29.0# 核心 dash 后端
pip install dash-html-components==0.13.2# HTML 组件
pip install dash-core-components==0.36.0# 增强组件
pip install dash-table==3.1.3# 交互式 DataTable 组件（最新！）

Copy after login

Example The following example shows a highly interactive chart with drop-down functionality. When the user selects a value in the dropdown menu, the application code dynamically exports data from Google Finance to a panda DataFrame.

Nine super useful Python libraries for data science

九、Gym

OpenAI 的 Gym 是一款用于增强学习算法的开发和比较工具包。它兼容任何数值计算库，如 TensorFlow 或 Theano。Gym 库是测试问题集合的必备工具，这个集合也称为环境 —— 你可以用它来开发你的强化学习算法。这些环境有一个共享接口，允许你进行通用算法的编写。

安装

pip install gym

Copy after login

例子这个例子会运行CartPole-v0环境中的一个实例，它的时间步数为 1000，每一步都会渲染整个场景。

总结

以上这些有用的数据科学 Python 库都是我精心挑选出来的，不是常见的如 numpy 和 pandas 等库。如果你知道其它库，可以添加到列表中来，请在下面的评论中提一下。另外别忘了先尝试运行一下它们。

The above is the detailed content of Nine super useful Python libraries for data science. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)

2 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Repo: How To Revive Teammates

4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hello Kitty Island Adventure: How To Get Giant Seeds

3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

How Long Does It Take To Beat Split Fiction?

3 weeks ago By DDD

R.E.P.O. Save File Location: Where Is It & How to Protect It?

3 weeks ago By DDD

Hot Tools

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Where is the login entrance for gmail email?

7316

Java Tutorial

1625

CakePHP Tutorial

1349

Laravel Tutorial

1261

PHP Tutorial

1208

Related knowledge

How to download deepseek Xiaomi Feb 19, 2025 pm 05:27 PM

How to download DeepSeek Xiaomi? Search for "DeepSeek" in the Xiaomi App Store. If it is not found, continue to step 2. Identify your needs (search files, data analysis), and find the corresponding tools (such as file managers, data analysis software) that include DeepSeek functions.

How do you ask him deepseek Feb 19, 2025 pm 04:42 PM

The key to using DeepSeek effectively is to ask questions clearly: express the questions directly and specifically. Provide specific details and background information. For complex inquiries, multiple angles and refute opinions are included. Focus on specific aspects, such as performance bottlenecks in code. Keep a critical thinking about the answers you get and make judgments based on your expertise.

How to search deepseek Feb 19, 2025 pm 05:18 PM

Just use the search function that comes with DeepSeek. Its powerful semantic analysis algorithm can accurately understand the search intention and provide relevant information. However, for searches that are unpopular, latest information or problems that need to be considered, it is necessary to adjust keywords or use more specific descriptions, combine them with other real-time information sources, and understand that DeepSeek is just a tool that requires active, clear and refined search strategies.

How to program deepseek Feb 19, 2025 pm 05:36 PM

DeepSeek is not a programming language, but a deep search concept. Implementing DeepSeek requires selection based on existing languages. For different application scenarios, it is necessary to choose the appropriate language and algorithms, and combine machine learning technology. Code quality, maintainability, and testing are crucial. Only by choosing the right programming language, algorithms and tools according to your needs and writing high-quality code can DeepSeek be successfully implemented.

How to use deepseek to settle accounts Feb 19, 2025 pm 04:36 PM

Question: Is DeepSeek available for accounting? Answer: No, it is a data mining and analysis tool that can be used to analyze financial data, but it does not have the accounting record and report generation functions of accounting software. Using DeepSeek to analyze financial data requires writing code to process data with knowledge of data structures, algorithms, and DeepSeek APIs to consider potential problems (e.g. programming knowledge, learning curves, data quality)

The Key to Coding: Unlocking the Power of Python for Beginners Oct 11, 2024 pm 12:17 PM

Python is an ideal programming introduction language for beginners through its ease of learning and powerful features. Its basics include: Variables: used to store data (numbers, strings, lists, etc.). Data type: Defines the type of data in the variable (integer, floating point, etc.). Operators: used for mathematical operations and comparisons. Control flow: Control the flow of code execution (conditional statements, loops).

Problem-Solving with Python: Unlock Powerful Solutions as a Beginner Coder Oct 11, 2024 pm 08:58 PM

Pythonempowersbeginnersinproblem-solving.Itsuser-friendlysyntax,extensivelibrary,andfeaturessuchasvariables,conditionalstatements,andloopsenableefficientcodedevelopment.Frommanagingdatatocontrollingprogramflowandperformingrepetitivetasks,Pythonprovid

How to access DeepSeekapi - DeepSeekapi access call tutorial Mar 12, 2025 pm 12:24 PM

Detailed explanation of DeepSeekAPI access and call: Quick Start Guide This article will guide you in detail how to access and call DeepSeekAPI, helping you easily use powerful AI models. Step 1: Get the API key to access the DeepSeek official website and click on the "Open Platform" in the upper right corner. You will get a certain number of free tokens (used to measure API usage). In the menu on the left, click "APIKeys" and then click "Create APIkey". Name your APIkey (for example, "test") and copy the generated key right away. Be sure to save this key properly, as it will only be displayed once

See all articles