Some tips for implementing data type conversion in Pandas
This article introduces several techniques for converting data types in Pandas. It is shared here as a reference for readers who need it.
Preface
Pandas is an important data analysis tool in Python. When analyzing data with Pandas, it is essential to make sure every column has the correct data type; otherwise, unexpected errors or wrong results may occur.
Data types in Pandas: A data type is essentially the internal construct a programming language uses to understand how to store and manipulate data. For example, a program needs to understand that you can add two numbers, such as 5 + 10 to get 15. Or, if it has two strings, such as "cat" and "hat", it can concatenate (add) them to get "cathat". Shangxuetang·Baizhan programmer Mr. Chen pointed out that one potentially confusing thing about Pandas data types is that they overlap with the data types of Python and NumPy.
Most of the time, you don't have to worry about whether you should explicitly cast a Pandas type to the corresponding NumPy type. Generally speaking, you can use Pandas' default int64 and float64. The only reason I include this table is that you may occasionally see NumPy types in other people's code or during your own analysis.
Data types are one of those things that you don't care about until you encounter an error or unexpected result. But it’s also the first thing you should check when loading new data into Pandas for further analysis.
The author has been using Pandas for some time but still makes mistakes on minor issues. Traced back to the source, the cause is usually that some feature columns are not of a type Pandas can operate on directly. This article therefore discusses some tips for converting basic Python data types into types that Pandas can handle.
The data types supported by Pandas, NumPy and Python
As the table above shows, Pandas supports the richest set of data types. In some cases, NumPy data types can be converted to Pandas data types; after all, Pandas is built on top of NumPy.
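As a rough illustration of how these type systems line up, the sketch below builds a small DataFrame from ordinary Python values and prints the dtype Pandas assigns to each column. The column names and values are made up for this example. No fixed expectation is stated for the string column, because its dtype depends on the Pandas version (object in the versions this article was written against).

```python
import pandas as pd

# Hypothetical sample values, not taken from the article's data.csv
df = pd.DataFrame({
    'count': [1, 2, 3],                  # Python int
    'ratio': [0.5, 1.5, 2.5],            # Python float
    'flag': [True, False, True],         # Python bool
    'name': ['a', 'b', 'c'],             # Python str
    'when': pd.to_datetime(['2015-01-10', '2014-06-15', '2016-03-29']),
})

# One dtype per column: this is the first thing to check on new data
print(df.dtypes)
```

Checking df.dtypes (or df.info()) right after loading is usually enough to spot a column that was read with the wrong type.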
Introducing actual data for analysis
Data types are something you may not care much about until you get a wrong result, so an actual data analysis example is introduced here to deepen understanding.
import numpy as np
import pandas as pd

data = pd.read_csv('data.csv', encoding='gbk')  # gbk because the data contains Chinese text
data
The data is now loaded. Suppose you want to perform some operations on it, such as adding the corresponding entries in the 2016 and 2017 columns.
data['2016'] + data['2017']  # the naive attempt
As the result shows, the values are not summed as expected. This is because adding object-typed columns in Pandas is equivalent to concatenating strings in Python.
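The effect can be reproduced on a tiny example. The sketch below uses two made-up rows in the same style as the 2016 and 2017 columns (the actual contents of data.csv are not shown in this article, so these values are assumptions) and forces the object dtype explicitly.

```python
import pandas as pd

# Hypothetical values in the same style as the article's 2016 and 2017 columns
sales = pd.DataFrame({
    '2016': ['¥125,000.00', '¥920,000.00'],
    '2017': ['¥162,500.00', '¥1,012,000.00'],
}, dtype='object')

# Both columns hold strings, so + joins the text instead of summing numbers
total = sales['2016'] + sales['2017']
print(total.tolist())
```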
data.info()  # before processing the data, first inspect the basic information of what was loaded
The output reveals the following problems:
- The customer number (客户编号) column is int64, but it should be object
- The 2016 and 2017 columns are object rather than a numeric type (int64, float64)
- The growth rate (增长率) and group (所属组) columns should be numeric rather than object
- The year, month, and day columns should be datetime64 rather than object
There are three main ways to convert data types in Pandas:
- Use the astype() function for forced type conversion
- Write a custom function for the conversion
- Use the helper functions provided by Pandas, such as to_numeric() and to_datetime()
The easiest way to convert the data type of a column is to use the astype() function:
data['客户编号'].astype('object')
data['客户编号'] = data['客户编号'].astype('object')  # convert and overwrite the original column
The result above looks good. Here are a few examples where astype() is applied to column data but fails:
data['2017'].astype('float')
data['所属组'].astype('int')
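These two examples show that astype() fails when the column contains special values that cannot be converted (¥ and ErrorValue in these examples). Sometimes astype() runs without error and the result is still not what you expect (a nasty trap!).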
data['状态'].astype('bool')
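At first glance the result looks fine, but a closer look reveals a big problem: every value has been replaced with True, even though the column contains several N flags. So astype() does not work for this column either.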
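A minimal sketch of this trap, assuming the status column holds the strings Y and N as object values (the three sample values are made up): converting to bool only asks whether each string is non-empty, so N becomes True as well.

```python
import pandas as pd

# Hypothetical status flags stored as strings, as in the article's 状态 column
status = pd.Series(['Y', 'N', 'Y'], dtype='object')

# Any non-empty string is truthy, so 'N' also becomes True
as_bool = status.astype('bool')
print(as_bool.tolist())

# An explicit comparison keeps the intended meaning
is_yes = status == 'Y'
print(is_yes.tolist())
```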
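To summarize, astype() works when:
- every value in the column can be trivially interpreted as a number (2, 2.12, etc.)
- every value in the column is numeric and is being converted to a string (object) type
If the data contains missing values or special characters, astype() may fail.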
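The cases in this summary can be checked side by side. The following is only a sketch with made-up values; try_astype is a hypothetical helper written for this example, not a Pandas function.

```python
import numpy as np
import pandas as pd

# Works: every string is a plain number
clean = pd.Series(['2', '2.12'], dtype='object').astype('float')

# Works: numeric values converted to strings
as_text = pd.Series([10002, 552278]).astype('str')

def try_astype(series, dtype):
    """Hypothetical helper: return the converted Series, or the error text if astype() fails."""
    try:
        return series.astype(dtype)
    except (ValueError, TypeError) as exc:
        return f'failed: {exc}'

# Fails: the currency symbol and thousands separator are not part of a number
currency_result = try_astype(pd.Series(['¥125,000.00'], dtype='object'), 'float')

# Fails: a missing value (NaN) cannot be stored in a plain integer column
missing_result = try_astype(pd.Series([1.0, np.nan]), 'int')

print(clean.tolist(), as_text.tolist())
print(currency_result)
print(missing_result)
```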
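Using a custom function for data type conversion
This approach is especially suitable when the column to be converted is messy: you write a function, apply it to every value in the column, and have it return a suitable data type.
The currency values in the data above need to be converted to float, so we can write a conversion function:
def convert_currency(value):
    """
    Convert a currency string to float
    - remove ¥ and ,
    - convert to float
    """
    new_value = value.replace(',', '').replace('¥', '')
    return float(new_value)  # np.float was removed in NumPy 1.24; the built-in float is equivalent
Now use Pandas' apply function to apply convert_currency to every value in the 2016 column.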
data['2016'].apply(convert_currency)
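Every value in the column has now been converted to a numeric type, so the usual mathematical operations can be performed on it. Rewriting the code with a lambda expression is more concise but may be less friendly to beginners.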
data['2016'].apply(lambda x: x.replace('¥', '').replace(',', '')).astype('float')
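When the function has to be applied to several columns, I recommend the first approach. Defining the function first has another advantage: it can be used together with read_csv() (introduced later).
# Complete conversion code for the 2016 and 2017 columns
data['2016'] = data['2016'].apply(convert_currency)
data['2017'] = data['2017'].apply(convert_currency)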
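To see the whole step end to end without data.csv, here is a self-contained sketch on two made-up rows. It repeats convert_currency (using the built-in float) and shows that the addition attempted earlier now sums numbers.

```python
import pandas as pd

def convert_currency(value):
    """Strip ¥ and thousands separators, then convert to float."""
    return float(value.replace(',', '').replace('¥', ''))

# Hypothetical rows in the same style as the article's data
sales = pd.DataFrame({
    '2016': ['¥125,000.00', '¥920,000.00'],
    '2017': ['¥162,500.00', '¥1,012,000.00'],
}, dtype='object')

# Apply the same function to each currency column
for column in ['2016', '2017']:
    sales[column] = sales[column].apply(convert_currency)

# The columns are numeric now, so + performs arithmetic
total = sales['2016'] + sales['2017']
print(total.tolist())
```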
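The same approach applies to the growth rate. First build a custom function:
def convert_percent(value):
    """
    Convert a percentage string to a float fraction
    - remove %
    - divide by 100 to get a decimal
    """
    new_value = value.replace('%', '')
    return float(new_value) / 100
Use Pandas' apply function to apply convert_percent to every value in the growth rate (增长率) column.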
data['增长率'].apply(convert_percent)
Using a lambda expression:
data['增长率'].apply(lambda x: x.replace('%', '')).astype('float') / 100
The results are the same in both cases.
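A quick way to convince yourself that the two variants agree is to run both on the same input and compare. The percentages below are made up for illustration.

```python
import numpy as np
import pandas as pd

def convert_percent(value):
    """Remove % and divide by 100."""
    return float(value.replace('%', '')) / 100

# Hypothetical growth rates stored as strings
growth = pd.Series(['30.00%', '10.00%', '-4.50%'], dtype='object')

with_function = growth.apply(convert_percent)
with_lambda = growth.apply(lambda x: x.replace('%', '')).astype('float') / 100

# Compare with a tolerance, since these are floating-point values
same = bool(np.allclose(with_function, with_lambda))
print(same)
```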
To convert the status (状态) column, you can use NumPy's where function to map the value Y to True and every other value to False.
data['状态'] = np.where(data['状态'] == 'Y', True, False)
You can equally use a custom function or a lambda expression. All of these approaches solve the problem well; this is just one more option.
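The alternatives mentioned here can be sketched side by side. The sample flags are assumptions; the last variant, a plain comparison, is not in the original text but gives the same booleans.

```python
import numpy as np
import pandas as pd

# Hypothetical status flags
status = pd.Series(['Y', 'N', 'Y'], dtype='object')

# NumPy's where, as used in the article
via_where = np.where(status == 'Y', True, False)

# A lambda applied to each value
via_lambda = status.apply(lambda x: x == 'Y')

# A direct comparison already yields a boolean Series
via_compare = status == 'Y'

print(list(via_where), via_lambda.tolist(), via_compare.tolist())
```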
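Using Pandas helper functions for type conversion
Between astype() and more elaborate custom functions there is a middle ground: Pandas' helper functions. They are very useful for converting certain specific data types (for example to_numeric() and to_datetime()). The group (所属组) column contains one non-numeric value, so converting it with astype() raised an error, whereas to_numeric() handles it far more gracefully.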
pd.to_numeric(data['所属组'], errors='coerce').fillna(0)
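As you can see, the non-numeric value has been replaced with 0.0. The fill value is of course up to you; see the documentation for details:
pandas.to_numeric - pandas 0.22.0 documentation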
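The two steps, coercing and then filling, are easier to follow when kept apart. This sketch uses made-up group values, with ErrorValue standing in for the bad entry.

```python
import pandas as pd

# Hypothetical group values with one entry that is not a number
group = pd.Series(['125', '23', 'ErrorValue', '55'], dtype='object')

# errors='coerce' turns anything unparseable into NaN instead of raising
coerced = pd.to_numeric(group, errors='coerce')

# Replace NaN with a fill value of your choice (0 here, as in the article)
filled = coerced.fillna(0)

print(coerced.isna().tolist())
print(filled.tolist())
```

Whether 0 is a sensible fill value depends on the data; leaving the NaN in place is often the safer default for later statistics.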
Pandas' to_datetime() function can combine the separate year, month, and day columns into a single timestamp.
pd.to_datetime(data[['day', 'month', 'year']])
Completing the column replacement:
data['new_date'] = pd.to_datetime(data[['day', 'month', 'year']])  # a newly created column
data['所属组'] = pd.to_numeric(data['所属组'], errors='coerce').fillna(0)
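For the date assembly, to_datetime() expects the component columns to be named year, month, and day. A minimal sketch with made-up dates:

```python
import pandas as pd

# Hypothetical date parts; the column names must be year, month and day
parts = pd.DataFrame({
    'day': [10, 15, 29],
    'month': [1, 6, 3],
    'year': [2015, 2014, 2016],
})

# Column order in the selection does not matter; the names do
parts['new_date'] = pd.to_datetime(parts[['day', 'month', 'year']])
print(parts['new_date'])
```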
At this point all columns have been converted. The final data looks like this:
Converting data types while reading the data, all in one step
data2 = pd.read_csv(
    "data.csv",
    converters={
        '客户编号': str,
        '2016': convert_currency,
        '2017': convert_currency,
        '增长率': convert_percent,
        '所属组': lambda x: pd.to_numeric(x, errors='coerce'),
        '状态': lambda x: np.where(x == "Y", True, False)
    },
    encoding='gbk'
)
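This also shows how much more convenient a custom function can be than a lambda expression. (In most cases lambdas are still very concise, and the author likes using them too.)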
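The converters approach can be tried without any file on disk by reading from an in-memory string. Everything in this sketch is an assumption made for illustration: the CSV text, its two rows, and the reduced set of columns. For the status column it uses a plain comparison in place of np.where, which returns an ordinary bool for each cell.

```python
import io
import pandas as pd

def convert_currency(value):
    """Strip ¥ and thousands separators, then convert to float."""
    return float(value.replace(',', '').replace('¥', ''))

def convert_percent(value):
    """Remove % and divide by 100."""
    return float(value.replace('%', '')) / 100

# Hypothetical CSV content standing in for data.csv
csv_text = (
    '客户编号,2016,增长率,状态\n'
    '10002,"¥125,000.00",30.00%,Y\n'
    '552278,"¥920,000.00",10.00%,N\n'
)

# Each converter receives the raw cell text and returns the converted value
df = pd.read_csv(
    io.StringIO(csv_text),
    converters={
        '客户编号': str,
        '2016': convert_currency,
        '增长率': convert_percent,
        '状态': lambda x: x == 'Y',
    },
)
print(df.dtypes)
```

Because the named functions are defined once, the same objects serve both apply() and read_csv(), which is the convenience described above.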
Summary
The first step in working with a data set is to make sure the correct data types are set; only then can the data be analyzed, visualized, and otherwise processed. Pandas provides many convenient functions for this, and with them data analysis becomes much easier.