This article walks through using the pandas read_excel function under Python 2.7: reading an Excel file into a DataFrame, inspecting and filtering the data, and saving the result to a new Excel file.
Import pandas module:
import pandas as pd
The import statement loads the pandas module under its conventional short alias, pd.
Read the Excel file to be processed:
df = pd.read_excel('log.xls')
The read_excel function reads the Excel file; replace 'log.xls' with the path to your own file. The result is a pandas DataFrame object. A DataFrame is a column-oriented, two-dimensional table structure with column labels and row labels, so operations on the Excel file become operations on the DataFrame. If the workbook contains several sheets and you only want to read one of them:
df = pd.read_excel('log.xls', sheetname=1)
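The added sheetname parameter selects a sheet by position, counting from 0. The value 1 used above therefore selects the second sheet. In pandas 0.21 and later this parameter is named sheet_name, and newer releases no longer accept the old spelling.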
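The sheet selection described above can be sketched as follows. This is a minimal illustration, not the author's code: it assumes a recent pandas (hence the sheet_name spelling) with an .xlsx engine such as openpyxl installed, and the file name, sheet names, and data are hypothetical. It builds a small two-sheet workbook first so that it runs on its own.

```python
import os
import tempfile

import pandas as pd

# Build a small two-sheet workbook so the example is self-contained.
# The file name and sheet names are hypothetical.
path = os.path.join(tempfile.mkdtemp(), "log_demo.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"Member": ["a", "b"]}).to_excel(
        writer, sheet_name="first", index=False)
    pd.DataFrame({"Member": ["c", "d", "e"]}).to_excel(
        writer, sheet_name="second", index=False)

# Select by zero-based position: 1 is the second sheet.
by_position = pd.read_excel(path, sheet_name=1)

# Select the same sheet by its name instead of its position.
by_name = pd.read_excel(path, sheet_name="second")

print(by_position["Member"].tolist())
print(by_name["Member"].tolist())
```

Selecting by name tends to be more robust than selecting by position when sheets may be reordered in the workbook.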
After reading, you can first check the header information and the data type of each column:
df.dtypes
The output is as follows:
Member         object
Unnamed: 1    float64
Unnamed: 2    float64
Unnamed: 3    float64
Unnamed: 4    float64
Unnamed: 5    float64
家内外活动类型   object
Unnamed: 7     object
activity       object
dtype: object
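Columns labelled "Unnamed: N" are ones pandas names itself when the header cell is blank, and a float64 column with no usable header is often simply empty. The sketch below shows one way to spot and drop such columns. It uses a small hand-built DataFrame as a stand-in for the sheet, so the column values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for a sheet with one blank-header, empty column.
df = pd.DataFrame({
    "Member": ["a", "b", "c"],
    "Unnamed: 1": [float("nan")] * 3,
    "activity": ["run", "swim", "read"],
})

# Data type of each column, as in the output above.
print(df.dtypes)

# Columns that pandas named because the header cell was blank.
unnamed = [c for c in df.columns if str(c).startswith("Unnamed:")]

# Drop columns that contain no data at all.
cleaned = df.dropna(axis=1, how="all")
print(list(cleaned.columns))
```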
Extract the last row of data for each member:
new_df = df.drop_duplicates(subset='Member', keep='last')
This statement removes duplicate rows based on the Member field and, among rows that share the same Member value, keeps only the last one. The result is a filtered DataFrame containing the last row for each member.
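A small worked example may make the effect of keep='last' easier to see. The data below is made up for illustration. Note that the row kept for each member is its last occurrence in the whole table, not the last row of each consecutive run.

```python
import pandas as pd

# Hypothetical data: member "a" appears in two separate runs.
df = pd.DataFrame({
    "Member": ["a", "a", "b", "a", "b"],
    "activity": ["run", "swim", "read", "cook", "walk"],
})

# Keep only the last row seen for each Member value.
new_df = df.drop_duplicates(subset="Member", keep="last")

print(new_df.index.tolist())        # [3, 4]
print(new_df["activity"].tolist())  # ['cook', 'walk']
```

The original row labels (3 and 4 here) are preserved; calling reset_index(drop=True) on the result renumbers them from 0 if that is preferred.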
Next, save the processed results as an Excel file:
out = pd.ExcelWriter('output.xls')
new_df.to_excel(out)
out.save()
Here output.xls is the name of the file to save, and you can choose any name. The to_excel call writes the contents of the DataFrame to the writer, and save() then writes the file to disk.
A new file then appears in the current directory and can be opened directly in Excel.
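On recent pandas releases the same export is usually written with a with-block, because ExcelWriter.save() was deprecated in pandas 1.5 and removed in 2.0. The sketch below assumes such a release with an .xlsx engine (for example openpyxl) installed. The output path, sheet name, and data are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical filtered result to export.
new_df = pd.DataFrame({"Member": ["a", "b"], "activity": ["cook", "walk"]})

# Hypothetical output location; a temporary directory keeps the sketch tidy.
out_path = os.path.join(tempfile.mkdtemp(), "output.xlsx")

# Leaving the with-block closes the writer and writes the file to disk.
with pd.ExcelWriter(out_path) as writer:
    # index=False leaves the row labels out of the sheet.
    new_df.to_excel(writer, sheet_name="result", index=False)

# Read the file back to confirm what was written.
check = pd.read_excel(out_path, sheet_name="result")
print(check)
```

Passing index=False is a choice made for this sketch; the original call keeps the default, which writes the row labels as the first column.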
pandas provides many more functions than the ones shown here. Search the API documentation for the function that fits your specific task.
Attached: A complete example
#coding=utf-8
import pandas as pd

# Read the second sheet of the Excel file
df = pd.read_excel('log.xls', sheetname=1)

# Show the data type of each column
print(df.dtypes)

# Show the data in the Member column
print(df['Member'])

'''
# Create a new column whose value in each row is the sum of the
# Member and activity values in the same row
for i in df.index:
    df['activity_2'][i] = df['Member'][i] + df['activity'][i]
'''

# Remove redundant rows based on the Member field, keeping the last
# row for each Member value
new_df = df.drop_duplicates(subset='Member', keep='last')

# Export the result
out = pd.ExcelWriter('output.xls')
new_df.to_excel(out)
out.save()
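The disabled loop in the example indexes df['activity_2'] before that column exists, which would raise a KeyError if it were enabled. A vectorized assignment is one possible alternative, sketched here on made-up data. Because both columns hold text, adding them concatenates the strings row by row.

```python
import pandas as pd

# Hypothetical data with two text columns.
df = pd.DataFrame({
    "Member": ["a", "b"],
    "activity": ["run", "swim"],
})

# Element-wise addition of two text columns concatenates the strings
# and creates the new column in a single step.
df["activity_2"] = df["Member"] + df["activity"]

print(df["activity_2"].tolist())  # ['arun', 'bswim']
```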