Python's eight data import methods, have you mastered them?

WBOY
Release: 2023-04-19 12:52:03
forward
1698 people have browsed it

In most cases, NumPy or Pandas will be used to import data, so before starting, execute:

import numpy as np
import pandas as pd
Copy after login

Two ways to get help

Many times you don’t know much about some function methods. At this time, Python provides some help information to quickly use Python objects.

Use the info method in Numpy.

np.info(np.ndarray.dtype)
Copy after login

Python's eight data import methods, have you mastered them?

Python built-in function

help(pd.read_csv)
Copy after login

Python's eight data import methods, have you mastered them?

1. Text File

1. Plain text file

filename = 'demo.txt'
file = open(filename, mode='r') # 打开文件进行读取
text = file.read() # 读取文件的内容
print(file.closed) # 检查文件是否关闭
file.close() # 关闭文件
print(text)
Copy after login

Use context manager -- with

with open('demo.txt', 'r') as file:
print(file.readline()) # 一行一行读取
print(file.readline())
print(file.readline())
Copy after login

2. Table data: Flat file

Use Numpy to read Flat file

Numpy’s built-in functions process data at the C language level.

Flat file is a file containing records without relative relationship structure. (Excel, CSV and Tab delimiter files are supported)

Files with one data type

The string used to separate values ​​skips the first two lines. Read the type of the resulting array in the first and third columns.

filename = 'mnist.txt'
data = np.loadtxt(filename,
delimiter=',',
skiprows=2,
usecols=[0,2],
dtype=str)
Copy after login

  • Files with mixed data types

Two hard requirements:

  • Skip header Information
  • Distinguish between horizontal and vertical coordinates

filename = 'titanic.csv'
data = np.genfromtxt(filename,
 delimiter=',',
 names=True,
 dtype=None)
Copy after login

Python's eight data import methods, have you mastered them?

##Use Pandas to read Flat files

filename = 'demo.csv' 
data = pd.read_csv(filename, 
 nrows=5,# 要读取的文件的行数
 header=None,# 作为列名的行号
 sep='t', # 分隔符使用
 comment='#',# 分隔注释的字符
 na_values=[""]) # 可以识别为NA/NaN的字符串
Copy after login

2. Excel Spreadsheet

ExcelFile() in Pandas is a very convenient and fast class in pandas for reading excel table files, especially for excel containing multiple sheets. It is very convenient to manipulate files.

file = 'demo.xlsx'
data = pd.ExcelFile(file)
df_sheet2 = data.parse(sheet_name='1960-1966',
 skiprows=[0],
 names=['Country',
'AAM: War(2002)'])
df_sheet1 = pd.read_excel(data,
sheet_name=0,
parse_cols=[0],
skiprows=[0],
names=['Country'])
Copy after login

Use the sheet_names property to get the name of the worksheet to be read.

data.sheet_names
Copy after login

3. SAS file

SAS (Statistical Analysis System) is a modular and integrated large-scale application software system. The file it saves, sas, is a statistical analysis file.

from sas7bdat import SAS7BDAT
with SAS7BDAT('demo.sas7bdat') as file:
df_sas = file.to_data_frame()
Copy after login

4. Stata file

Stata is a complete and integrated statistical software that provides its users with data analysis, data management and professional chart drawing. The saved file is a Stata file with the .dta extension.

data = pd.read_stata('demo.dta')
Copy after login

5. Pickled files

Almost all data types in python (lists, dictionaries, sets, classes, etc.) can be serialized using pickle. Python's pickle module implements basic data sequencing and deserialization. Through the serialization operation of the pickle module, we can save the object information running in the program to a file and store it permanently; through the deserialization operation of the pickle module, we can create the object saved by the last program from the file.

import pickle
with open('pickled_demo.pkl', 'rb') as file:
 pickled_data = pickle.load(file) # 下载被打开被读取到的数据
Copy after login

The corresponding operation is to write the method pickle.dump().

6. HDF5 file

HDF5 file is a common cross-platform data storage file that can store different types of images and digital data and can be transferred on different types of machines. There are also function libraries that uniformly handle this file format.

HDF5 files generally have .h5​ or .hdf5 as the suffix, and special software is required to open the content of the preview file.

import h5py
filename = 'H-H1_LOSC_4_v1-815411200-4096.hdf5'
data = h5py.File(filename, 'r')
Copy after login

7. Matlab file

It is a file with the suffix .mat in which matlab stores the data in its workspace.

import scipy.io
filename = 'workspace.mat'
mat = scipy.io.loadmat(filename)
Copy after login

8. Relational database

from sqlalchemy import create_engine
engine = create_engine('sqlite://Northwind.sqlite')
Copy after login

Use the table_names() method to obtain a list of table names

table_names = engine.table_names()
Copy after login

1. Directly query the relational database

con = engine.connect()
rs = con.execute("SELECT * FROM Orders")
df = pd.DataFrame(rs.fetchall())
df.columns = rs.keys()
con.close()
Copy after login

Use the context manager -- with

with engine.connect() as con:
rs = con.execute("SELECT OrderID FROM Orders")
df = pd.DataFrame(rs.fetchmany(size=5))
df.columns = rs.keys()
Copy after login

2. Use Pandas to query the relationship Type database

df = pd.read_sql_query("SELECT * FROM Orders", engine)
Copy after login

Data exploration

After the data is imported, the data will be initially explored, such as checking the data type, data size, length and other basic information. Here is a brief summary.

1, NumPy Arrays

data_array.dtype# 数组元素的数据类型
data_array.shape# 阵列尺寸
len(data_array) # 数组的长度
Copy after login

2, Pandas DataFrames

df.head()# 返回DataFrames前几行(默认5行)
df.tail()# 返回DataFrames最后几行(默认5行)
df.index # 返回DataFrames索引
df.columns # 返回DataFrames列名
df.info()# 返回DataFrames基本信息
data_array = data.values # 将DataFrames转换为NumPy数组
Copy after login

The above is the detailed content of Python's eight data import methods, have you mastered them?. For more information, please follow other related articles on the PHP Chinese website!

Related labels:
source:51cto.com
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Popular Tutorials
More>
Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template