In most cases you will import data with NumPy or pandas, so start by running:
import numpy as np
import pandas as pd
If you are unsure how a function or method works, Python's built-in help tools give you quick information about any object:
np.info(np.ndarray.dtype)
help(pd.read_csv)
filename = 'demo.txt'
file = open(filename, mode='r')  # open the file for reading
text = file.read()               # read the file's contents
print(file.closed)               # check whether the file is closed (False at this point)
file.close()                     # close the file
print(text)
Using a context manager: with
with open('demo.txt', 'r') as file:
    print(file.readline())  # read one line at a time
    print(file.readline())
    print(file.readline())
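A self-contained sketch of the context-manager approach, assuming a throwaway file created in a temporary directory (the file name and its contents are made up for illustration). It shows that `readline()` returns one line at a time, including the trailing newline, and that the file is closed automatically once the `with` block ends.

```python
import os
import tempfile

def read_first_two_lines(path):
    """Return the first two lines of a text file and whether it ended up closed."""
    with open(path, "r", encoding="utf-8") as file:
        first = file.readline()   # includes the trailing "\n"
        second = file.readline()
    # Leaving the with-block closes the file, even if an exception was raised
    return first, second, file.closed

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical sample file
    with open(path, "w", encoding="utf-8") as f:
        f.write("line one\nline two\nline three\n")
    first, second, closed = read_first_two_lines(path)

print(repr(first), repr(second), closed)  # 'line one\n' 'line two\n' True
```

Unlike the manual `open()`/`close()` version, there is no way to forget the `close()` call here.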
NumPy's built-in functions process data at the C level, so they are much faster than equivalent pure-Python loops.
A flat file is a file containing records with no relational structure between them. Supported formats include CSV and tab-delimited files, as well as Excel.
Files with one data type
np.loadtxt() reads such a file straight into an array:
filename = 'mnist.txt'
data = np.loadtxt(filename,
                  delimiter=',',   # string used to separate values
                  skiprows=2,      # skip the first two lines
                  usecols=[0, 2],  # read only the first and third columns
                  dtype=str)       # type of the resulting array
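To see what `skiprows` and `usecols` do without a real `mnist.txt`, here is a small sketch that feeds `np.loadtxt()` an in-memory file instead. The file contents are invented for illustration.

```python
from io import StringIO
import numpy as np

# Hypothetical file contents: two header lines, then comma-separated values
raw = StringIO(
    "Sample export\n"
    "a,b,c\n"
    "1,2,3\n"
    "4,5,6\n"
)

# skiprows=2 drops both header lines; usecols=[0, 2] keeps columns a and c
data = np.loadtxt(raw, delimiter=",", skiprows=2, usecols=[0, 2], dtype=str)
print(data.tolist())  # [['1', '3'], ['4', '6']]
```

Because `dtype=str`, every value comes back as a string; drop that argument to get floats instead.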
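Files with mixed data types
np.genfromtxt() handles these, with two hard requirements: names=True reads the column names from the header row, and dtype=None lets NumPy infer each column's type.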
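filename = 'titanic.csv'
data = np.genfromtxt(filename,
                     delimiter=',',
                     names=True,      # take field names from the header row
                     dtype=None,      # infer the data type of each column
                     encoding=None)   # decode text as str rather than bytes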
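A minimal sketch of the same call on a tiny in-memory table, assuming made-up column names and values loosely modeled on the Titanic data. The result is a structured array whose fields can be accessed by name.

```python
from io import StringIO
import numpy as np

# Hypothetical mixed-type data: a header row, then int, str and float columns
raw = StringIO(
    "PassengerId,Sex,Fare\n"
    "1,male,7.25\n"
    "2,female,71.28\n"
)

# names=True reads field names from the first row;
# dtype=None infers a type per column; encoding=None yields str, not bytes
data = np.genfromtxt(raw, delimiter=",", names=True, dtype=None, encoding=None)

print(data.dtype.names)  # ('PassengerId', 'Sex', 'Fare')
print(data["Sex"])       # the text column, accessed by field name
```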
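filename = 'demo.csv'
data = pd.read_csv(filename,
                   nrows=5,          # number of rows of the file to read
                   header=None,      # row number to use as column names (None: no header row)
                   sep='\t',         # delimiter: a tab
                   comment='#',      # character that marks the rest of a line as a comment
                   na_values=[""])   # strings to recognize as NA/NaN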
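The sketch below runs the same `read_csv()` options against a small in-memory tab-separated file, so each parameter's effect is visible. The data is invented for illustration, and `nrows` is set to 2 here rather than 5 to keep it short.

```python
from io import StringIO
import pandas as pd

# Hypothetical tab-separated data with a comment line and an empty field
raw = StringIO(
    "# exported sample\n"
    "1\tAlice\t3.5\n"
    "2\tBob\t\n"
    "3\tCarol\t4.0\n"
)

df = pd.read_csv(
    raw,
    sep="\t",        # tab delimiter (note the backslash)
    header=None,     # no header row, so columns are labeled 0, 1, 2
    comment="#",     # a line starting with '#' is skipped entirely
    nrows=2,         # read only the first two data rows
    na_values=[""],  # treat empty strings as NaN
)
print(df)
```

Writing `sep='t'` instead of `'\t'` would split on the letter t, which is a common source of confusing results.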
file = 'demo.xlsx'
data = pd.ExcelFile(file)
df_sheet2 = data.parse(sheet_name='1960-1966',
                       skiprows=[0],
                       names=['Country', 'AAM: War(2002)'])
df_sheet1 = pd.read_excel(data,
                          sheet_name=0,
                          usecols=[0],   # usecols replaces the removed parse_cols argument
                          skiprows=[0],
                          names=['Country'])
data.sheet_names  # names of all sheets in the workbook
from sas7bdat import SAS7BDAT
with SAS7BDAT('demo.sas7bdat') as file:
    df_sas = file.to_data_frame()
data = pd.read_stata('demo.dta')
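Since pandas can also write Stata files, a round trip makes a convenient self-contained check of `pd.read_stata()`. This sketch assumes a small made-up DataFrame written to a temporary `demo.dta`.

```python
import os
import tempfile
import pandas as pd

# Hypothetical frame written to a temporary .dta file, then read back
df_out = pd.DataFrame({"country": ["A", "B"], "year": [1960, 1966]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.dta")
    df_out.to_stata(path, write_index=False)  # don't store the index as a column
    df_in = pd.read_stata(path)

print(df_in)
```

Stata has no 64-bit integer type, so pandas may store integer columns in a smaller type; the values themselves survive the round trip.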
import pickle
with open('pickled_demo.pkl', 'rb') as file:
    pickled_data = pickle.load(file)  # load the pickled object from the open file
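A minimal round-trip sketch, assuming a made-up dictionary saved to a temporary `pickled_demo.pkl`. Note that the file must be opened in binary mode (`'wb'` to write, `'rb'` to read).

```python
import os
import pickle
import tempfile

# Hypothetical object to round-trip through a pickle file
original = {"ids": [1, 2, 3], "label": "demo"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pickled_demo.pkl")
    with open(path, "wb") as file:   # pickle writes bytes
        pickle.dump(original, file)
    with open(path, "rb") as file:
        pickled_data = pickle.load(file)

print(pickled_data == original)  # True
```

Only unpickle files from sources you trust: loading a pickle can execute arbitrary code.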
import h5py
filename = 'H-H1_LOSC_4_v1-815411200-4096.hdf5'
data = h5py.File(filename, 'r')
import scipy.io
filename = 'workspace.mat'
mat = scipy.io.loadmat(filename)
from sqlalchemy import create_engine
engine = create_engine('sqlite:///Northwind.sqlite')  # three slashes for a relative file path
from sqlalchemy import inspect
table_names = inspect(engine).get_table_names()  # Engine.table_names() was removed in SQLAlchemy 2.0
from sqlalchemy import text
con = engine.connect()
rs = con.execute(text("SELECT * FROM Orders"))  # raw SQL strings must be wrapped in text()
df = pd.DataFrame(rs.fetchall())
df.columns = rs.keys()
con.close()
with engine.connect() as con:
    rs = con.execute(text("SELECT OrderID FROM Orders"))
    df = pd.DataFrame(rs.fetchmany(size=5))  # fetch only the first five rows
    df.columns = rs.keys()
df = pd.read_sql_query("SELECT * FROM Orders", engine)
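For a quick experiment without SQLAlchemy or the Northwind file, `pd.read_sql_query()` also accepts a standard-library `sqlite3` connection. This sketch swaps in an in-memory database with a made-up `Orders` table, and passes the filter value through `params` instead of formatting it into the SQL string.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database with a hypothetical Orders table
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Orders (OrderID INTEGER, Customer TEXT)")
con.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(10248, "A"), (10249, "B"), (10250, "C")],
)
con.commit()

# The ? placeholder is filled from params, which avoids SQL injection
df = pd.read_sql_query("SELECT * FROM Orders WHERE OrderID > ?", con, params=(10248,))
con.close()
print(df)
```

Column names come from the query result, so there is no need to set `df.columns` by hand.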
data_array.dtype   # data type of the array's elements
data_array.shape   # array dimensions
len(data_array)    # length of the array (size of the first axis)
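df.head()     # return the first rows of the DataFrame (5 by default)
df.tail()     # return the last rows of the DataFrame (5 by default)
df.index      # the DataFrame's index
df.columns    # the DataFrame's column labels
df.info()     # print a concise summary of the DataFrame
data_array = data.values  # convert the DataFrame to a NumPy array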
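The inspection calls above can be tried on any small DataFrame. This sketch uses a made-up two-column frame and converts it with `to_numpy()`, which the pandas documentation recommends over `.values`.

```python
import numpy as np
import pandas as pd

# Small hypothetical DataFrame standing in for imported data
df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6], "b": [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]})

print(df.head(3))        # first three rows
print(list(df.columns))  # ['a', 'b']

data_array = df.to_numpy()
print(data_array.dtype)  # float64: int and float columns are upcast to a common type
print(data_array.shape)  # (6, 2)
print(len(data_array))   # 6, the size of the first axis
```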
The above covers Python's eight data import methods in detail. For more information, see other related articles on the PHP Chinese website.