This article summarizes the most commonly used APIs of three Python data processing libraries, with a short explanation of each.
Numpy and Pandas are two libraries often used in Python data processing. Their performance-critical parts are implemented in C (Pandas also relies on Cython), so operations run fast. Matplotlib is a Python plotting library that can visualize the processed data. I had only read through the syntax before and had never studied or summarized it systematically, so this blog post summarizes the APIs of these three libraries.
The following is a brief introduction to these three libraries and how they differ:
Numpy: often used to generate array data and perform numerical operations on it
Pandas: built on top of Numpy; it adds labeled data structures (Series and DataFrame)
Matplotlib: a powerful plotting tool in Python
Numpy
For a quick start with Numpy, see: Numpy tutorial
Numpy properties
ndarray.ndim: Dimension
ndarray.shape: Number of rows and columns, such as (3, 5)
ndarray.size: Number of elements
ndarray.dtype: Element type
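The four properties above can be inspected on any array. A minimal sketch, assuming a 3x5 integer array built with `np.arange` (the variable name `a` is only illustrative):

```python
import numpy as np

# Build a 3x5 array so the shape matches the (3, 5) example above.
a = np.arange(15).reshape(3, 5)

print(a.ndim)   # number of dimensions
print(a.shape)  # (rows, columns)
print(a.size)   # total number of elements
print(a.dtype)  # element type; the exact integer type depends on the platform
```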
Numpy creation
array(object, dtype=None): Create an array from a Python list or tuple
zeros(shape, dtype=float): Create an array filled with 0
ones(shape, dtype=None): Create an array filled with 1
empty(shape, dtype=float): Create an array without initializing its contents
arange([start, ]stop, [step, ]dtype=None): Create values at a fixed interval (stop is excluded)
linspace(start, stop, num=50, dtype=None): Create num evenly spaced values within a given range (stop is included by default)
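As a rough illustration of the creation functions listed above, the following sketch calls each one once. The shapes, ranges, and variable names are arbitrary choices made for the example:

```python
import numpy as np

# From a nested Python list, with an explicit element type.
from_list = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

# Arrays filled with 0 or 1; the shape is passed as a tuple.
zeros = np.zeros((2, 3))
ones = np.ones((2, 3), dtype=np.int32)

# Uninitialized memory: the contents are arbitrary until assigned.
uninitialized = np.empty((2, 3))

# Fixed step of 2, stop value excluded.
steps = np.arange(10, 20, 2)

# Five evenly spaced points, both endpoints included.
evenly_spaced = np.linspace(0, 1, num=5)

print(steps)
print(evenly_spaced)
```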
Numpy operation
Add, Subtract: a + b, a - b
Multiply: b*2, 10*np.sin(a)
Power: b**2
Comparison: a<35, returns an array of True or False values
Matrix multiplication: np.dot(A,B) or A.dot(B)
Others: +=, -=, sin, cos, exp
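The operations above all work element by element, except for `np.dot`, which performs a matrix product. A small sketch with made-up arrays shows the difference:

```python
import numpy as np

a = np.array([10, 20, 30, 40])
b = np.arange(4)  # [0, 1, 2, 3]

total = a + b              # element-wise addition
difference = a - b         # element-wise subtraction
doubled = b * 2            # multiply every element by a scalar
squared = b ** 2           # element-wise power
mask = a < 35              # boolean array
scaled_sin = 10 * np.sin(a)

# In-place update on a copy, so b itself stays unchanged.
c = b.copy()
c += 1

A = np.array([[1, 1], [0, 1]])
B = np.arange(4).reshape(2, 2)  # [[0, 1], [2, 3]]

elementwise = A * B            # element-wise product, not a matrix product
matrix_product = np.dot(A, B)  # same result as A.dot(B)

print(mask)
print(matrix_product)
```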
Numpy index
Array indexing method: A[1, 1]
Slice: A[1, 1:3]
Iteration: for item in A.flat
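The three access patterns above can be tried on a small 2-D array. This is only a sketch; the 3x4 array `A` is an assumption chosen so that the indices from the list are valid:

```python
import numpy as np

A = np.arange(3, 15).reshape(3, 4)
# [[ 3  4  5  6]
#  [ 7  8  9 10]
#  [11 12 13 14]]

element = A[1, 1]      # row 1, column 1
row_slice = A[1, 1:3]  # row 1, columns 1 and 2

# A.flat iterates over every element in row-major order.
flat_items = [int(item) for item in A.flat]

print(element)
print(row_slice)
```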
Numpy other
reshape(a, newshape): Change the shape of the data. The shape of the original array is not modified; an array with the new shape is returned
resize(a, new_shape): Return a new array with the given shape, repeating the data if needed. The method form a.resize(new_shape) instead modifies the array in place and returns nothing
ravel(a): Return the data flattened to one dimension
vstack(tup): Stack arrays vertically (top and bottom)
hstack(tup): Stack arrays horizontally (left and right)
hsplit(ary, indices_or_sections): Split horizontally (column-wise) into n parts
vsplit(ary, indices_or_sections): Split vertically (row-wise) into n parts
copy(a): Return an independent copy of the array
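A short sketch of the functions above, using the function forms from the `np` namespace. The array contents and variable names are made up for illustration:

```python
import numpy as np

a = np.arange(6)

# reshape returns an array with the new shape; a keeps its own shape.
reshaped = np.reshape(a, (2, 3))

# The function form of resize returns a new array and repeats the data
# when the requested shape needs more elements than a provides.
resized = np.resize(a, (2, 4))

# Flatten back to one dimension.
flat = np.ravel(reshaped)

top = np.array([1, 1, 1])
bottom = np.array([2, 2, 2])
stacked_vertically = np.vstack((top, bottom))    # shape (2, 3)
stacked_horizontally = np.hstack((top, bottom))  # shape (6,)

grid = np.arange(12).reshape(3, 4)
left, right = np.hsplit(grid, 2)  # two blocks of shape (3, 2)
rows = np.vsplit(grid, 3)         # three blocks of shape (1, 4)

# A copy owns its data, so changing it leaves a untouched.
independent = np.copy(a)
independent[0] = 99

print(resized)
print(a)
```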
Pandas
For a quick start with Pandas, see: 10 Minutes to pandas
Pandas data structure
Pandas has two data structures: Series and DataFrame.
Series: Index on the left, value on the right. The creation method is as follows:
In [4]: s = pd.Series([1,3,5,np.nan,6,8])
In [5]: s
Out[5]:
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
DataFrame: A tabular data structure with both a row index and a column index. It can be regarded as a large dictionary made up of Series. The creation method is as follows:
In [6]: dates = pd.date_range('20130101', periods=6)
In [7]: dates
Out[7]:
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')
In [8]: df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
Pandas view data
index: Row index
columns: Column index
values: The underlying values as a Numpy array
head(n=5): Return the first n rows
tail(n=5): Return the last n rows
describe(): Return summary statistics such as count, mean, and standard deviation
sort_index(axis=1, ascending=False): Sort by index labels (axis=1 sorts the column labels)
sort_values(by='B'): Sort by the values in column B
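The viewing attributes and methods above can be tried on a small frame. The sketch below rebuilds a 6x4 DataFrame like the one in the previous section; the seeded random generator is an assumption added here so that the output is reproducible:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
rng = np.random.default_rng(0)  # seeded only for reproducibility
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list('ABCD'))

print(df.index)    # row labels
print(df.columns)  # column labels
print(df.values)   # underlying Numpy array

first_rows = df.head(3)
last_rows = df.tail(2)
summary = df.describe()

# axis=1 sorts the column labels, here in descending order.
by_column_label = df.sort_index(axis=1, ascending=False)

# Sort the rows by the values in column B.
by_b = df.sort_values(by='B')

print(summary)
```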
Pandas select data
Column selection: df['A']
Row slice selection: df[0:3] or df['20130102':'20130104']
Select by label: df.loc['20130102':'20130104',['A','B']]
Select by position: df.iloc[3:5,0:2]
Mixed selection: df.ix[:3,['A','C']] (df.ix was deprecated in pandas 0.20 and removed in pandas 1.0; on current versions combine df.iloc and df.loc instead, for example df.iloc[:3][['A','C']])
Conditional selection: df[df.A > 0]
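The selection styles above can be compared side by side. This sketch assumes a date-indexed frame with deterministic integer values (an illustrative choice), and it replaces the mixed `df.ix` selection with `df.iloc` followed by a column list, since `df.ix` is not available in pandas 1.0 and later:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
values = np.arange(24).reshape(6, 4) - 8  # column A: -8, -4, 0, 4, 8, 12
df = pd.DataFrame(values, index=dates, columns=list('ABCD'))

column_a = df['A']                      # a single column, as a Series
first_three = df[0:3]                   # rows by position
by_date = df['20130102':'20130104']     # rows by label slice, end included

by_label = df.loc['20130102':'20130104', ['A', 'B']]
by_position = df.iloc[3:5, 0:2]

# Positional rows plus labeled columns, without df.ix.
mixed = df.iloc[:3][['A', 'C']]

# Keep only the rows where column A is positive.
positive_a = df[df.A > 0]

print(by_label)
print(positive_a)
```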
Pandas handle missing data
Delete rows with missing data: df.dropna(how='any')
Fill in missing data: df.fillna(value=5)
Check whether each value is NaN: pd.isna(df1)
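A minimal sketch of the three calls above, assuming a small frame `df1` with one missing value in each column (the data is made up for the example):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'A': [1.0, np.nan, 3.0],
    'B': [4.0, 5.0, np.nan],
})

# Drop every row that contains at least one NaN.
dropped = df1.dropna(how='any')

# Replace every NaN with 5; df1 itself is not modified.
filled = df1.fillna(value=5)

# Boolean frame marking which cells are NaN.
nan_mask = pd.isna(df1)

print(dropped)
print(filled)
print(nan_mask)
```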
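Pandas merge data
pd.concat([df1, df2, df3], axis=0): Concatenate DataFrames along the rows
pd.merge(left, right, on='key'): Merge based on the key field
df.append(s, ignore_index=True): Add a row of data (df.append was deprecated in pandas 1.4 and removed in pandas 2.0; use pd.concat instead)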
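The merge operations above can be sketched as follows. The frames and the `key`, `lval`, and `rval` column names are illustrative assumptions, and the last step uses `pd.concat` as one possible stand-in for `df.append`, which recent pandas versions no longer provide:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})
df3 = pd.DataFrame({'A': [5, 6]})

# Stack the frames on top of each other; the original row labels are kept.
stacked = pd.concat([df1, df2, df3], axis=0)

left = pd.DataFrame({'key': ['foo', 'bar'], 'lval': [1, 2]})
right = pd.DataFrame({'key': ['foo', 'bar'], 'rval': [4, 5]})

# Join the rows of both frames that share the same value in 'key'.
merged = pd.merge(left, right, on='key')

# Add one row given as a Series: turn it into a one-row frame first.
s = pd.Series({'A': 9})
appended = pd.concat([df1, s.to_frame().T], ignore_index=True)

print(stacked)
print(merged)
print(appended)
```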
Pandas import and export
df.to_csv('foo.csv'): Save to a csv file
pd.read_csv('foo.csv'): Read from a csv file
df.to_excel('foo.xlsx', sheet_name='Sheet1'): Save to an Excel file
pd.read_excel('foo.xlsx', 'Sheet1', index_col=None, na_values=['NA']): Read from an Excel file
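A CSV round trip might look like the sketch below. It writes into a temporary directory so that no file is left behind; the frame contents are made up, and `index=False` is an extra choice made here to keep the row index out of the file. The Excel calls follow the same pattern but need an optional Excel engine such as openpyxl to be installed, so they are left out:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['x', 'y', 'z']})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'foo.csv')

    # Write without the row index so the file holds only columns A and B.
    df.to_csv(path, index=False)

    # Read the file back into a new DataFrame.
    restored = pd.read_csv(path)

print(restored)
```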
Matplotlib
Here we only introduce the simplest way to plot:
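import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Randomly generate 1000 data points
data = pd.Series(np.random.randn(1000), index=np.arange(1000))
# Accumulate the data so the effect is easier to see
# (cumsum returns a new Series, so assign the result back)
data = data.cumsum()
# pandas data can be plotted directly
data.plot()
plt.show()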