A brief analysis of Python data processing

不言
Release: 2018-05-02 13:46:24

This article summarizes the key points of Python data processing for readers who are interested in the topic.

Numpy and Pandas are two libraries often used for data processing in Python. Their performance-critical parts are implemented in compiled code (Pandas is itself built on top of Numpy), so they run much faster than equivalent pure-Python loops. Matplotlib is a Python plotting library that can visualize the processed data. I had previously only skimmed the syntax of these libraries without studying them systematically, so this post summarizes the main APIs of all three.

The following is a brief introduction to the three libraries and how they differ:

  • Numpy: often used to generate array data and perform numerical operations on it

  • Pandas: built on top of Numpy; adds labeled data structures (Series and DataFrame) for data analysis

  • Matplotlib: a powerful plotting tool for Python

Numpy

For a quick start with Numpy, see: Numpy tutorial

Numpy properties

ndarray.ndim: Number of dimensions

ndarray.shape: Size along each dimension; for a 2-D array, the number of rows and columns, such as (3, 5)

ndarray.size: Total number of elements

ndarray.dtype: Element type
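As a minimal sketch, the four properties can be inspected as follows. The array `a` is example data chosen for illustration and does not come from the original article.

```python
import numpy as np

# A 3x5 array holding the integers 0..14 (illustrative data)
a = np.arange(15).reshape(3, 5)

print(a.ndim)   # number of dimensions: 2
print(a.shape)  # size along each dimension: (3, 5)
print(a.size)   # total number of elements: 15
print(a.dtype)  # element type; the integer width depends on the platform
```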

Numpy creation

array(object, dtype=None): Create an array from a Python list or tuple

zeros(shape, dtype=float): Create an array filled with 0

ones(shape, dtype=None): Create an array filled with 1

empty(shape, dtype=float): Create an array without initializing its contents

arange([start, ]stop, [step, ]dtype=None): Create evenly spaced values with a fixed step (stop is excluded)

linspace(start, stop, num=50, dtype=None): Create num evenly spaced values within a given range (stop is included by default)
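The creation functions listed above could be used roughly like this. The shapes and values are arbitrary examples, and the variable names are chosen only for this sketch.

```python
import numpy as np

from_list = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)  # from nested lists
zeros = np.zeros((2, 3))             # 2x3 array of 0.0
ones = np.ones((2, 3), dtype=int)    # 2x3 array of 1
uninit = np.empty((2, 3))            # contents are arbitrary until assigned
steps = np.arange(10, 20, 2)         # 10, 12, 14, 16, 18 (stop is excluded)
even = np.linspace(0, 1, 5)          # 0.0, 0.25, 0.5, 0.75, 1.0 (stop is included)
```

Note the difference between the last two: arange takes a step size, while linspace takes the number of points.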

Numpy operation

Add, subtract: a + b, a - b

Multiply: b*2, 10*np.sin(a)

Raise to a power: b**2

Comparison: a<35, which returns an array of True or False values

Matrix multiplication: np.dot(A,B) or A.dot(B)

Others: +=, -=, sin, cos, exp
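A short sketch of these operations, using small example arrays that are not from the original article. Note that `*` is elementwise, while np.dot performs matrix multiplication.

```python
import numpy as np

a = np.array([10, 20, 30, 40])
b = np.arange(4)               # [0, 1, 2, 3]

total = a + b                  # [10, 21, 32, 43]
diff = a - b                   # [10, 19, 28, 37]
squared = b ** 2               # [0, 1, 4, 9]
mask = a < 35                  # [True, True, True, False]

c = a.copy()
c += b                         # in-place addition; c now equals total

A = np.array([[1, 1], [0, 1]])
B = np.array([[2, 0], [3, 4]])
elementwise = A * B            # [[2, 0], [0, 4]]
matrix = np.dot(A, B)          # [[5, 4], [3, 4]], same as A.dot(B)
```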

Numpy index

Array indexing method: A[1, 1]

Slice: A[1, 1:3]

Iteration: for item in A.flat
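For illustration, here is how the three access patterns behave on a small example matrix. The contents of `A` are an assumption made for this sketch.

```python
import numpy as np

A = np.arange(3, 15).reshape(3, 4)
# [[ 3  4  5  6]
#  [ 7  8  9 10]
#  [11 12 13 14]]

element = A[1, 1]        # row 1, column 1 -> 8
row_slice = A[1, 1:3]    # row 1, columns 1 and 2 -> [8, 9]

# A.flat iterates over every element in row-major order
flat_items = [int(item) for item in A.flat]
```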

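Numpy other

reshape(a, newshape): Return the data in a new shape. The shape of the original array is not changed; the result is a view of the same data when possible

resize(a, new_shape): np.resize returns a new array of the given shape, repeating the data if needed. The method a.resize(new_shape), by contrast, changes the array in place and returns nothing

ravel(a): Return the data flattened to one dimension

vstack(tup): Stack arrays vertically (top and bottom)

hstack(tup): Stack arrays horizontally (left and right)

hsplit(ary, indices_or_sections): Split into n parts horizontally (column-wise)

vsplit(ary, indices_or_sections): Split into n parts vertically (row-wise)

copy(a): Return a copy that is independent of the original data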
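The following sketch exercises these functions on a small example array (the data and variable names are illustrative). It uses the function form np.resize, which returns a new array; the ndarray.resize method instead changes the array in place.

```python
import numpy as np

a = np.arange(6)                      # [0, 1, 2, 3, 4, 5]

reshaped = np.reshape(a, (2, 3))      # a itself keeps shape (6,)
resized = np.resize(a, (2, 4))        # new array; a is repeated to fill 8 slots
flat = np.ravel(reshaped)             # back to one dimension

top_bottom = np.vstack((reshaped, reshaped))   # shape (4, 3)
left_right = np.hstack((reshaped, reshaped))   # shape (2, 6)

cols = np.hsplit(left_right, 2)       # two arrays of shape (2, 3)
rows = np.vsplit(top_bottom, 2)       # two arrays of shape (2, 3)

independent = np.copy(a)
independent[0] = 99                   # does not affect a
```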

Pandas

For a quick start with Pandas, see: 10 Minutes to pandas

Pandas data structure

Pandas has two data structures: Series and DataFrame.

Series: Index on the left, value on the right. The creation method is as follows:

In [4]: s = pd.Series([1,3,5,np.nan,6,8])
In [5]: s
Out[5]: 
0  1.0
1  3.0
2  5.0
3  NaN
4  6.0
5  8.0
dtype: float64

DataFrame: A tabular data structure with both a row index and a column index. It can be thought of as a large dictionary of Series. The creation method is as follows:

In [6]: dates = pd.date_range('20130101', periods=6)

In [7]: dates
Out[7]: 
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
        '2013-01-05', '2013-01-06'],
       dtype='datetime64[ns]', freq='D')

In [8]: df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))

Pandas viewing data

index: Row index

columns: Column index

values: The underlying data as an array

head(n=5): Return the first n rows

tail(n=5): Return the last n rows

describe(): Show summary statistics such as the count and the mean

sort_index(axis=1, ascending=False): Sort by index labels (here, the column labels in descending order)

sort_values(by='B'): Sort by the values in column B
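As a rough sketch, these attributes and methods can be tried on a DataFrame built the same way as the one shown earlier. The data is random, so the printed numbers will differ from run to run.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

print(df.index)          # the six dates
print(df.columns)        # A, B, C, D
print(df.values.shape)   # underlying array of shape (6, 4)
print(df.head(2))        # first two rows
print(df.tail(2))        # last two rows
print(df.describe())     # count, mean, std, min, quartiles, max per column

by_columns_desc = df.sort_index(axis=1, ascending=False)  # columns D, C, B, A
by_b = df.sort_values(by='B')                             # rows ordered by column B
```

Both sort methods return a new DataFrame and leave df unchanged.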

Pandas selecting data

Column selection: df['A']

Slice selection: df[0:3] or df['20130102':'20130104']

Select by label: df.loc['20130102':'20130104',['A','B']]

Select by position: df.iloc[3:5,0:2]

Mixed selection: df.ix[:3,['A','C']] (ix was removed in pandas 1.0; on current versions use df.loc or df.iloc instead)

Conditional selection: df[df.A > 0]
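The selection methods can be compared side by side in a sketch like the one below. It uses deterministic example data instead of random values so the results are easy to check, and it replaces the mixed df.ix selection with df.loc, since ix is not available in pandas 1.0 and later.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

col_a = df['A']                                   # one column, as a Series
first_three = df[0:3]                             # rows 0..2 by position
by_date = df['20130102':'20130104']               # label slice, both ends included
by_label = df.loc['20130102':'20130104', ['A', 'B']]
by_position = df.iloc[3:5, 0:2]                   # rows 3..4, columns 0..1
mixed = df.loc[df.index[:3], ['A', 'C']]          # first 3 rows, columns by name
positive = df[df.A > 0]                           # rows where column A is positive
```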

Pandas handling missing data

Delete rows with missing data: df.dropna(how='any')

Fill in missing data: df.fillna(value=5)

Check which values are NaN: pd.isna(df1)
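A small sketch of the three calls, assuming a hypothetical DataFrame `df1` with two missing values:

```python
import numpy as np
import pandas as pd

# Example data with one NaN in each column (illustrative only)
df1 = pd.DataFrame({'A': [1.0, np.nan, 3.0], 'B': [4.0, 5.0, np.nan]})

dropped = df1.dropna(how='any')   # keeps only rows without any NaN (row 0)
filled = df1.fillna(value=5)      # every NaN replaced by 5
mask = pd.isna(df1)               # boolean DataFrame, True where a value is NaN
```

dropna and fillna return new DataFrames here; df1 keeps its missing values.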

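Pandas merging data

pd.concat([df1, df2, df3], axis=0): Concatenate DataFrames along an axis

pd.merge(left, right, on='key'): Join two DataFrames on the key column

df.append(s, ignore_index=True): Append rows (append was removed in pandas 2.0; on current versions use pd.concat instead)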
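The sketch below illustrates concatenating and merging on small example frames. All column names and values are made up for illustration. Because DataFrame.append is not available in pandas 2.0 and later, adding a row is shown with pd.concat instead.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b'], 'x': [1, 2]})
df2 = pd.DataFrame({'key': ['c', 'd'], 'x': [3, 4]})

# Stack the rows of df2 under df1 and renumber the index
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

left = pd.DataFrame({'key': ['foo', 'bar'], 'lval': [1, 2]})
right = pd.DataFrame({'key': ['foo', 'bar'], 'rval': [4, 5]})

# Join rows whose 'key' values match
joined = pd.merge(left, right, on='key')

# Add a single row by concatenating a one-row DataFrame
new_row = pd.DataFrame([{'key': 'e', 'x': 5}])
extended = pd.concat([stacked, new_row], ignore_index=True)
```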

Pandas import and export

df.to_csv('foo.csv'): Save to a CSV file

pd.read_csv('foo.csv'): Read from a CSV file

df.to_excel('foo.xlsx', sheet_name='Sheet1'): Save to an Excel file

pd.read_excel('foo.xlsx', 'Sheet1', index_col=None, na_values=['NA']): Read from an Excel file
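As a minimal sketch of a CSV round trip, the example below writes a small DataFrame to a temporary directory and reads it back. The data and the use of a temporary directory are assumptions for this example. Excel is left out because to_excel and read_excel need an additional engine package to be installed.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4.5, 5.5, 6.5]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'foo.csv')
    df.to_csv(path, index=False)   # index=False: do not write the row index
    loaded = pd.read_csv(path)     # read the same file back
```

Without index=False, the row index is written as an extra first column and shows up as an unnamed column when the file is read back.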

Matplotlib

Here we only introduce the simplest way to plot:

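import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Generate 1000 random values
data = pd.Series(np.random.randn(1000), index=np.arange(1000))
# Take the cumulative sum so the trend is easier to see;
# cumsum() returns a new Series, so assign the result back
data = data.cumsum()
# pandas data can be plotted directly
data.plot()
plt.show()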

source:php.cn