How to use pandas for data visualization and exploratory analysis
Introduction:
Visualization and exploratory analysis are indispensable steps in data analysis. pandas is a powerful Python data analysis library that, beyond its data processing functions, provides a set of tools for data visualization and exploratory analysis. This article shows how to use pandas for both, with concrete code examples.
1. Data visualization
1. Line chart
A line chart is a commonly used visualization that shows how data changes over time. Drawing one with pandas is simple: call the plot method of a DataFrame, which draws a line chart by default and uses matplotlib for rendering. The following is a sample code:
    import pandas as pd
    import matplotlib.pyplot as plt

    # Create a DataFrame
    data = {'date': ['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04'],
            'sales': [100, 200, 150, 180]}
    df = pd.DataFrame(data)

    # Convert the date column to a datetime type
    df['date'] = pd.to_datetime(df['date'])

    # Use the date column as the index
    df = df.set_index('date')

    # Draw the line chart and display it
    df.plot()
    plt.show()
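As a further illustration, the plot method returns a matplotlib Axes object that can be used to label the chart. The following is a minimal sketch, not a definitive recipe: the second column (returns), the title, and the axis labels are hypothetical values added for illustration.

```python
import pandas as pd

# Hypothetical daily figures, made up for illustration
df = pd.DataFrame(
    {"sales": [100, 200, 150, 180], "returns": [10, 25, 12, 20]},
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]),
)

# Each numeric column becomes one line; the datetime index supplies the x-axis
ax = df.plot(title="Daily sales and returns", marker="o")

# plot returns a matplotlib Axes, so the usual Axes methods apply
ax.set_xlabel("date")
ax.set_ylabel("amount")
```

To see the result, call matplotlib.pyplot.show() in an interactive session, or ax.get_figure().savefig with a file name of your choice in a script.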
2. Bar chart
A bar chart is a common way to compare values across different categories. Drawing one with pandas is just as simple: call the plot method of a DataFrame and set the kind parameter to 'bar'. The following is a sample code:
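    import pandas as pd
    import matplotlib.pyplot as plt

    # Create a DataFrame
    data = {'city': ['Beijing', 'Shanghai', 'Guangzhou', 'Shenzhen'],
            'population': [2152, 2424, 1348, 1303]}
    df = pd.DataFrame(data)

    # Use the city column as the index
    df = df.set_index('city')

    # Draw the bar chart and display it
    df.plot(kind='bar')
    plt.show()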
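A chart drawn with kind='bar' shows one bar per category, which is easy to confuse with a histogram. For contrast, the following minimal sketch draws a true histogram with kind='hist', which groups a single numeric column into value ranges (bins) and counts the observations in each. The scores and the bin count are hypothetical values chosen for illustration.

```python
import pandas as pd

# Hypothetical exam scores, made up for illustration
scores = pd.Series([55, 62, 68, 71, 73, 75, 78, 80, 84, 91], name="score")

# kind="hist" splits the value range into `bins` equal-width intervals
# and draws one bar per interval, whose height is the number of values in it
ax = scores.plot(kind="hist", bins=4, title="Score distribution")
ax.set_xlabel("score")
```

The number of bins is a judgment call: too few hides the shape of the distribution, too many makes it noisy.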
3. Scatter plot
Scatter plots are often used to show the relationship between two numerical variables. pandas supports them through the plot.scatter method, which takes the column names for the x and y axes. The following is a sample code:
    import pandas as pd
    import matplotlib.pyplot as plt

    # Create a DataFrame
    data = {'weight': [65, 75, 58, 80, 68],
            'height': [175, 180, 160, 190, 170]}
    df = pd.DataFrame(data)

    # Draw the scatter plot and display it
    df.plot.scatter(x='height', y='weight')
    plt.show()
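A scatter plot can also encode a third variable as color. The following is a minimal sketch using the c and colormap parameters of plot.scatter; the age column and its values are hypothetical additions for illustration, and the choice of the viridis colormap is arbitrary.

```python
import pandas as pd

# Weight and height as in the example above; the age column is hypothetical
df = pd.DataFrame({
    "weight": [65, 75, 58, 80, 68],
    "height": [175, 180, 160, 190, 170],
    "age": [25, 32, 19, 41, 28],
})

# c names the column used for point color; colormap maps its values to colors
ax = df.plot.scatter(x="height", y="weight", c="age", colormap="viridis")
```

pandas adds a color bar for the age column next to the plot, so the colors can be read back as values.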
2. Exploratory analysis
1. Basic statistical analysis
pandas provides a series of methods for basic statistical analysis, such as mean, median, min, and max. The following is a sample code:
    import pandas as pd

    # Create a DataFrame
    data = {'name': ['Zhang San', 'Li Si', 'Wang Wu', 'Zhao Liu'],
            'age': [18, 20, 22, 24],
            'height': [170, 175, 180, 185]}
    df = pd.DataFrame(data)

    # Print the mean, median, minimum, and maximum of the age column
    print('Mean age:', df['age'].mean())
    print('Median age:', df['age'].median())
    print('Minimum age:', df['age'].min())
    print('Maximum age:', df['age'].max())
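When several of these statistics are needed at once, the describe method computes them for every numeric column in one call. The following is a minimal sketch on the same age and height values used above.

```python
import pandas as pd

df = pd.DataFrame({"age": [18, 20, 22, 24], "height": [170, 175, 180, 185]})

# describe() returns a DataFrame whose rows are count, mean, std, min,
# the 25%/50%/75% quantiles, and max, with one column per numeric input column
summary = df.describe()
print(summary)

# Individual statistics can be read back by row and column label
print("Mean age:", summary.loc["mean", "age"])
```

Because the result is itself a DataFrame, it can be sliced, transposed, or exported like any other.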
2. Correlation analysis
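Correlation analysis measures how strongly two variables vary together. Commonly used measures include the correlation coefficient and the covariance, which pandas exposes as the corr and cov methods. The following is a sample code: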
    import pandas as pd

    # Create a DataFrame
    data = {'weight': [65, 75, 58, 80, 68],
            'height': [175, 180, 160, 190, 170]}
    df = pd.DataFrame(data)

    # Compute the correlation coefficient and covariance of weight and height
    print('Correlation coefficient:', df['weight'].corr(df['height']))
    print('Covariance:', df['weight'].cov(df['height']))
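The same two measures are also available at the DataFrame level, where corr and cov return a full matrix covering every pair of numeric columns. The following is a minimal sketch on the weight and height values used above.

```python
import pandas as pd

df = pd.DataFrame({
    "weight": [65, 75, 58, 80, 68],
    "height": [175, 180, 160, 190, 170],
})

# DataFrame.corr() returns the pairwise Pearson correlation matrix by default;
# the diagonal is 1 because every column is perfectly correlated with itself
corr_matrix = df.corr()
print(corr_matrix)

# DataFrame.cov() returns the pairwise covariance matrix;
# its diagonal holds the variance of each column
cov_matrix = df.cov()
print(cov_matrix)
```

With more than two columns, the matrix form makes it easy to scan for the strongest relationships before plotting any of them.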
3. Missing value processing
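pandas provides a series of methods for handling missing values, such as isnull, fillna, and dropna. Filling and dropping are alternatives: once missing values have been filled, there is nothing left to drop. The following is a sample code: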
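    import pandas as pd
    import numpy as np

    # Create a DataFrame that contains missing values
    data = {'name': ['Zhang San', 'Li Si', np.nan, 'Zhao Liu'],
            'age': [18, 20, np.nan, 24]}
    df = pd.DataFrame(data)

    # Check which values are missing
    print(df.isnull())

    # Option 1: fill missing values (returns a new DataFrame)
    filled = df.fillna(0)
    print(filled)

    # Option 2: drop the rows that contain missing values
    dropped = df.dropna()
    print(dropped)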
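In practice, different columns often call for different treatment. The following is a minimal sketch, under the assumption that a missing age can reasonably be replaced by the column mean while a row without a name should be discarded; the data is a hypothetical variant of the example above in which the two gaps fall on different rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the name and age gaps are on different rows
df = pd.DataFrame({
    "name": ["Zhang San", "Li Si", np.nan, "Zhao Liu"],
    "age": [18, np.nan, 22, 24],
})

# Count the missing values in each column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Fill the numeric column with its mean (mean() skips missing values)
cleaned = df.copy()
cleaned["age"] = cleaned["age"].fillna(cleaned["age"].mean())

# Drop only the rows where the name is missing
cleaned = cleaned.dropna(subset=["name"])
print(cleaned)
```

Whether the mean is an appropriate fill value depends on the data; the median or a domain-specific default may suit other cases better.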
Conclusion:
This article introduced how to use pandas for data visualization and exploratory analysis, with concrete code examples. With these techniques, you can process and analyze data more flexibly and draw meaningful conclusions from it.