Sharing a popular Python visualization module that is quick and easy to get started with!

WBOY
Release: 2023-04-12 11:28:15

What is Altair?

Altair is known as a statistical visualization library because it helps us explore, understand, and analyze data through classification and aggregation, data transformation, interaction, and chart composition. Installation is also very simple and can be done directly with pip, as follows:

pip install altair
pip install vega_datasets
pip install altair_viewer

If you use the conda package manager, install the Altair module as follows:

conda install -c conda-forge altair vega_datasets

First experience with Altair

Let's start by drawing a simple bar chart. First, create a DataFrame. The code is as follows:

import pandas as pd
df = pd.DataFrame({"brand":["iPhone","Xiaomi","HuaWei","Vivo"],
                   "profit(B)":[200,55,88,60]})

Next comes the code that draws the bar chart:

import altair as alt
import pandas as pd
import altair_viewer
chart = alt.Chart(df).mark_bar().encode(x="brand:N",y="profit(B):Q")
# Display the chart by calling the display() method
altair_viewer.display(chart,inline=True)

Looking at the overall syntax, we first use alt.Chart() to specify the data set, then call an instance method of the form mark_*() to choose the chart style, and finally use encode() to specify the data represented by the X-axis and the Y-axis. You may be curious about what N and Q stand for. They are abbreviations of variable types: Altair needs to know the type of each variable involved in the plot, and only then will the chart come out as we expect.

N stands for a nominal variable; for example, the mobile phone brands here are all proper nouns. Q stands for a quantitative (numerical) variable, which can be either discrete or continuous. In addition, T stands for temporal (time series) data and O for ordinal variables, such as the 1-5 star rating given to a merchant when shopping online.

Saving the chart

To save the final chart, we can directly call the save() method to save the object as an HTML file. The code is as follows:

chart.save("chart.html")

It can also be saved as a JSON file, and the code is very similar:

chart.save("chart.json")

Of course, the chart can also be saved in an image format such as PNG or SVG.

Advanced Operations of Altair

Building on the above, we can extend the example further. For instance, to draw a horizontal bar chart, we simply swap the data on the X-axis and the Y-axis. The code is as follows:

chart = alt.Chart(df).mark_bar().encode(x="profit(B):Q", y="brand:N")
chart.save("chart1.html")

We can also draw a line chart by calling the mark_line() method. The code is as follows:

import numpy as np
import pandas as pd
## Create a new data set, using dates as the row index
np.random.seed(29)
value = np.random.randn(365)
data = np.cumsum(value)
date = pd.date_range(start="20220101", end="20221231")
df = pd.DataFrame({"num": data}, index=date)
line_chart = alt.Chart(df.reset_index()).mark_line().encode(x="index:T", y="num:Q")
line_chart.save("chart2.html")

We can also draw a Gantt chart, which is commonly used in project management. The X-axis shows time and date, while the Y-axis lists the projects, so each bar shows the time span of one project. The code is as follows:

project = [{"project": "Proj1", "start_time": "2022-01-16", "end_time": "2022-03-20"},
           {"project": "Proj2", "start_time": "2022-04-12", "end_time": "2022-11-20"},
           # ...... (the remaining projects are elided in the original)
           ]
df = alt.Data(values=project)
chart = alt.Chart(df).mark_bar().encode(
    alt.X("start_time:T",
          axis=alt.Axis(format="%x",
                        formatType="time",
                        tickCount=3),
          scale=alt.Scale(domain=[alt.DateTime(year=2022, month=1, date=1),
                                  alt.DateTime(year=2022, month=12, date=1)])),
    alt.X2("end_time:T"),
    alt.Y("project:N", axis=alt.Axis(labelAlign="left",
                                     labelFontSize=15,
                                     labelOffset=0,
                                     labelPadding=50)),
    color=alt.Color("project:N", legend=alt.Legend(labelFontSize=12,
                                                   symbolOpacity=0.7,
                                                   titleFontSize=15)))
chart.save("chart_gantt.html")

The chart shows the projects the team is working on. Each project progresses differently and spans a different period of time, which is very intuitive when displayed this way.

Next, we draw a scatter plot by calling the mark_circle() method. The code is as follows:

from vega_datasets import data
df = data.cars()
## Keep only the passenger cars whose origin is "USA"
df_1 = alt.Chart(df).transform_filter(
    alt.datum.Origin == "USA"
)
chart = df_1.mark_circle().encode(
 alt.X("Horsepower:Q"),
 alt.Y("Miles_per_Gallon:Q")
)
chart.save("chart_dots.html")

Of course, we can polish the chart further by adding some color. The code is as follows:

chart = df_1.mark_circle(
    color=alt.RadialGradient("radial", [alt.GradientStop("white", 0.0),
                                        alt.GradientStop("red", 1.0)]),
    size=160
).encode(
    alt.X("Horsepower:Q", scale=alt.Scale(zero=False, padding=20)),
    alt.Y("Miles_per_Gallon:Q", scale=alt.Scale(zero=False, padding=20))
)

We can also vary the size of the points so that the size of each point represents a different value. The code is as follows:

chart = df_1.mark_circle(
    color=alt.RadialGradient("radial", [alt.GradientStop("white", 0.0),
                                        alt.GradientStop("red", 1.0)]),
    size=160
).encode(
    alt.X("Horsepower:Q", scale=alt.Scale(zero=False, padding=20)),
    alt.Y("Miles_per_Gallon:Q", scale=alt.Scale(zero=False, padding=20)),
    size="Acceleration:Q"
)

source:51cto.com