With Altair, you can spend more time focusing on the data and its meaning, as the examples below show.
Here is an example of using Altair in JupyterLab to quickly visualize a data set:
import altair as alt

# load a simple dataset as a pandas DataFrame
from vega_datasets import data
cars = data.cars()

alt.Chart(cars).mark_point().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
)
One distinctive feature Altair inherits from Vega-Lite is a declarative grammar that covers not only visualization but also interaction. With a few modifications to the example above, we can create a linked histogram that is filtered by the selection made in the scatter plot.
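import altair as alt
from vega_datasets import data

source = data.cars()

# Interval (brush) selection, written with the Altair 4 API.
# In Altair 5+, alt.selection_interval() and .add_params(brush)
# are the preferred spellings.
brush = alt.selection(type='interval')

# Scatter plot: points outside the brush are drawn in light gray
points = alt.Chart(source).mark_point().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color=alt.condition(brush, 'Origin', alt.value('lightgray'))
).add_selection(
    brush
)

# Histogram: counts only the points inside the brush
bars = alt.Chart(source).mark_bar().encode(
    y='Origin',
    color='Origin',
    x='count(Origin)'
).transform_filter(
    brush
)

# Stack the two charts vertically
points & bars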
Altair requires the following dependencies:
If you have cloned the repository, run the following command from the root of the repository:
pip install -e .[dev]
If you do not want to clone the repository, you can install directly from GitHub with the following command:
pip install git+https://github.com/altair-viz/altair
For more details, see the GitHub repository:
https://github.com/altair-viz/altair
Next, I will show in detail how to create Altair visualizations that include filtering, grouping, and merging operations, which can be used as part of an exploratory data analysis process.
We construct two data frames of simulated data. The first holds restaurant orders and the second holds the price of each item that can appear in an order.
# import libraries
import numpy as np
import pandas as pd
import altair as alt
import random

# mock data
orders = pd.DataFrame({
    "order_id": np.arange(1, 101),
    "item": np.random.randint(1, 50, size=100),
    "qty": np.random.randint(1, 10, size=100),
    "tip": (np.random.random(100) * 10).round(2)
})

prices = pd.DataFrame({
    "item": np.arange(1, 51),
    "price": (np.random.random(50) * 50).round(2)
})

order_type = ["lunch", "dinner"] * 50
random.shuffle(order_type)
orders["order_type"] = order_type
First, we create a simple chart to illustrate the structure of Altair's syntax.
alt.Chart(orders).mark_circle(size=50).encode(
    x="qty",
    y="tip",
    color="order_type"
).properties(
    title="Tip vs Quantity"
)
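Altair's basic syntax has four steps, all visible in the code above:
1. Pass the data to a Chart object, here alt.Chart(orders).
2. Choose the mark type, here mark_circle.
3. Map columns to visual channels (x, y, color) with encode.
4. Set chart-level options such as the title with properties.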
Consider a situation where we need to create a scatter plot of price and tip values, which are stored in different data frames. One option is to merge the two data frames first and then use the two columns in a scatter plot.
Altair provides a more practical method: it can look up columns in another data frame from within the chart definition, similar to pandas' merge function.
alt.Chart(orders).mark_circle(size=50).encode(
    x="tip",
    y="price:Q",
    color="order_type"
).transform_lookup(
    lookup="item",
    from_=alt.LookupData(data=prices, key="item", fields=["price"])
).properties(
    title="Price vs Tip"
)
The transform_lookup function is similar to pandas' merge function. The column used to match observations in the main data frame is passed to the lookup parameter, and the matching column in the other data frame is passed to LookupData as key. The fields parameter selects the columns to bring over from the other data frame.
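For comparison, the pandas-side equivalent of this lookup can be sketched as a left join. This is a minimal sketch, not part of the original walkthrough: the seeded generator `rng` and the name `merged` are assumptions introduced here so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Assumption: a seeded generator, so the sketch gives the same data on every run
rng = np.random.default_rng(0)

# Mock data with the same key columns as the orders and prices frames above
orders = pd.DataFrame({
    "order_id": np.arange(1, 101),
    "item": rng.integers(1, 50, size=100),
    "tip": (rng.random(100) * 10).round(2),
})
prices = pd.DataFrame({
    "item": np.arange(1, 51),
    "price": (rng.random(50) * 50).round(2),
})

# Left join: every order is kept and gains the price of its item,
# which mirrors lookup="item" with key="item" and fields=["price"]
merged = orders.merge(prices[["item", "price"]], on="item", how="left")
print(merged.head())
```

The difference in practice is that transform_lookup keeps the join inside the chart specification, so no intermediate data frame has to be created and maintained.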
We can also integrate a filter component into the plot, so that only data points with prices above $10 are plotted.
alt.Chart(orders).mark_circle(size=50).encode(
    x="tip",
    y="price:Q",
    color="order_type"
).transform_lookup(
    lookup="item",
    from_=alt.LookupData(data=prices, key="item", fields=["price"])
).transform_filter(
    alt.FieldGTPredicate(field='price', gt=10)
).properties(
    title="Price vs Tip"
)
The transform_filter function is used for filtering. FieldGTPredicate handles "greater than" conditions.
In addition to filtering and merging, Altair can also group data points before plotting. For example, we can create a bar chart that displays the average item price for each order type, restricted to items priced under $20.
alt.Chart(orders).mark_bar().encode(
    y="order_type",
    x="avg_price:Q"
).transform_lookup(
    lookup="item",
    from_=alt.LookupData(data=prices, key="item", fields=["price"])
).transform_filter(
    alt.FieldLTPredicate(field='price', lt=20)
).transform_aggregate(
    avg_price="mean(price)",
    groupby=["order_type"]
).properties(
    height=200,
    width=300
)
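Let us explain each step in detail:
1. transform_lookup brings the price column from the prices data frame into orders, matching rows on item.
2. transform_filter with FieldLTPredicate keeps only the rows whose price is below $20.
3. transform_aggregate computes the mean price for each order_type and stores it as avg_price, which the x encoding then plots.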
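The lookup, filter, and aggregate steps of the bar chart can be cross-checked with an equivalent pandas pipeline. This is a sketch on a tiny hand-written data set (the rows below are hypothetical, chosen so the result is easy to verify by hand), not the mock data generated earlier.

```python
import pandas as pd

# Hypothetical orders and prices, small enough to check by hand
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "item": [1, 2, 3, 1, 2],
    "order_type": ["lunch", "lunch", "dinner", "dinner", "dinner"],
})
prices = pd.DataFrame({"item": [1, 2, 3], "price": [10.0, 30.0, 14.0]})

avg_price = (
    orders.merge(prices, on="item", how="left")  # counterpart of transform_lookup
    .query("price < 20")                         # counterpart of transform_filter
    .groupby("order_type", as_index=False)       # counterpart of groupby=["order_type"]
    .agg(avg_price=("price", "mean"))            # counterpart of mean(price)
)
print(avg_price)
```

Only the orders for items 1 and 3 pass the filter, so dinner averages (14 + 10) / 2 = 12 and lunch averages 10. The order of operations matters: filtering before aggregating means the mean is taken over items under $20 only.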
What sets Altair apart from other common visualization libraries is that it integrates data analysis components seamlessly into the visualization, making it a very practical tool for data exploration.
Filtering, merging, and grouping are critical to the exploratory data analysis process. Altair allows you to perform all these operations when creating data visualizations. In this sense, Altair can also be considered a data analysis tool. If you are interested, try it now.