Using Plot for Categorical Scatter Plots
This guide addresses a common task when creating scatter plots in Python with pandas and matplotlib: giving each category in the data its own color or symbol.
The Problem
Given a Pandas DataFrame with multiple columns, the goal is to create a scatter plot where two variables are plotted along the x and y axes, while a third column determines the symbols used to represent the data points.
The Solution: Using Plot
While scatter can be used for this task, it needs numeric values for the categories, and a single scatter call does not by itself produce a separate legend entry for each category. For discrete categories it is simpler to group the data and call the plot function once per group.
The following code example demonstrates how to implement this approach:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(1974)

# Generate Data
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(['a', 'b', 'c'], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))

# One group per distinct label
groups = df.groupby('label')

# Plot
fig, ax = plt.subplots()
ax.margins(0.05)  # Optional: adds 5% padding to the autoscaled axes
for name, group in groups:
    # linestyle='' draws markers only, so each group looks like a scatter series
    ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)
ax.legend()

plt.show()
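For comparison, the scatter route mentioned above needs the labels converted to numbers first. The following is a minimal sketch of that conversion, assuming a DataFrame with the same x, y and label columns; the function name scatter_by_code is hypothetical and used only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def scatter_by_code(df):
    """Draw all points with one scatter call, colored by an integer code per label."""
    # pd.factorize maps each distinct label to an integer code (0, 1, 2, ...)
    codes, uniques = pd.factorize(df["label"], sort=True)
    fig, ax = plt.subplots()
    # c=codes colors the points through the default colormap
    points = ax.scatter(df["x"], df["y"], c=codes, s=100)
    return fig, ax, points, codes, uniques


# Sample data in the same shape as the main example
np.random.seed(1974)
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(["a", "b", "c"], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))
```

Calling scatter_by_code(df) followed by plt.show() displays the figure. All points end up in a single collection, which is why the per-group plot approach is more convenient when a legend entry per category is wanted.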
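For a more polished look, older pandas releases shipped a matplotlib style in their plotting module, which could be applied like this:
plt.rcParams.update(pd.tools.plotting.mpl_stylesheet)
colors = pd.tools.plotting._get_standard_colors(len(groups), color_type='random')
# ... (the rest of the code remains the same)
Note that pd.tools.plotting has been removed from current pandas, and _get_standard_colors is a private helper, so this snippet fails on recent versions. There, matplotlib's own style sheets (for example, plt.style.use('ggplot')) serve the same purpose.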
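On current pandas and matplotlib versions, a similar look can be obtained from matplotlib's built-in style sheets. The sketch below assumes the 'ggplot' style as a stand-in for the old pandas style; the function name plot_groups_styled is hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_groups_styled(df, style="ggplot"):
    """Plot each label group as its own marker-only series under a temporary style."""
    # style.context applies the style only inside the with-block
    with plt.style.context(style):
        fig, ax = plt.subplots()
        ax.margins(0.05)
        for name, group in df.groupby("label"):
            ax.plot(group["x"], group["y"], marker="o", linestyle="",
                    ms=12, label=name)
        ax.legend()
    return fig, ax


# Sample data in the same shape as the main example
np.random.seed(1974)
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(["a", "b", "c"], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))
```

Calling plot_groups_styled(df) and then plt.show() displays the styled figure. Colors come from the style's own color cycle, so no private pandas helper is needed.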
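This will give you a scatter plot where each category is drawn in its own color and listed in the legend. Because the code passes marker='o' for every group, all categories share the same circular symbol; to vary the symbol as well, pass a different marker value for each group.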
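To give each category its own symbol as well as its own color, the marker argument can change per group. The following is a minimal sketch under the assumption that the labels are 'a', 'b' and 'c'; the MARKERS mapping and the function name plot_groups_with_markers are hypothetical and chosen for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical label-to-marker mapping; any valid matplotlib marker codes work
MARKERS = {"a": "o", "b": "s", "c": "^"}


def plot_groups_with_markers(df, markers=MARKERS):
    """Plot each label group with its own marker symbol and legend entry."""
    fig, ax = plt.subplots()
    ax.margins(0.05)
    for name, group in df.groupby("label"):
        # Color comes from the default color cycle; the symbol from the mapping
        ax.plot(group["x"], group["y"], marker=markers[name], linestyle="",
                ms=12, label=name)
    ax.legend()
    return fig, ax


# Sample data in the same shape as the main example
np.random.seed(1974)
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(["a", "b", "c"], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))
```

Calling plot_groups_with_markers(df) and then plt.show() displays circles, squares and triangles for the three labels. An explicit dictionary keeps the symbol for each category stable even when a category is missing from a particular dataset.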