Pandas: Convert Categories to Numerical Indices
In Pandas, you may need to convert categorical data, such as countries, into numerical indices. pd.get_dummies converts categories into one-hot encodings, but it creates one column per category, which is more than you need when a single integer column would do. Here's a step-by-step guide on how to convert categories to numerical indices:
Step 1: Categorize the Column
First, change the type of the column to categorical:
<code class="python">df.cc = pd.Categorical(df.cc)</code>
This converts the cc column, which holds the country codes, to the categorical dtype.
Step 2: Create a New Column for Codes
Next, create a new column to store the numerical indices:
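<code class="python">df['code'] = df.cc.cat.codes</code>
The codes attribute, reached through the .cat accessor of the categorical column, gives each category a unique integer index. Categories are sorted by default, so the indices follow alphabetical order here.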
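To see which integer belongs to which category, you can inspect the categories alongside the codes. The following is a minimal sketch, assuming a small sample Series named cc that is not part of the original example; it also shows that a missing value receives the code -1.

```python
import pandas as pd

# Hypothetical sample data; the None entry stands for a missing value.
cc = pd.Series(["US", "CA", None, "AU"], dtype="category")

# Categories are sorted by default, so the order is AU, CA, US.
categories = list(cc.cat.categories)

# Each value's code is its position in `categories`; missing values get -1.
codes = cc.cat.codes.tolist()

# Map each category label to its integer code.
mapping = {label: i for i, label in enumerate(categories)}

print(categories)
print(codes)
print(mapping)
```

The mapping dictionary is handy when you later need to translate codes back into labels.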
Example:
Consider the following DataFrame:
   cc  temp
0  US  37.0
1  CA  12.0
2  US  35.0
3  AU  20.0
After following the steps above, the DataFrame has an additional code column:
   cc  temp  code
0  US  37.0     2
1  CA  12.0     1
2  US  35.0     2
3  AU  20.0     0
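The whole example can be reproduced with a short script. This is a minimal sketch that rebuilds the DataFrame above by hand and reads the codes through the Series .cat accessor:

```python
import pandas as pd

# Rebuild the example DataFrame from the article.
df = pd.DataFrame({
    "cc": ["US", "CA", "US", "AU"],
    "temp": [37.0, 12.0, 35.0, 20.0],
})

# Step 1: convert the column to the categorical dtype.
df["cc"] = pd.Categorical(df["cc"])

# Step 2: store each row's integer category code in a new column.
df["code"] = df["cc"].cat.codes

print(df)
```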
Additional Options:
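Two alternatives that may be useful, shown as a minimal sketch on the same example data: converting on the fly with astype('category'), which leaves the original column's dtype unchanged, and pd.factorize, which numbers values in order of first appearance rather than in sorted order.

```python
import pandas as pd

df = pd.DataFrame({
    "cc": ["US", "CA", "US", "AU"],
    "temp": [37.0, 12.0, 35.0, 20.0],
})

# Option A: get sorted-order codes in one line without changing df["cc"].
sorted_codes = df["cc"].astype("category").cat.codes

# Option B: pd.factorize numbers values in order of first appearance
# (US=0, CA=1, AU=2) and also returns the unique labels.
appearance_codes, labels = pd.factorize(df["cc"])

print(sorted_codes.tolist())
print(appearance_codes.tolist())
print(list(labels))
```

Which one fits depends on whether the integer order should follow the sorted labels or the order in which they occur in the data.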