Problem:
Given a dataframe containing categorical values, the task is to convert these categories into numerical indices. Suppose we have countries as categories like this:
cc | temp
US | 37.0
CA | 12.0
US | 35.0
AU | 20.0
Instead of one-hot encodings using get_dummies, the goal is to assign each country an index, such as cc_index = [1,2,1,3].
Solution:
To convert Pandas categories to numerical indices, follow these steps:
Convert the column to the categorical dtype:
df.cc = pd.Categorical(df.cc)
Create a new column to store the category codes:
df['code'] = df.cc.cat.codes
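This adds a code column holding the numerical indices. The codes are zero-based and follow the sorted order of the categories (AU = 0, CA = 1, US = 2), so they differ from the 1-based example in the problem statement:
   cc  temp  code
0  US  37.0     2
1  CA  12.0     1
2  US  35.0     2
3  AU  20.0     0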
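The two steps above can be run end to end as follows. This is a minimal sketch that assumes the sample data from the problem statement; the variable names codes and categories are only illustrative. Note that on a Series the codes are reached through the .cat accessor.

```python
import pandas as pd

# Sample data from the problem statement.
df = pd.DataFrame({
    "cc": ["US", "CA", "US", "AU"],
    "temp": [37.0, 12.0, 35.0, 20.0],
})

# Step 1: convert the column to the categorical dtype.
df["cc"] = pd.Categorical(df["cc"])

# Step 2: store the integer code of each category in a new column.
df["code"] = df["cc"].cat.codes

# Illustrative names: plain Python views of the result.
codes = df["code"].tolist()
categories = list(df["cc"].cat.categories)

print(df)
print(categories)
```

Each code is the position of the value in the categories list, so the mapping can always be recovered from df["cc"].cat.categories.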
Alternatively, use the astype method to convert the column to the category dtype and read the codes in a single expression, without modifying the original column:
df.cc.astype('category').cat.codes
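If the indices should match the cc_index = [1,2,1,3] from the problem statement exactly, one possible approach is to pass the categories explicitly in order of first appearance and shift the zero-based codes by one. The sketch below assumes the same sample data; the names sorted_codes, appearance and cc_index are illustrative.

```python
import pandas as pd

# Sample data from the problem statement.
df = pd.DataFrame({
    "cc": ["US", "CA", "US", "AU"],
    "temp": [37.0, 12.0, 35.0, 20.0],
})

# One-liner: zero-based codes in sorted category order (AU, CA, US).
sorted_codes = df["cc"].astype("category").cat.codes.tolist()

# Assumption for illustration: order the categories by first appearance
# (US, CA, AU) instead of alphabetically.
appearance = pd.Categorical(df["cc"], categories=df["cc"].unique())

# Codes are zero-based, so add 1 to get 1-based indices.
cc_index = (appearance.codes + 1).tolist()

print(sorted_codes)
print(cc_index)
```

The default sorted order is usually preferable when the codes must stay stable regardless of row order; the first-appearance order depends on how the rows happen to be arranged.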
Another option is to use the categorical column as the index of a new dataframe:
df2 = pd.DataFrame(df.temp)
df2.index = pd.CategoricalIndex(df.cc)
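With a categorical index, the codes are available directly on the index, and rows can still be selected by label. A short sketch under the same sample-data assumption; index_codes and us_temps are illustrative names.

```python
import pandas as pd

# Sample data from the problem statement.
df = pd.DataFrame({
    "cc": ["US", "CA", "US", "AU"],
    "temp": [37.0, 12.0, 35.0, 20.0],
})

# New dataframe holding only the temperatures, indexed by country category.
df2 = pd.DataFrame(df["temp"])
df2.index = pd.CategoricalIndex(df["cc"])

# The integer codes live on the index itself.
index_codes = df2.index.codes.tolist()

# Label-based selection still works; "US" matches two rows.
us_temps = df2.loc["US", "temp"].tolist()

print(index_codes)
print(us_temps)
```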