Label Encoding in ML

王林
Release: 2024-08-23 06:01:08

Label Encoding is one of the most commonly used techniques in machine learning. It converts categorical data into numerical form so that the data can be fitted to a model.

Let us understand why we use Label Encoding. Imagine a dataset whose essential columns are stored as strings. Most models only work on numerical data, so this data cannot be fitted as it is. What do we do? This is where Label Encoding comes in: a technique applied at the preprocessing step, when we get the data ready for fitting.

We will use the iris dataset from the scikit-learn library to understand how the Label Encoder works. Make sure you have the following libraries installed.

pandas
scikit-learn

To install the libraries, run the following command:

$ pip install -U pandas scikit-learn
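To confirm that both libraries are importable after installation, a quick check along these lines can help. This is only a sketch; the `versions` dictionary is a name introduced here for illustration, and the version numbers printed will depend on your environment.

```python
import pandas as pd
import sklearn

# Collect the installed versions (the dict name is illustrative only)
versions = {"pandas": pd.__version__, "scikit-learn": sklearn.__version__}

for name, version in versions.items():
    print(f"{name}: {version}")
```

If either import fails, the install step did not complete for that library.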

Now open a Google Colab notebook and dive into coding and learning the Label Encoder.

Let's Code

  • Start by importing the following libraries:
import pandas as pd
from sklearn import preprocessing
  • Import the iris dataset, and initialize it for usage:
from sklearn.datasets import load_iris
iris = load_iris()
  • Now select the data that we want to encode. We will be encoding the species names of the irises.
species = iris.target_names
print(species)

Output:

['setosa' 'versicolor' 'virginica']
  • Let's instantiate the class LabelEncoder from preprocessing:
label_encoder = preprocessing.LabelEncoder()
  • Now, we are ready to fit the data using the label encoder:
label_encoder.fit(species)

You will see output similar to this:

LabelEncoder()

If you get this output, you have successfully fitted the data. But the question is: how do you find out which value is assigned to each species, and in which order?

The order in which the Label Encoder fitted the data is stored in the classes_ attribute. The codes run from 0 to n_classes - 1, where n_classes is the number of distinct labels.

label_encoder.classes_

Output:

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

The label encoder automatically sorts the unique labels (alphabetically, for strings) and numbers them from left to right, starting at 0. Here:

setosa -> 0
versicolor -> 1
virginica -> 2
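The sorting behaviour is easier to see when the input is not already in order. The sketch below fits a fresh encoder on a deliberately shuffled list with a repeated label; the `labels` list and the `mapping` dictionary are illustrative names that do not appear in the walkthrough above.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical input: unsorted, with one label repeated
labels = ["virginica", "setosa", "versicolor", "setosa"]

encoder = LabelEncoder()
encoder.fit(labels)

# classes_ holds the unique labels in sorted order;
# the position of each label in classes_ is its integer code
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)  # {'setosa': 0, 'versicolor': 1, 'virginica': 2}
```

The codes match the table above even though the input order was different, because the codes depend only on the sorted set of distinct labels.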
  • Now, let's test the fitted data. We will transform the iris species setosa.
label_encoder.transform(['setosa'])

Output: array([0])

Again, if you transform the species virginica:

label_encoder.transform(['virginica'])

Output: array([2])

You can also pass a list of species, such as ["setosa", "virginica"].
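As a minimal sketch of encoding several species at once, the example below rebuilds the encoder so it can run on its own, transforms a list, and then maps the codes back with inverse_transform. The label "tulip" is a made-up value, used only to show what happens with a label the encoder never saw during fitting.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import LabelEncoder

iris = load_iris()
label_encoder = LabelEncoder().fit(iris.target_names)

# Encode several species in one call
codes = label_encoder.transform(["setosa", "virginica", "setosa"])
print(codes)  # [0 2 0]

# Map the integer codes back to the original species names
names = label_encoder.inverse_transform(codes)
print(names)  # ['setosa' 'virginica' 'setosa']

# A label that was not present during fit raises a ValueError
try:
    label_encoder.transform(["tulip"])  # "tulip" is a hypothetical unseen label
except ValueError as err:
    print("Unseen label:", err)
```

Keeping the fitted encoder around is what makes the round trip possible: the same object is needed to decode model predictions back into species names.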

See the scikit-learn documentation for LabelEncoder.


source:dev.to