Evaluating A Machine Learning Classification Model

Sep 07, 2024 pm 02:01 PM

Outline

  • What is the goal of model evaluation?
  • What is the purpose of model evaluation, and what are some common evaluation procedures?
  • What is the usage of classification accuracy, and what are its limitations?
  • How does a confusion matrix describe the performance of a classifier?
  • What metrics can be computed from a confusion matrix?

The goal of model evaluation is to answer the question:

How do I choose between different models?

Evaluating a machine learning model helps determine how reliable and effective the model is for its application. This involves assessing factors such as its performance metrics and the accuracy of its predictions or decisions.

No matter which model you use, you need a way to choose between models: different model types, tuning parameters, and features. You also need a model evaluation procedure to estimate how well a model will generalize to unseen data. Lastly, you need an evaluation metric to pair with your procedure in order to quantify model performance.

Before we proceed, let's review some of the different model evaluation procedures and how they operate.

Model Evaluation Procedures and How They Operate

  1. Training and testing on the same data
    • Rewards overly complex models that "overfit" the training data and won't necessarily generalize
  2. Train/test split
    • Split the dataset into two pieces, so that the model can be trained and tested on different data
    • Better estimate of out-of-sample performance, but still a "high variance" estimate
    • Useful due to its speed, simplicity, and flexibility
  3. K-fold cross-validation
    • Systematically create "K" train/test splits and average the results together
    • Even better estimate of out-of-sample performance
    • Runs "K" times slower than train/test split.

From above, we can deduce that:

  • Training and testing on the same data is a classic cause of overfitting in which you build an overly complex model that won't generalize to new data and that is not actually useful.

  • Train/test split provides a much better estimate of out-of-sample performance than training and testing on the same data.

  • K-fold cross-validation does better still by systematically creating K train/test splits and averaging the results together.

In summary, train/test split is often still preferable to cross-validation because of its speed and simplicity, and that's what we will use in this tutorial.
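
The three procedures can be compared side by side. The following is a minimal sketch, assuming scikit-learn is installed; the data is a synthetic stand-in generated with `make_classification` (not the dataset used later in this tutorial), and names such as `training_accuracy` and `cv_accuracy` are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data (an assumption, not the tutorial's dataset).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# 1. Train and test on the same data: an optimistic "training accuracy".
model = LogisticRegression(max_iter=1000)
training_accuracy = model.fit(X, y).score(X, y)

# 2. Train/test split: a single out-of-sample estimate.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
split_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
testing_accuracy = split_model.score(X_test, y_test)

# 3. K-fold cross-validation with K=10: the average of 10 out-of-sample estimates.
fold_scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=10, scoring="accuracy"
)
cv_accuracy = fold_scores.mean()

print("training accuracy:       ", round(training_accuracy, 3))
print("train/test split accuracy:", round(testing_accuracy, 3))
print("10-fold CV accuracy:      ", round(cv_accuracy, 3))
```

Changing `random_state` in `train_test_split` typically moves the single-split estimate more than it moves the cross-validated average, which is what "high variance" refers to above.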

Model Evaluation Metrics:

You will always need an evaluation metric to go along with your chosen procedure, and your choice of metric depends on the problem you are addressing. For classification problems, you can use classification accuracy. But we will focus on other important classification evaluation metrics in this guide.

Before we learn any new evaluation metrics, let's review classification accuracy and talk about its strengths and weaknesses.

Classification accuracy

We've chosen the Pima Indians Diabetes dataset for this tutorial, which includes the health data and diabetes status of 768 patients.

Let's read the data and print the first 5 rows. The label column is 1 if the patient has diabetes and 0 if the patient does not, and we intend to answer the question:

Question: Can we predict the diabetes status of a patient given their health measurements?

We define our feature matrix X and response vector y. We use train_test_split to split X and y into training and testing sets.
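
These steps might look like the sketch below, assuming pandas, NumPy, and scikit-learn are available. To keep it runnable without a data file, it builds a random stand-in table with 768 rows rather than loading the real dataset; the column names other than `label` (`glucose`, `bmi`, `age`) and the name `pima` are hypothetical, and the values are not real patient data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for the real dataset: 768 random rows. Column names other than
# "label" are hypothetical, and the values are NOT real patient data.
rng = np.random.default_rng(0)
pima = pd.DataFrame({
    "glucose": rng.integers(50, 200, size=768),
    "bmi": rng.uniform(18, 50, size=768).round(1),
    "age": rng.integers(21, 80, size=768),
    "label": rng.integers(0, 2, size=768),
})
print(pima.head())  # first 5 rows

# Feature matrix X (one column per health measurement) and response vector y.
feature_cols = ["glucose", "bmi", "age"]
X = pima[feature_cols]
y = pima["label"]

# With the default test_size, 25% of the rows are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
print(X_train.shape, X_test.shape)
```

With the real data you would replace the stand-in table with a `pd.read_csv(...)` call on your own copy of the file.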

Next, we train a logistic regression model on the training set. During the fit step, the logreg model object learns the relationship between X_train and y_train. Finally, we make class predictions for the testing set.
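
A minimal sketch of the fit and predict steps, assuming scikit-learn. The data is again a synthetic stand-in from `make_classification`, and `y_pred_class` is an assumed name for the predicted labels; `max_iter=1000` is only set to avoid convergence warnings on this stand-in data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature matrix and response vector (an assumption).
X, y = make_classification(n_samples=768, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit step: the model learns the relationship between X_train and y_train.
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)

# Predict step: one class label (0 or 1) per row of the testing set.
y_pred_class = logreg.predict(X_test)
print(y_pred_class[:10])
```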

Now that we've made predictions for the testing set, we can calculate the classification accuracy, which is simply the percentage of correct predictions.
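
The calculation itself is short. Here is a plain-Python sketch; the helper `classification_accuracy` and the ten labels are hypothetical and are not results from this tutorial. In scikit-learn, `metrics.accuracy_score(y_test, y_pred_class)` computes the same quantity.

```python
def classification_accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for actual, predicted in zip(y_true, y_pred) if actual == predicted)
    return correct / len(y_true)


# Hypothetical true and predicted labels for ten patients.
y_test = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred_class = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]

accuracy = classification_accuracy(y_test, y_pred_class)
print(accuracy)  # 0.7 (7 of the 10 predictions are correct)
```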

However, any time you use classification accuracy as your evaluation metric, it is important to compare it with null accuracy, which is the accuracy that could be achieved by always predicting the most frequent class.

Null accuracy answers the question: if my model were to predict the predominant class 100 percent of the time, how often would it be correct? In the scenario above, 32% of the values in y_test are 1 (ones). In other words, a dumb model that always predicts that the patient does not have diabetes would be right 68% of the time (the share of zeros). This provides a baseline against which we can measure our logistic regression model.

When we compare the null accuracy of 68% with the model accuracy of 69%, our model doesn't look very good. This demonstrates one weakness of classification accuracy as a model evaluation metric: it doesn't tell us anything about the underlying distribution of the testing set.
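
Null accuracy can be computed directly from the test labels. The sketch below is plain Python; `null_accuracy` is a hypothetical helper, and the label list is constructed only to reproduce the 68% / 32% class balance mentioned above, not taken from the real data.

```python
from collections import Counter


def null_accuracy(y_true):
    """Return (most frequent class, accuracy of always predicting it)."""
    most_common_class, count = Counter(y_true).most_common(1)[0]
    return most_common_class, count / len(y_true)


# Constructed labels: 68 zeros (no diabetes) and 32 ones (diabetes).
y_test = [0] * 68 + [1] * 32

majority_class, baseline = null_accuracy(y_test)
print(majority_class, baseline)  # 0 0.68
```

Because it counts the most frequent class rather than assuming labels are 0 and 1, the same helper also works for problems with more than two classes.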

In Summary:

  • Classification accuracy is the easiest classification metric to understand
  • But, it does not tell you the underlying distribution of response values
  • And, it does not tell you what "types" of errors your classifier is making.

Let's now look at the confusion matrix.

Confusion matrix

A confusion matrix is a table that describes the performance of a classification model.
It helps you understand how your classifier is performing, but it is not a model evaluation metric, so you can't tell scikit-learn to choose the model with the best confusion matrix. However, many metrics can be calculated from the confusion matrix, and those can be used directly to choose between models.

[Figure: confusion matrix for the testing set, a 2x2 table of actual versus predicted classes]

  • Every observation in the testing set is represented in exactly one box
  • It's a 2x2 matrix because there are 2 response classes
  • The format shown here is not universal

Let's define the basic terminology.

  • True Positives (TP): we correctly predicted that they do have diabetes
  • True Negatives (TN): we correctly predicted that they don't have diabetes
  • False Positives (FP): we incorrectly predicted that they do have diabetes (a "Type I error")
  • False Negatives (FN): we incorrectly predicted that they don't have diabetes (a "Type II error")
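
The four counts defined above can be tallied in a few lines. This is a plain-Python sketch with a hypothetical helper `confusion_counts` and hypothetical labels for ten patients. In scikit-learn, `metrics.confusion_matrix(y_test, y_pred_class)` returns the same counts for 0/1 labels arranged as `[[TN, FP], [FN, TP]]`, with rows for actual classes and columns for predicted classes.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary classifier."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # correctly predicted diabetes
            else:
                fp += 1  # predicted diabetes, patient has none (Type I error)
        else:
            if actual == positive:
                fn += 1  # missed a patient with diabetes (Type II error)
            else:
                tn += 1  # correctly predicted no diabetes
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Hypothetical true and predicted labels for ten patients.
y_test = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred_class = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]

counts = confusion_counts(y_test, y_pred_class)
print(counts)  # {'TP': 1, 'TN': 6, 'FP': 1, 'FN': 2}
```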

Let's see how we can calculate metrics from these four counts.
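
The sketch below computes several metrics that are commonly derived from a confusion matrix. The selection of metrics, the helper names `confusion_metrics` and `ratio`, and the example counts are assumptions made for illustration, not figures from this tutorial.

```python
def ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion_metrics(tp, tn, fp, fn):
    """Compute common classification metrics from the four confusion counts."""
    total = tp + tn + fp + fn
    return {
        # How often is the classifier correct?
        "accuracy": ratio(tp + tn, total),
        # How often is the classifier incorrect?
        "classification_error": ratio(fp + fn, total),
        # When the actual value is positive, how often is the prediction correct?
        # Also called recall or true positive rate.
        "sensitivity": ratio(tp, tp + fn),
        # When the actual value is negative, how often is the prediction correct?
        "specificity": ratio(tn, tn + fp),
        # When the actual value is negative, how often is the prediction wrong?
        "false_positive_rate": ratio(fp, tn + fp),
        # When a positive value is predicted, how often is the prediction correct?
        "precision": ratio(tp, tp + fp),
    }


# Hypothetical counts for 100 test observations.
results = confusion_metrics(tp=30, tn=50, fp=10, fn=10)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

Which of these to optimize depends on the problem: when missing a positive case is costly, sensitivity matters more; when false alarms are costly, specificity or precision matters more.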

In Conclusion:

  • A confusion matrix gives you a more complete picture of how your classifier is performing
  • It also allows you to compute various classification metrics, and these metrics can guide your model selection
