
OpenOOD update v1.5: Comprehensive and accurate out-of-distribution detection code library and testing platform, supporting online rankings and one-click testing


Out-of-distribution (OOD) detection is crucial for the reliable operation of open-world intelligent systems, but current OOD detection methods suffer from inconsistencies in how they are evaluated.

The earlier OpenOOD v1 unified the evaluation of OOD detection, but it still had limitations in scalability and usability.

The development team has now released OpenOOD v1.5. Compared with the previous version, the evaluation of OOD detection methods is significantly improved in accuracy, standardization, and user-friendliness.


Paper: https://arxiv.org/abs/2306.09301

OpenOOD Codebase: https://github.com/Jingkang50/OpenOOD

OpenOOD Leaderboard: https://zjysteven.github.io/OpenOOD/

Notably, OpenOOD v1.5 extends its evaluation to large-scale datasets such as ImageNet, investigates the important but underexplored problem of full-spectrum OOD detection, and introduces new features including an online leaderboard and an easy-to-use evaluator.

The work also contributes in-depth analysis and insights drawn from comprehensive experimental results, enriching the community's understanding of OOD detection methods.

With these enhancements, OpenOOD v1.5 aims to drive the progress of OOD research and provide a more powerful and comprehensive evaluation benchmark for OOD detection research.

Research background

For a trained image classifier, a key capability for working reliably in the open world is detecting unknown, out-of-distribution (OOD) samples.

For example, suppose we train a cat-vs-dog classifier on a set of cat and dog photos. For in-distribution (ID) samples, i.e., cat and dog images, we naturally expect the classifier to assign them to the correct category.

For out-of-distribution (OOD) samples, i.e., any images that are neither cats nor dogs (such as airplanes or fruit), we want the model to detect that they depict unknown, novel objects or concepts and therefore cannot be assigned to any in-distribution category.

This problem is out-of-distribution detection (OOD detection). It has attracted widespread attention in recent years, with new work appearing in rapid succession. Yet as the field expands, it has become difficult to track and measure its actual progress, for several reasons.

Reason 1: Inconsistent OOD test datasets.

The rapid development of deep learning tasks has relied on unified test datasets (CIFAR and ImageNet for image classification, PASCAL VOC and COCO for object detection).

Unfortunately, the field of OOD detection has long lacked a unified and widely adopted OOD dataset. Looking back at the experimental settings of existing work, the OOD data used is highly inconsistent: for CIFAR-10 as ID data, some works use MNIST and SVHN as OOD, while others use CIFAR-100 and Tiny ImageNet. Under such circumstances, direct and fair comparison of methods is very difficult.

Reason 2: Confusing terminology.

In addition to "OOD detection", terms such as "open-set recognition" (OSR) and "novelty detection" also appear frequently in the literature.

They essentially address the same problem, differing only in minor details of the experimental setup. However, the different terminology has created unnecessary splits between lines of work: OOD detection and OSR, for instance, were long treated as two independent tasks, and methods from different branches were rarely compared against each other even though they were solving the same problem.

Reason 3: Flawed evaluation practices.

In many works, researchers directly use samples from the OOD test set to tune hyperparameters or even to train the model. Such practice overestimates a method's true OOD detection capability.

The above problems are obviously detrimental to the orderly development of the field. We urgently need a unified benchmark and platform to test and evaluate existing and future OOD detection methods.

OpenOOD was created in response to these challenges. Its first version took an important step, but its limited scale and usability left room for improvement.

Therefore, in OpenOOD v1.5 we have further strengthened and upgraded the platform, aiming to provide a comprehensive, accurate, and easy-to-use testing platform for the research community.

In summary, OpenOOD has the following important features and contributions:

1. A large, modular codebase.

The codebase decouples model architectures, data preprocessing, post-processing, training, and testing into modules to facilitate reuse and extension. OpenOOD currently implements nearly 40 state-of-the-art OOD detection methods for image classification.


2. A one-click evaluator.

With just a few lines of code, OpenOOD's evaluator reports the OOD detection results of a given classifier and post-processor on a specified ID dataset (see the sketch below).

The corresponding OOD data is determined and provided internally by the evaluator, which ensures consistent and fair testing. The evaluator supports both standard OOD detection and full-spectrum OOD detection (more on these later).
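For concreteness, here is a minimal sketch of how such an evaluation might look, loosely following the evaluator interface in the OpenOOD repository. The module paths, class and argument names, and the checkpoint path are partly assumptions and should be checked against the installed version.

```python
# Minimal sketch of the one-click evaluator workflow. Names follow the
# OpenOOD repository at the time of writing and may differ slightly in
# your installed version; the checkpoint path is a placeholder.
import torch
from openood.evaluation_api import Evaluator
from openood.networks import ResNet18_32x32

# A classifier trained on the ID dataset (CIFAR-10 here).
net = ResNet18_32x32(num_classes=10)
net.load_state_dict(torch.load('./cifar10_resnet18.ckpt'))
net.cuda().eval()

# The evaluator prepares the matching OOD test sets internally, so every
# method is scored against exactly the same data.
evaluator = Evaluator(
    net,
    id_name='cifar10',          # the ID dataset that defines the benchmark
    data_root='./data',
    preprocessor=None,          # default test-time preprocessing
    postprocessor_name='msp',   # e.g. maximum softmax probability
    batch_size=200,
    num_workers=2,
)

# fsood=True switches from standard to full-spectrum OOD detection.
metrics = evaluator.eval_ood(fsood=False)
print(metrics)
```

The same trained classifier can then be scored with different post-processors simply by changing the post-processor argument, which is what makes the comparisons on the leaderboard consistent.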

3. An online leaderboard.

Using OpenOOD, we compared nearly 40 OOD detection methods on four ID datasets (CIFAR-10, CIFAR-100, ImageNet-200, and ImageNet-1K) and published the results as a public leaderboard. We hope it helps everyone keep track of the most effective and promising methods in the field.

4. New findings from the experimental results.

Based on OpenOOD's comprehensive experimental results, we provide many new findings in the paper. For example, although it appears to have little to do with OOD detection, data augmentation can effectively improve OOD detection performance, and this improvement is orthogonal and complementary to the gains brought by dedicated OOD detection methods.

In addition, we found that existing methods perform unsatisfactorily on full-spectrum OOD detection, which will be an important problem for the field to address going forward.

Problem Description

This section briefly and informally describes the goals of standard and full-spectrum OOD detection. For a more detailed and formal treatment, please see our paper.

[Figure: test samples organized by semantic shift (horizontal axis) and covariate shift (vertical axis), falling into cases (a)-(d)]

First some background. In the image classification scenario we consider, the in-distribution (ID) data is defined by the corresponding classification task. For example, for the CIFAR-10 classification, the ID distribution corresponds to its 10 semantic categories.

The notion of OOD is defined relative to ID: an image from any semantic category outside the ID categories is an out-of-distribution (OOD) image. We also need to distinguish the following two types of distribution shift.

Semantic shift: the distribution changes at the semantic level, corresponding to the horizontal axis of the figure above. For example, the training categories are cats and dogs, while the test images contain airplanes and fruit.

Covariate shift: the distribution changes at the surface statistical level while the semantics stay the same, corresponding to the vertical axis of the figure above. For example, training uses clean, natural photos of cats and dogs, while testing uses noisy or hand-drawn images of cats and dogs.
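To make the distinction concrete, the sketch below constructs covariate-shifted variants of an ID image by adding blur or noise while leaving the semantics untouched. The file name is hypothetical and the perturbations are purely illustrative, not the benchmark's actual corruption pipeline.

```python
# Illustrative only: a covariate-shifted version of an ID image keeps its
# semantics (still a cat) but changes its surface statistics.
import torch
import torchvision.transforms as T
from PIL import Image

img = Image.open('cat.jpg')  # hypothetical ID image

to_tensor = T.ToTensor()
blur = T.GaussianBlur(kernel_size=5, sigma=2.0)

x_clean = to_tensor(img)                                           # no shift
x_blurred = to_tensor(blur(img))                                   # covariate shift
x_noisy = (x_clean + 0.1 * torch.randn_like(x_clean)).clamp(0, 1)  # covariate shift

# A semantic shift would instead mean an image of a different category
# altogether, e.g. an airplane or a piece of fruit.
```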

With this background and the figure above in mind, standard and full-spectrum OOD detection are easy to describe.

Standard OOD detection

Objective (1): Train a classifier on the ID distribution so that it classifies ID data accurately. It is assumed here that the test ID data has no covariate shift relative to the training ID data.

Objective (2): Based on the trained classifier, design an OOD detection method that can tell whether any given sample is ID or OOD. In the figure above, this corresponds to distinguishing (a) from (c) and (d).
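As a concrete illustration of objective (2), here is a minimal sketch of one classic score-based baseline, the maximum softmax probability (MSP): the classifier's highest softmax probability serves as an ID-ness score, and a threshold, typically chosen on validation data (e.g. so that 95% of ID samples are accepted), decides whether a test sample is kept as ID or rejected as OOD. OpenOOD implements this and many stronger post-processors; the sketch only conveys the idea.

```python
# Sketch of a score-based OOD detector built on top of a trained classifier.
import torch
import torch.nn.functional as F

@torch.no_grad()
def msp_score(net: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Maximum softmax probability per sample; higher means more ID-like."""
    logits = net(x)
    return F.softmax(logits, dim=1).max(dim=1).values

@torch.no_grad()
def detect_id(net: torch.nn.Module, x: torch.Tensor,
              threshold: float = 0.9) -> torch.Tensor:
    """True = accepted as ID, False = rejected as OOD."""
    return msp_score(net, x) >= threshold
```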

Full-spectrum OOD detection

Objective (1): Similar to standard OOD detection, except that covariate shift is also considered: regardless of whether a test ID image exhibits covariate shift relative to the training images, the classifier must assign it to the correct ID category (for example, the cat-vs-dog classifier should not only classify clean cat and dog photos accurately, but also generalize to noisy or blurry cat and dog pictures).

Objective (2): Covariate-shifted ID samples are also taken into account; together with normal (unshifted) ID samples, they must be distinguished from OOD samples. In the figure above, this corresponds to distinguishing (a) and (b) from (c) and (d).
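The evaluation difference between the two settings can be summarized in a few lines: when computing a ranking metric such as AUROC, standard OOD detection treats only clean ID samples as positives, whereas full-spectrum OOD detection also pools covariate-shifted ID samples into the positive set. The scores below are random placeholders standing in for a real detector's outputs.

```python
# Sketch of AUROC computation in the standard vs. full-spectrum settings.
import numpy as np
from sklearn.metrics import roc_auc_score

def auroc(id_scores: np.ndarray, ood_scores: np.ndarray) -> float:
    """AUROC for separating ID (label 1) from OOD (label 0) by score."""
    labels = np.concatenate([np.ones_like(id_scores), np.zeros_like(ood_scores)])
    scores = np.concatenate([id_scores, ood_scores])
    return roc_auc_score(labels, scores)

# Placeholder detector scores (higher = more ID-like); in practice these
# come from a post-processor such as MSP applied to each test split.
rng = np.random.default_rng(0)
scores_clean_id = rng.normal(0.9, 0.05, 1000)   # clean ID images, case (a)
scores_csid = rng.normal(0.7, 0.10, 1000)       # covariate-shifted ID, case (b)
scores_ood = rng.normal(0.5, 0.10, 1000)        # semantically shifted OOD, (c)(d)

# Standard OOD detection: only clean ID counts as positive.
print('standard AUROC:      ', auroc(scores_clean_id, scores_ood))

# Full-spectrum OOD detection: covariate-shifted ID is also positive.
fs_id = np.concatenate([scores_clean_id, scores_csid])
print('full-spectrum AUROC: ', auroc(fs_id, scores_ood))
```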

Why is full-spectrum OOD detection important?

Attentive readers may have noticed that objective (1) of full-spectrum OOD detection corresponds to another very important research topic: out-of-distribution generalization (OOD generalization).

It needs to be clarified that OOD in OOD generalization refers to samples with covariate shift, while OOD in OOD detection refers to samples with semantic shift.

Both kinds of shift are ubiquitous in the real world, yet existing OOD generalization and standard OOD detection each consider only one of them and ignore the other.

In contrast, full-spectrum OOD detection naturally considers both shifts in the same scenario, more accurately reflecting what we expect of an ideal classifier operating in the open world.

Experimental results and new findings

In version 1.5, OpenOOD uniformly and comprehensively tests nearly 40 methods on 6 benchmarks (4 for standard OOD detection and 2 for full-spectrum OOD detection).

The implemented methods and datasets are described in the paper, and all experiments can be reproduced with the OpenOOD codebase. Here we focus on the findings drawn from the comparison.

[Table: unified comparison of nearly 40 OOD detection methods on the standard and full-spectrum benchmarks]

Finding 1: There is no single winner.

From the table above, it is clear that no method delivers consistently outstanding performance on all benchmark datasets.

For example, post-hoc inference methods ReAct and ASH perform well on the large data set ImageNet, but have no advantage over other methods on CIFAR.

Conversely, some training-time methods that add constraints during training, such as RotPred and LogitNorm, outperform post-hoc methods on the smaller datasets but are unremarkable on ImageNet.

Finding 2: Data augmentations help.

As the table shows, although data augmentations were not designed specifically for OOD detection, they can effectively improve OOD detection performance. Even more surprisingly, the gains from data augmentation and those from dedicated OOD post-processing methods amplify each other.

Take AugMix as an example. Combined with the simplest MSP post-processor, it reaches a near-OOD detection rate of 77.49% on ImageNet-1K, only 1.47% higher than the corresponding result of training with plain cross-entropy loss and no data augmentation.

However, when AugMix is combined with the more advanced ASH post-processor, the corresponding detection rate is 3.99% higher than the cross-entropy baseline and reaches 82.16%, the highest in our tests. These results show that combining data augmentation with post-processing has great potential for further improving OOD detection.
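As a rough illustration of how such an augmentation enters the picture, the sketch below adds torchvision's AugMix transform to an otherwise ordinary CIFAR-10 training pipeline. This is only a schematic of the idea, not necessarily the exact training recipe used in the OpenOOD experiments.

```python
# Sketch: AugMix dropped into a standard CIFAR-10 training pipeline.
# Only the data pipeline changes; the model is still trained with plain
# cross-entropy loss. Requires torchvision >= 0.13 for transforms.AugMix.
import torchvision.transforms as T
from torchvision.datasets import CIFAR10
from torch.utils.data import DataLoader

train_tf = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.AugMix(),                 # applied to PIL images before tensor conversion
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

train_loader = DataLoader(
    CIFAR10(root='./data', train=True, download=True, transform=train_tf),
    batch_size=128, shuffle=True, num_workers=4)

# ... train as usual; at test time the OOD post-processor (MSP, ASH, ...)
# is applied on top of the resulting classifier.
```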

Finding 3: Full-spectrum detection poses a challenge for current detectors.

As the table above clearly shows, when the scenario switches from standard to full-spectrum OOD detection (i.e., covariate-shifted ID images are added to the test ID data), most methods degrade significantly, with the detection rate dropping by more than 10%.

This means that current methods tend to flag covariate-shifted ID images, whose semantics have not actually changed, as OOD.

This behavior runs counter to human perception (and to the goal of full-spectrum OOD detection): if a human annotator labeling cat and dog pictures is shown a noisy, blurry photo of a cat or dog, he or she will still recognize it as a cat or dog, i.e., as in-distribution ID data rather than unknown OOD data.

In short, current methods cannot effectively handle full-spectrum OOD detection, and we believe this will be an important open problem for the field.

There are many further findings not listed here, for example that data augmentation remains effective for full-spectrum OOD detection; again, we encourage readers to see the paper.

Looking forward

We hope that OpenOOD's codebase, evaluator, leaderboard, benchmark datasets, and detailed test results can bring researchers together to advance the field, and we look forward to everyone using OpenOOD to develop and test OOD detection methods.

We also welcome contributions to OpenOOD in any form, including but not limited to providing feedback, adding the latest methods to the OpenOOD codebase and leaderboard, and extending future versions of OpenOOD.

Reference: https://arxiv.org/abs/2306.09301
