
USB: The first semi-supervised classification learning benchmark that unifies visual, language and audio classification tasks


Semi-supervised learning is currently developing rapidly. However, existing semi-supervised learning benchmarks are mostly limited to computer vision classification tasks and exclude consistent, diverse evaluation on classification tasks in fields such as natural language processing and audio processing. In addition, most semi-supervised papers are published by large institutions, and academic laboratories often find it difficult to participate in advancing the field because of limited computing resources.

To address this, researchers from Microsoft Research Asia, together with researchers from Westlake University, Tokyo Institute of Technology, Carnegie Mellon University, the Max Planck Institute, and other institutions, proposed the Unified SSL Benchmark (USB): the first semi-supervised classification learning benchmark that unifies visual, language, and audio classification tasks.

This work not only introduces more diverse application fields, but also uses a pretrained vision model for the first time to greatly reduce the validation time of semi-supervised algorithms, making semi-supervised research more accessible to researchers, especially small research groups. The paper has been accepted by NeurIPS 2022, a top international academic conference in artificial intelligence.


Paper link: https://arxiv.org/pdf/2208.07204.pdf

Code link: https://github.com/microsoft/Semi-supervised-learning

Supervised learning builds models that fit labeled data: trained with supervised learning on large amounts of high-quality labeled data, neural network models produce competitive results.

For example, according to statistics on the Papers with Code website, traditional supervised learning methods achieve over 88% accuracy on ImageNet, a dataset with millions of images. However, obtaining large amounts of labeled data is often time-consuming and labor-intensive.

To alleviate the dependence on labeled data, semi-supervised learning (SSL) aims to exploit large amounts of unlabeled data, when only a small amount of labeled data is available, to improve the generalization of the model. Semi-supervised learning has long been an important topic in machine learning: before deep learning, researchers in this field proposed classic algorithms such as semi-supervised support vector machines, entropy regularization, and co-training.

Deep semi-supervised learning

With the rise of deep learning, deep semi-supervised learning algorithms have also made great progress. At the same time, technology companies including Microsoft, Google, and Meta have also recognized the huge potential of semi-supervised learning in practical scenarios.

For example, Google uses Noisy Student Training, a semi-supervised algorithm, to improve its search performance [1]. The most representative semi-supervised algorithms currently use a cross-entropy loss for training on labeled data, and consistency regularization on unlabeled data to encourage predictions that are invariant to input perturbations.

For example, the FixMatch [2] algorithm, proposed by Google at NeurIPS 2020, uses augmentation anchoring and fixed thresholding: it augments the data at different strengths to improve the model's generalization, and it reduces the impact of noisy pseudo-labels by filtering out, during training, unlabeled samples whose prediction confidence falls below a user-provided/pre-defined threshold.
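To make the idea concrete, below is a minimal PyTorch sketch of fixed-threshold pseudo-labeling with consistency regularization in the spirit of FixMatch. It is an illustrative sketch, not the official implementation: the weak/strong augmentation transforms are assumed to be applied upstream, and the 0.95 threshold is just the commonly cited default.

```python
import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(model, x_ulb_weak, x_ulb_strong, threshold=0.95):
    """FixMatch-style consistency loss on a batch of unlabeled data.

    x_ulb_weak / x_ulb_strong are the same images under weak and strong
    augmentation (the transforms themselves are assumed to exist upstream).
    """
    with torch.no_grad():
        # Pseudo-label comes from the weakly augmented view (augmentation anchoring).
        probs = torch.softmax(model(x_ulb_weak), dim=-1)
        max_probs, pseudo_labels = probs.max(dim=-1)
        # Fixed threshold: keep only confident predictions.
        mask = (max_probs >= threshold).float()

    # Cross-entropy between pseudo-labels and the strongly augmented view's
    # predictions, masked so low-confidence samples contribute nothing.
    logits_strong = model(x_ulb_strong)
    loss = F.cross_entropy(logits_strong, pseudo_labels, reduction='none') * mask
    return loss.mean()
```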

FlexMatch [3], jointly proposed by Microsoft Research Asia and Tokyo Institute of Technology at NeurIPS 2021, takes into account that different classes have different learning difficulties, and therefore proposes curriculum pseudo labeling: different thresholds should be used for different classes.

Specifically, for easy-to-learn classes the model should set a high threshold to reduce the impact of noisy pseudo-labels; for hard-to-learn classes the model should set a low threshold to encourage more samples of that class to be fit. The learning difficulty of each class is estimated from the number of unlabeled samples that fall into that class with confidence above a fixed threshold.
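The sketch below shows one way such per-class dynamic thresholds can be derived from pseudo-label statistics. It is a simplified illustration of the curriculum pseudo labeling idea (using a plain linear scaling), not the paper's exact estimator.

```python
import torch

def flexmatch_class_thresholds(max_probs, pseudo_labels, num_classes,
                               base_threshold=0.95):
    """Derive per-class thresholds from current pseudo-label statistics.

    A class that has received many confident pseudo-labels so far is treated
    as "easy" and keeps a threshold near base_threshold; classes with few
    confident pseudo-labels are "hard" and get a lower threshold.
    """
    # Count confident predictions per class (the learning-difficulty proxy).
    confident = max_probs >= base_threshold
    counts = torch.bincount(pseudo_labels[confident],
                            minlength=num_classes).float()

    # Normalize by the best-learned class so values lie in [0, 1].
    learning_effect = counts / counts.max().clamp(min=1.0)

    # Scale the base threshold: easy classes stay near it, hard classes drop.
    return base_threshold * learning_effect
```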

At the same time, researchers from Microsoft Research Asia and collaborators proposed TorchSSL [4], a unified PyTorch-based codebase for semi-supervised methods that provides uniform support for the field's deep SSL methods, common datasets, and benchmark results.

Figure 1: FlexMatch algorithm process

Problems and challenges in current semi-supervised learning codebases

Although semi-supervised learning is developing rapidly, the researchers noticed that most current papers in the semi-supervised direction focus only on computer vision (CV) classification tasks. For other fields, such as natural language processing (NLP) and audio processing, researchers cannot know whether the algorithms that work well on CV tasks remain effective.

In addition, most semi-supervised papers are published by large institutions, and academic laboratories often find it difficult to participate in advancing the field because of limited computing resources. In general, current semi-supervised learning benchmarks have the following two problems:

(1) Insufficient diversity. Most existing semi-supervised learning benchmarks are limited to CV classification tasks (i.e., CIFAR-10/100, SVHN, STL-10, and ImageNet classification) and exclude consistent, diverse evaluation on classification tasks in NLP, audio, and other fields, even though the lack of sufficient labeled data is also a common problem in NLP and audio.

(2) Time-consuming and unfriendly to academia. Existing semi-supervised learning benchmarks such as TorchSSL are often time-consuming and environmentally unfriendly, because they typically require training deep neural network models from scratch. Specifically, evaluating FixMatch [2] with TorchSSL requires approximately 300 GPU-days. Such high training costs make SSL-related research unaffordable for many research laboratories (especially those in academia or small research groups), thus hindering the progress of SSL.

USB: A new benchmark with diverse tasks that is friendlier to researchers

To solve the above problems, researchers from Microsoft Research Asia, together with researchers from Westlake University, Tokyo Institute of Technology, Carnegie Mellon University, the Max Planck Institute, and other institutions, proposed the Unified SSL Benchmark (USB), the first semi-supervised classification learning benchmark to unify visual, language, and audio classification tasks.

Compared with previous semi-supervised learning benchmarks (such as TorchSSL) that focused only on a small number of visual tasks, this benchmark not only introduces more diverse application fields, but also for the first time uses a pretrained vision Transformer to greatly reduce the validation time of semi-supervised algorithms (from 7,000 GPU-hours to 900 GPU-hours), making semi-supervised research friendlier to researchers, especially small research groups.


Solution provided by USB

So how does USB solve the problems of current semi-supervised benchmarks in one go? The researchers made the following improvements:

(1) To enhance task diversity, USB introduces 5 CV datasets, 5 NLP datasets, and 5 audio datasets, providing a diverse and challenging benchmark that enables consistent evaluation of multiple tasks from different domains. Table 1 gives a detailed comparison of tasks and training time between USB and TorchSSL.


Table 1: Task and training time comparison between USB and TorchSSL frameworks

(2) To improve training efficiency, the researchers introduced pretrained vision Transformers into SSL instead of training ResNets from scratch. Specifically, they found that using pretrained models significantly reduces the number of training iterations without hurting performance (e.g., the number of training iterations for a CV task drops from 1 million steps to 200,000 steps).
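As an illustration, a fine-tuning setup along these lines might look like the following sketch. The timm library and the specific model name are assumptions for demonstration, not USB's exact configuration.

```python
import timm
import torch

# Load a pretrained vision Transformer instead of a randomly initialized
# ResNet; the pretrained backbone is what allows far fewer training steps.
model = timm.create_model('vit_small_patch16_224', pretrained=True,
                          num_classes=100)  # e.g., a CIFAR-100-style task

# Fine-tune all parameters with a small learning rate, as is typical when
# starting from a pretrained backbone.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.05)

x = torch.randn(4, 3, 224, 224)   # dummy batch as a smoke test
logits = model(x)
print(logits.shape)               # torch.Size([4, 100])
```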

(3) To be friendlier to researchers, the team open-sourced 14 SSL algorithms together with a modular codebase and the related configuration files, so that researchers can easily reproduce the results in the USB report. To help users get started quickly, USB also provides detailed documentation and tutorials. In addition, USB provides a pip package that lets users call the SSL algorithms directly; a sketch of what that looks like follows below. The researchers promise to keep adding new algorithms (such as imbalanced semi-supervised algorithms) and more challenging datasets to USB in the future. Table 2 shows the algorithms and modules already supported in USB.
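For a sense of what calling the SSL algorithms directly looks like, below is a hypothetical quick-start sketch modeled on the repository's README. The package name `semilearn` and all function and configuration names here are assumptions that may differ from the current release; consult the linked GitHub repository for the authoritative API.

```python
# pip install semilearn   (package name as published by the USB repository)
from semilearn import (get_config, get_dataset, get_data_loader,
                       get_net_builder, get_algorithm, Trainer)

# Hypothetical configuration -- keys mirror the README example and may
# differ from the current release.
config = get_config({
    'algorithm': 'fixmatch',      # any of the 14 supported SSL algorithms
    'net': 'vit_tiny_patch2_32',
    'use_pretrain': True,         # pretrained ViT backbone (key to the speedup)
    'dataset': 'cifar100',
    'num_labels': 400,
    'num_classes': 100,
    'batch_size': 16,
    'eval_batch_size': 64,
    'uratio': 2,                  # ratio of unlabeled to labeled batch size
    'epoch': 10,
})

algorithm = get_algorithm(config,
                          get_net_builder(config.net, from_name=False),
                          tb_log=None, logger=None)
dataset_dict = get_dataset(config, config.algorithm, config.dataset,
                           config.num_labels, config.num_classes)

train_lb = get_data_loader(config, dataset_dict['train_lb'], config.batch_size)
train_ulb = get_data_loader(config, dataset_dict['train_ulb'],
                            int(config.batch_size * config.uratio))
eval_loader = get_data_loader(config, dataset_dict['eval'],
                              config.eval_batch_size)

Trainer(config, algorithm).fit(train_lb, train_ulb, eval_loader)
```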


Table 2: Supported algorithms and modules in USB

By exploiting large amounts of unlabeled data to train more accurate and more robust models, semi-supervised learning will continue to have important research and application value. The researchers at Microsoft Research Asia hope that this work on USB will help academia and industry make greater progress in the field of semi-supervised learning.
