


Why is self-supervision effective? The 243-page Princeton doctoral thesis 'Understanding Self-supervised Representation Learning' comprehensively explains three types of methods: contrastive learning, language modeling, and self-prediction.
Pre-training has emerged as an effective alternative paradigm for overcoming these shortcomings: models are first trained on easily available data and then used to solve the downstream tasks of interest, requiring much less labeled data than supervised learning.
Pre-training on unlabeled data, i.e. self-supervised learning, has been particularly revolutionary, achieving success in different fields: text, vision, speech, and more.
This raises an interesting and challenging question: Why should pretraining on unlabeled data help seemingly unrelated downstream tasks?
Paper address: https://dataspace.princeton.edu/handle/88435/dsp01t435gh21h
This thesis presents work that proposes and establishes a theoretical framework for investigating why self-supervised learning is beneficial for downstream tasks.
The framework applies to contrastive learning, autoregressive language modeling, and self-prediction-based methods. Its core idea is that pre-training helps the model learn a low-dimensional representation of the data, which subsequently makes it possible to solve the downstream tasks of interest with linear classifiers, using less labeled data.
A common theme is formalizing the ideal properties of the unlabeled data distribution used to build self-supervised learning tasks. With an appropriate formalization, it can be shown that approximately minimizing the correct pre-training objective extracts downstream signals implicitly encoded in the unlabeled data distribution.
Finally, it is shown that this signal can be decoded from the learned representation with a linear classifier, providing a formalization for the transfer of "skills and knowledge" across tasks.
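The "pretrain a representation, then decode with a linear classifier" recipe described above can be sketched in a few lines. Everything here is a toy stand-in (not the thesis's actual experiments): a fixed random nonlinear map plays the role of a frozen pretrained encoder, and the label is constructed to be linearly decodable from the representation, which is exactly the situation the framework's assumptions are meant to capture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "pretrained encoder": a fixed nonlinear map into a
# low-dimensional representation space (a frozen deep network in practice).
D, d = 50, 5
W_enc = rng.normal(size=(D, d)) / np.sqrt(D)

def encode(x):
    return np.tanh(x @ W_enc)

# A small labeled downstream dataset whose label is, by construction,
# linearly decodable from the representation.
n = 200
X = rng.normal(size=(n, D))
Z = encode(X)                        # frozen features: no encoder updates
w_true = rng.normal(size=d)
y = (Z @ w_true > 0).astype(float)

# Linear probe: logistic regression on Z via plain gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
    grad = p - y                     # d(logistic loss)/d(logits)
    w -= 0.5 * Z.T @ grad / n
    b -= 0.5 * grad.mean()

probe_acc = (((Z @ w + b) > 0) == (y > 0.5)).mean()
```

Because the downstream signal is linearly encoded in the representation, the probe recovers it from only a couple hundred labeled examples, far fewer than training a classifier on the raw 50-dimensional input from scratch would typically need.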
Introduction
In the quest to design intelligent agents and data-driven solutions to problems, the fields of machine learning and artificial intelligence have made tremendous progress over the past decade. Following initial successes on challenging supervised learning benchmarks such as ImageNet [Deng et al., 2009], innovations in deep learning subsequently led to models with superhuman performance on many such benchmarks across domains. Training such task-specific models is certainly impressive and has huge practical value. However, it has an important limitation: it requires large labeled or annotated datasets, which are often expensive to collect. Furthermore, from an intelligence perspective, one hopes for more general models that, like humans [Ahn and Brewer, 1993], can learn from previous experiences, summarize them into skills or concepts, and use those skills or concepts to solve new tasks with little or no demonstration. After all, babies learn a great deal through observation and interaction, without explicit supervision. These limitations motivated an alternative paradigm: pre-training.
The focus of this thesis is pre-training with the large amounts of unlabeled data that are often available. The idea of using unlabeled data has long been of interest in machine learning, particularly through unsupervised and semi-supervised learning. Its modern adaptation using deep learning is often called self-supervised learning (SSL), and it has begun to change the landscape of machine learning and artificial intelligence through ideas such as contrastive learning and language modeling. The idea of self-supervised learning is to construct tasks using only unlabeled data and to train a model to perform well on these constructed tasks. Such tasks typically require the model to encode structural properties of the data by predicting unobserved or hidden parts (or properties) of the input from observed or retained parts [LeCun and Misra, 2021]. Self-supervised learning has shown generality and utility on many downstream tasks of interest, often with better sample efficiency than solving tasks from scratch, bringing us one step closer to the goal of general-purpose agents. Indeed, large language models like GPT-3 [Brown et al., 2020] have recently demonstrated fascinating “emergent behavior” at scale, sparking further interest in self-supervised pretraining.
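The "predict hidden parts from observed parts" recipe can be illustrated with a toy self-prediction task (an illustration under assumed synthetic data, not an example from the thesis): each data vector is split into an observed half and a hidden half, the hidden half is by construction a noisy function of the observed half, and ordinary least squares stands in for training a deep predictor on the pretraining objective.

```python
import numpy as np

rng = np.random.default_rng(1)

# Unlabeled data with structure: the "hidden" half of each vector is a
# noisy linear function of the "observed" half, so the hidden part is
# predictable from the observed part.
n, k = 1000, 4
A = rng.normal(size=(k, k))
X_obs = rng.normal(size=(n, k))                    # observed coordinates
X_hid = X_obs @ A + 0.1 * rng.normal(size=(n, k))  # hidden coordinates

# Self-prediction pretraining task: predict the hidden part from the
# observed part. Least squares stands in for a deep predictor here.
W, *_ = np.linalg.lstsq(X_obs, X_hid, rcond=None)

recon_mse = np.mean((X_obs @ W - X_hid) ** 2)      # approaches the noise level
```

Solving the constructed task forces the predictor to recover the structure (here, the matrix A) relating the parts of the input; no labels were used anywhere.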
Although self-supervised learning has been empirically successful and continues to show great promise, a good theoretical understanding of how it works beyond rough intuition is still lacking. These impressive successes raise interesting questions, because it is unclear a priori why a model trained on one task should help on another, seemingly unrelated task, i.e., why training on task A should help task B. While a complete theoretical understanding of SSL (and of deep learning in general) is challenging and elusive, understanding this phenomenon at any level of abstraction may help develop more principled algorithms. The motivating questions of this thesis are:
Why does training on self-supervised learning tasks (using large amounts of unlabeled data) help solve data-scarce downstream tasks? How can the transfer of "knowledge and skills" be formalized?
Although there is a large body of literature on supervised learning, generalization from SSL task → downstream task is fundamentally different from generalization from training set → test set in supervised learning. In supervised learning for a downstream classification task, for example, a model trained on a training set of input-label pairs sampled from an unknown distribution can be directly evaluated on an unseen test set sampled from the same distribution: this underlying distribution establishes the connection from training set to test set. However, the conceptual connection from SSL task → downstream task is less clear, because the unlabeled data used in the SSL task carries no explicit signal about downstream labels. This means that a model pretrained on an SSL task (e.g., predicting part of the input from the rest) cannot be used directly on downstream tasks (e.g., predicting a class label from the input). The transfer of "knowledge and skills" therefore requires an additional training step using some labeled data, ideally less than what supervised learning from scratch would require. Any theoretical understanding of SSL task → downstream task generalization needs to address two questions: "What is the intrinsic role of unlabeled data?" and "How should pre-trained models be used for downstream tasks?" This thesis targets downstream classification tasks, making distributional assumptions on the unlabeled data and using the idea of representation learning to study these questions:
(a) (Distributional assumption) The distribution of the unlabeled data implicitly contains relevant information about the downstream classification tasks of interest.
(b) (Representation learning) A model pretrained on an appropriate SSL task can encode this signal in its learned representations, so that downstream classification tasks can subsequently be solved with linear classifiers.
Point (a) says that certain structural properties of the unlabeled data implicitly give us hints about subsequent downstream tasks, and that self-supervised learning can tease this signal out of the data. Point (b) proposes a simple and empirically effective way to use pre-trained models: leverage the model's learned representations. This thesis identifies and mathematically quantifies distributional properties of unlabeled data under which good representations can be learned for different SSL methods such as contrastive learning, language modeling, and self-prediction. The next section delves into the idea of representation learning and formally explains why self-supervised learning helps downstream tasks.
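Of the three method families, contrastive learning is the one most compactly captured in code. The sketch below implements the standard InfoNCE objective (a widely used contrastive loss, not a formula taken from the thesis) on synthetic embeddings: each anchor must identify its positive "view" among a batch, with the rest of the batch serving as negatives. Batch size, dimension, and temperature are illustrative choices.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss over a batch of positive pairs (z_a[i], z_b[i]).

    Each z_a[i] must identify its partner z_b[i] among all rows of z_b;
    the other rows in the batch act as negatives.
    """
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature               # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # positives on the diagonal

rng = np.random.default_rng(2)
anchors = rng.normal(size=(16, 8))
views = anchors + 0.01 * rng.normal(size=(16, 8))    # augmented "views"
unrelated = rng.normal(size=(16, 8))                 # unrelated batch

loss_aligned = info_nce(anchors, views)      # low: positives share signal
loss_mismatch = info_nce(anchors, unrelated) # high: no shared signal
```

Minimizing this loss pulls representations of different views of the same input together while pushing apart representations of different inputs, which is one concrete mechanism by which the distributional structure of unlabeled data can be distilled into a linearly decodable representation.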