
CMU joins forces with Adobe: GAN models usher in the era of pre-training, requiring only 1% of training samples

May 11, 2023, 09:28 AM

In the pre-training era, visual recognition models have advanced rapidly, but image generation models, such as generative adversarial networks (GANs), seem to have fallen behind.

GANs are usually trained from scratch in an unsupervised manner, which is time-consuming and labor-intensive, and leaves unused the "knowledge" that large-scale pre-training has already extracted from big data. Isn't that a big loss?

Moreover, image generation itself must capture and model the complex statistics of real-world visual phenomena; otherwise, the generated images will violate the laws of the physical world and be recognized as "fake" at a glance.


A pre-trained model provides the knowledge, and a GAN provides the generation capability; combining the two could be a beautiful thing.

The question is: which pre-trained models should be used, and how should they be combined, to improve the GAN's generation ability?

Recently, researchers from CMU and Adobe published a paper at CVPR 2022 that combines pre-trained models with GAN training through "selection".


Paper link: https://arxiv.org/abs/2112.09130

Project link: https://github.com/nupurkmr9/vision-aided-gan

Video link: https://www.youtube.com/watch?v=oHdyJNdQ9E4

GAN training involves a discriminator and a generator: the discriminator learns the statistics that distinguish real samples from generated ones, while the generator tries to make its generated images match the real distribution as closely as possible.

Ideally, the discriminator should be able to measure the distribution gap between the generated image and the real image.

But when the amount of data is very limited, directly using a large-scale pre-trained model as the discriminator easily lets it "ruthlessly crush" the generator and then overfit.

Experiments on the FFHQ 1k dataset show that even with the latest differentiable data augmentation methods, the discriminator still overfits: it performs very strongly on the training set but very poorly on the validation set.


Additionally, the discriminator may latch onto artifacts that are indiscernible to humans but obvious to machines.

To balance the capabilities of the discriminator and the generator, the researchers propose assembling the representations of a diverse set of pre-trained models into the discriminator.


This method has two advantages:

1. Training a shallow classifier on pre-trained features is a common way to adapt a deep network to a small dataset while reducing overfitting.

In other words, as long as the pre-trained model's parameters are frozen and a lightweight classification head is added on top, the training process stays stable.
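The frozen-backbone-plus-trainable-head idea can be sketched as follows. This is a minimal PyTorch illustration, not the paper's implementation: the small CNN stands in for a real pre-trained backbone (e.g. a torchvision ResNet with loaded weights), and only the linear head would be updated during GAN training.

```python
import torch
import torch.nn as nn

# A tiny CNN stands in for a real pre-trained backbone; in practice you
# would load an actual pre-trained network and its weights.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
for p in backbone.parameters():
    p.requires_grad = False  # freeze: keep the pre-trained knowledge fixed

# Lightweight trainable head: the only part updated during GAN training.
head = nn.Linear(32, 1)  # outputs a real/fake logit

def discriminate(x: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():          # features come from the frozen backbone
        feats = backbone(x)
    return head(feats)

logits = discriminate(torch.randn(4, 3, 64, 64))
print(logits.shape)  # torch.Size([4, 1])
```

Because gradients only flow through the small head, the discriminator's capacity is capped, which is what keeps it from overwhelming the generator on small datasets.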

For example, the "Ours" curve in the experiment above shows that validation accuracy improves substantially over StyleGAN2-ADA.

2. Recent studies have shown that deep networks capture meaningful visual concepts, from low-level cues (edges and textures) to high-level concepts (objects and object parts).

A discriminator built on these features may align better with human perception.

Combining multiple pre-trained models also pushes the generator to match the real distribution in different, complementary feature spaces.
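One simple way to realize "matching in several feature spaces" is to sum the generator's adversarial loss over an ensemble of discriminators, one per pre-trained feature space. The sketch below uses the standard non-saturating GAN loss with uniform weighting; the paper's exact loss and weighting may differ.

```python
import torch
import torch.nn.functional as F

def generator_loss(fake_logits_per_disc):
    """Non-saturating generator loss summed over an ensemble of
    discriminators, one per pre-trained feature space (a sketch)."""
    total = torch.zeros(())
    for logits in fake_logits_per_disc:
        # softplus(-x) == -log(sigmoid(x)): the non-saturating GAN loss
        total = total + F.softplus(-logits).mean()
    return total

# Logits from three hypothetical vision-aided discriminators.
loss = generator_loss([torch.randn(8, 1), torch.randn(8, 1), torch.randn(8, 1)])
```

Each term penalizes the generator for being detectable in one feature space, so fooling the ensemble requires matching the real distribution along every complementary axis at once.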

To select the best pre-trained networks, the researchers first collected several state-of-the-art models into a "model bank", including VGG-16 for classification and Swin-T for detection and segmentation, among others.


Then, based on how linearly separable real and fake images are in each model's feature space, they propose an automatic model-search strategy, and use label smoothing and differentiable augmentation to further stabilize training and reduce overfitting.

Specifically, the union of real training samples and generated images is split into a training set and a validation set.

For each pre-trained model, a linear logistic discriminator is trained to classify whether a sample is real or generated; the "negative binary cross-entropy loss" on the validation split measures the distribution gap, and the model with the smallest error is returned.

A lower validation error corresponds to higher linear-probing accuracy, indicating that the features are useful for distinguishing real samples from generated ones, and that using them can provide more useful feedback to the generator.
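The ranking step can be sketched as follows. This is an illustrative reconstruction, not the paper's code: each candidate model is represented only by the features it extracts for real and generated samples, a linear probe is fit on a train split, and candidates are ranked by validation cross-entropy (here the synthetic "model bank" entries are made up for the demo).

```python
import torch
import torch.nn.functional as F

def probe_error(feats_real, feats_fake, epochs: int = 200):
    """Train a linear real/fake probe on half the data and return the
    binary cross-entropy on the held-out half. Lower error means the
    feature space separates real from generated samples well."""
    x = torch.cat([feats_real, feats_fake])
    y = torch.cat([torch.ones(len(feats_real)), torch.zeros(len(feats_fake))])
    idx = torch.randperm(len(x))          # split the union into train/val
    split = len(x) // 2
    tr, va = idx[:split], idx[split:]
    w = torch.zeros(x.shape[1], requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    opt = torch.optim.Adam([w, b], lr=0.05)
    for _ in range(epochs):
        opt.zero_grad()
        F.binary_cross_entropy_with_logits(x[tr] @ w + b, y[tr]).backward()
        opt.step()
    with torch.no_grad():
        return F.binary_cross_entropy_with_logits(x[va] @ w + b, y[va]).item()

# Rank a hypothetical "model bank" by probe error; pick the smallest.
torch.manual_seed(0)
bank = {
    "separable":     (torch.randn(64, 8) + 3, torch.randn(64, 8) - 3),
    "uninformative": (torch.randn(64, 8),     torch.randn(64, 8)),
}
errors = {name: probe_error(r, f) for name, (r, f) in bank.items()}
best = min(errors, key=errors.get)
```

With these synthetic features, the well-separated space wins the ranking, mirroring how the method prefers models whose features already distinguish real from generated images.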

The researchers empirically verified GAN training with 1,000 training samples from the FFHQ and LSUN CAT datasets.


The results show that GANs trained with pre-trained models achieve higher linear-probing accuracy and, generally speaking, better FID scores.

To incorporate feedback from multiple off-the-shelf models, the paper also explores two model-selection and ensembling strategies:

1) K-fixed: select the K best off-the-shelf models at the start of training and train until convergence;

2) K-progressive: after each fixed number of iterations, iteratively select and add the best-performing model not yet in use.

Experiments show that, compared with K-fixed, the progressive approach has lower computational cost and also helps select pre-trained models that capture different aspects of the data distribution. For example, the first two models chosen by the progressive strategy are usually a self-supervised and a supervised model.
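The progressive loop can be sketched as plain control flow. All the names here are illustrative stand-ins: `train_gan_stage` represents one stage of GAN training with the currently active discriminators, and `probe_error` is the linear-probe ranking described above.

```python
def k_progressive_selection(model_bank, k, train_gan_stage, probe_error):
    """Sketch of the K-progressive strategy (names are illustrative).

    After each training stage, probe every unused model and add the one
    whose features best separate real from generated images."""
    active = []                  # discriminators currently in the GAN loss
    unused = list(model_bank)
    for stage in range(k):
        train_gan_stage(active)               # train until the stage budget
        errors = {m: probe_error(m) for m in unused}
        best = min(errors, key=errors.get)    # smallest validation error wins
        active.append(best)
        unused.remove(best)
    return active

# Demo with canned probe errors: "b" (0.1) is picked first, then "c" (0.2).
order = k_progressive_selection(
    ["a", "b", "c"], k=2,
    train_gan_stage=lambda active: None,      # stand-in for a real stage
    probe_error={"a": 0.3, "b": 0.1, "c": 0.2}.get,
)
print(order)  # ['b', 'c']
```

Because the probe is re-run against the current generator at each stage, later picks tend to cover weaknesses the earlier discriminators no longer expose, which is consistent with the observed self-supervised/supervised pairing.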

The experiments in the paper mainly use the progressive strategy.

The final training algorithm first trains a GAN with a standard adversarial loss.


Given this baseline generator, linear probing is used to search for the best pre-trained model, whose loss is then added to the training objective.

In the K-progressive strategy, after training for a fixed number of iterations (proportional to the number of available real training samples), a new vision-aided discriminator is added to the snapshot of the previous stage with the best training-set FID.

During training, data augmentation is performed with horizontal flips, and differentiable augmentation and one-sided label smoothing serve as regularizers.
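These two regularizers are standard and easy to write down; here is a minimal sketch (the differentiable-augmentation pipeline from the paper is omitted, and the smoothing value 0.9 is a common default, not taken from the paper).

```python
import torch
import torch.nn.functional as F

def discriminator_loss(logits_real, logits_fake, smooth: float = 0.9):
    """Discriminator BCE with one-sided label smoothing: only the "real"
    targets are softened (1.0 -> smooth); fake targets stay at 0."""
    real_targets = torch.full_like(logits_real, smooth)
    fake_targets = torch.zeros_like(logits_fake)
    return (F.binary_cross_entropy_with_logits(logits_real, real_targets)
            + F.binary_cross_entropy_with_logits(logits_fake, fake_targets))

def augment(batch: torch.Tensor, p: float = 0.5) -> torch.Tensor:
    """Random horizontal flip per image (NCHW); the paper additionally
    applies differentiable augmentation, omitted here."""
    flip = torch.rand(batch.shape[0], device=batch.device) < p
    out = batch.clone()
    out[flip] = out[flip].flip(-1)  # flip along the width axis
    return out

loss = discriminator_loss(torch.randn(8, 1), torch.randn(8, 1))
```

Softening only the real labels keeps the discriminator from becoming overconfident on the small real set, which is exactly the overfitting failure mode described earlier.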

It can also be observed that using only off-the-shelf models as discriminators leads to divergence, while the combination of original discriminators and pre-trained models can improve this situation.

The final experiments report results when the number of training samples for the FFHQ, LSUN CAT, and LSUN CHURCH datasets varies from 1k to 10k.


In all settings, FID improves significantly, demonstrating the method's effectiveness in limited-data scenarios.

To qualitatively compare the method with StyleGAN2-ADA, the paper examines the quality of samples generated by both: the proposed method improves the quality of the worst samples, especially on FFHQ and LSUN CAT.


As each additional discriminator is added, linear-probing accuracy on the pre-trained models' features gradually declines; in other words, the generator becomes stronger.


Overall, with only 10,000 training samples, the method's FID on LSUN CAT is comparable to that of StyleGAN2 trained on 1.6 million images.


On the full datasets, the method improves FID by 1.5 to 2 times on the LSUN cat, church, and horse categories.


Author Richard Zhang received his PhD from the University of California, Berkeley, and his bachelor's and master's degrees from Cornell University. His main research interests include computer vision, machine learning, deep learning, graphics, and image processing, and he frequently collaborates with academic researchers through internships or university partnerships.


Author Jun-Yan Zhu is an assistant professor in the Robotics Institute of Carnegie Mellon University's School of Computer Science, with affiliations in the Computer Science Department and the Machine Learning Department. His main research areas include computer vision, computer graphics, machine learning, and computational photography.

Before joining CMU, he was a research scientist at Adobe Research. He received his bachelor's degree from Tsinghua University and his Ph.D. from the University of California, Berkeley, and then worked as a postdoctoral fellow at MIT CSAIL.


