The physical principles that inspire modern AI art: the exploration of generative AI's possibilities has only just begun

王林
Release: 2023-04-12 23:58:01

Ask DALL·E 2, an image generation system created by OpenAI, to draw a picture of "a goldfish sipping Coca-Cola on the beach" and it will spit out surreal images of exactly that. The program encountered images of beaches, goldfish, and Coca-Cola during training, but it is highly unlikely to have seen one in which all three appeared together. Yet DALL·E 2 can combine these concepts into something that might have made Dalí proud.

DALL·E 2 is a generative model - a system that attempts to use training data to generate new things that rival the data in quality and diversity. This is one of the most difficult problems in machine learning, and getting to this point has been a tough journey.

The first important generative models for images used an approach to artificial intelligence called a neural network, a program composed of many layers of computational units called artificial neurons. But even as the quality of their images improved, the models proved unreliable and difficult to train. Meanwhile, a powerful generative model, created by a postdoctoral researcher with a passion for physics, lay dormant until two graduate students made technical breakthroughs that brought the beast to life.

DALL·E 2 is such a beast. The key insights that make DALL·E 2’s images possible, as well as those of its competitors Stable Diffusion and Imagen, come from the world of physics. The systems that underpin them are called diffusion models and are heavily inspired by non-equilibrium thermodynamics, which governs phenomena such as fluid and gas diffusion. “There are a lot of techniques originally invented by physicists that are now very important in machine learning,” said Yang Song, a machine learning researcher at OpenAI.

The power of these models shocked the industry and users. “This is an exciting time for generative models,” said Anima Anandkumar, a computer scientist at the California Institute of Technology and senior director of machine learning research at Nvidia.

While the realistic images created by diffusion models can sometimes perpetuate social and cultural biases, she said, "we have shown that generative models are useful for downstream tasks [which] improve the fairness of predictive artificial intelligence models."

High probability

To understand how creating data works for images, let's start with a simple image made of just two adjacent grayscale pixels. We can fully describe this image with two values, based on the shade of each pixel (from 0 for completely black to 255 for completely white). You can use these two values to plot the image as a point in 2D space.

If we plot multiple images as points, clusters may emerge: some images and their corresponding pixel values occur more frequently than others. Now imagine a curved surface above the plane, with the height of the surface corresponding to the density of the clusters. This surface maps out a probability distribution. You are most likely to find an individual data point underneath the highest part of the surface, and you will rarely find one underneath the lowest part.

DALL·E 2 created these images of "goldfish sipping Coca-Cola on the beach". This program, created by OpenAI, may have never encountered similar images, but can still generate them on its own.

Now you can use this probability distribution to generate new images. All you need to do is randomly generate new data points while adhering to the restriction that more probable data is generated more often, a process called "sampling" the distribution. Each new point is a new image.
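The two-pixel picture above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the toy dataset, the function names `fit_histogram` and `sample_images`, and the use of simple counting as the "probability distribution" are all hypothetical and are not taken from any system described in this article.

```python
import random
from collections import Counter

def fit_histogram(images):
    """Estimate a probability for each two-pixel image by counting how often it occurs."""
    counts = Counter(images)
    total = sum(counts.values())
    return {img: n / total for img, n in counts.items()}

def sample_images(probs, k, rng):
    """Draw k images so that more probable images are drawn more often."""
    images = list(probs)
    weights = [probs[img] for img in images]
    return rng.choices(images, weights=weights, k=k)

# Hypothetical toy dataset: each image is a (left_pixel, right_pixel) pair in 0..255.
training_images = [(10, 20)] * 6 + [(200, 220)] * 3 + [(128, 128)] * 1

probs = fit_histogram(training_images)
rng = random.Random(0)
new_images = sample_images(probs, 1000, rng)
```

A counting model like this can only reproduce images it has already seen. A real generative model has to learn a smooth distribution that also assigns probability to plausible images outside the training set.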

The same analysis applies to more realistic grayscale photos with, say, one million pixels each. Only now, plotting each image requires not two axes but a million. The probability distribution over such images would be some complex million-plus-one-dimensional surface. If you sample this distribution, you will produce a million pixel values. Print these pixels on a piece of paper and the image will most likely look like a photo from the original dataset.

The challenge of generative modeling is to learn this complex probability distribution for some set of images that make up the training data. The distribution is useful partly because it captures a broad range of information about the data, and partly because researchers can combine probability distributions from different types of data, such as text and images, to compose surreal outputs, such as a goldfish sipping Coca-Cola on a beach. "You can mix and match different concepts... to create completely new scenarios that have never been seen in the training data," Anandkumar said.

In 2014, a model called a generative adversarial network (GAN) became the first to generate realistic images. "It's so exciting," Anandkumar said. But GANs are difficult to train: they may not learn the full probability distribution, and may only generate images from a subset of the distribution. For example, a GAN trained on images of various animals might only generate images of dogs.

Machine learning needed a more robust model. Jascha Sohl-Dickstein, whose work was inspired by physics, would provide one.

Jascha Sohl-Dickstein.

Excited Spot

Around the time GANs were invented, Sohl-Dickstein was a postdoc at Stanford University studying generative models, with a side interest in non-equilibrium thermodynamics. This branch of physics studies systems that are not in thermal equilibrium, that is, systems that exchange matter and energy internally and with their environment.

An illustrative example is a drop of blue ink spreading through a container of water. At first, it forms a dark blob in one spot. At this point, if you want to calculate the probability of finding an ink molecule in some small volume of the container, you need a probability distribution that cleanly models the initial state, before the ink starts to spread. But this distribution is complex, which makes it difficult to sample from.

Eventually, however, the ink spreads throughout the water, turning the water a light blue. This allows for a simpler, more uniform probability distribution of molecules described by simple mathematical expressions. Nonequilibrium thermodynamics describes the probability distribution at each step in the diffusion process. Crucially, each step is reversible - with small enough steps, you can go back from a simple distribution to a complex distribution.

Jascha Sohl-Dickstein created a new generative modeling approach based on diffusion principles. (Photo: Asako Miyakawa)

Sohl-Dickstein developed a generative modeling algorithm using the principles of diffusion. The idea is simple: The algorithm first converts the complex images in the training dataset into simple noise (akin to going from a drop of ink to diffuse, light blue water) and then teaches the system how to reverse the process, converting noise into images.

Here's how it works. First, the algorithm takes an image from the training set. As before, assuming that each of the million pixels has some value, we can plot the image as a point in a million-dimensional space. The algorithm adds some noise to each pixel at each time step, equivalent to the spread of ink after one small time step. As this process continues, the pixel values become less and less related to their values in the original image, and the pixels look more like a simple noise distribution. (The algorithm also nudges each pixel value a little toward the origin, the zero value on all these axes, at every time step. This nudge prevents the pixel values from growing too large for the computer to handle easily.)

Doing this for all images in the dataset, the initial complex distribution of points in a million-dimensional space (which cannot be easily described and sampled) becomes a simple, normal distribution around the origin point.
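As a rough illustration of this forward process, the sketch below applies a noising step repeatedly to a tiny "image". It assumes the variance-preserving Gaussian step that is commonly used in diffusion models (scale each pixel by `sqrt(1 - beta)`, then add noise with variance `beta`); the article does not specify the exact update, and the names `forward_step`, `forward_process` and the value of `beta` are hypothetical choices for this example.

```python
import math
import random

def forward_step(pixels, beta, rng):
    """One forward step: nudge each pixel toward zero, then add a little Gaussian noise."""
    keep = math.sqrt(1.0 - beta)   # the nudge toward the origin
    spread = math.sqrt(beta)       # how much fresh noise is mixed in
    return [keep * x + spread * rng.gauss(0.0, 1.0) for x in pixels]

def forward_process(pixels, num_steps, beta, rng):
    """Apply many small noising steps in sequence."""
    for _ in range(num_steps):
        pixels = forward_step(pixels, beta, rng)
    return pixels

rng = random.Random(0)
image = [5.0] * 2000  # hypothetical image: 2000 pixels, all with (rescaled) value 5.0
noisy = forward_process(image, num_steps=500, beta=0.02, rng=rng)

mean = sum(noisy) / len(noisy)
variance = sum((x - mean) ** 2 for x in noisy) / len(noisy)
```

After enough steps the pixel values should carry almost no trace of the starting image: under these assumptions their mean drifts to roughly 0 and their variance to roughly 1, which is the simple normal distribution around the origin described above.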

"The sequence of transformations very slowly turns your data distribution into just a big noise ball," Sohl-Dickstein said. This "forward process" leaves you with a distribution that you can sample from with ease.

Next comes the machine learning part: feed the neural network the noisy images obtained from the forward process and train it to predict the less noisy images that came one step earlier. It makes mistakes at first, so you adjust the parameters of the network to make it do better. Eventually, the neural network can reliably turn a noisy image, which represents a sample from the simple distribution, all the way back into an image that represents a sample from the complex distribution.
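The training loop described here can be sketched at toy scale. This is not the neural network from the original work: as a stated simplification, the "network" is a two-parameter linear model, the "image" is a single pixel, and only one noising step is undone. The data distribution, `BETA`, the learning rate, and all function names are hypothetical.

```python
import math
import random

BETA = 0.5  # hypothetical noise level for a single forward step

def make_pair(rng):
    """Return (noisier, cleaner): a one-pixel 'image' after and before one noising step."""
    cleaner = rng.gauss(3.0, 0.5)
    noisier = math.sqrt(1.0 - BETA) * cleaner + math.sqrt(BETA) * rng.gauss(0.0, 1.0)
    return noisier, cleaner

def mean_squared_error(a, b, pairs):
    """Average squared gap between the prediction a * noisier + b and the cleaner value."""
    return sum((a * x + b - y) ** 2 for x, y in pairs) / len(pairs)

def train(rng, steps=20000, lr=0.01):
    """Stochastic gradient descent: predict, measure the mistake, adjust the parameters."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        noisier, cleaner = make_pair(rng)
        error = (a * noisier + b) - cleaner
        a -= lr * 2.0 * error * noisier
        b -= lr * 2.0 * error
    return a, b

rng = random.Random(0)
test_pairs = [make_pair(rng) for _ in range(2000)]
mse_before = mean_squared_error(0.0, 0.0, test_pairs)  # untrained model
a, b = train(rng)
mse_after = mean_squared_error(a, b, test_pairs)       # trained model
```

The prediction error on held-out pairs should drop sharply after training. A real diffusion model repeats the same idea with a deep network, full images, and many noise levels.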

The trained network is a full-fledged generative model. Now you don't even need an original image on which to run the forward process: you have a complete mathematical description of the simple distribution, so you can sample directly from it. The neural network can turn this sample, which is essentially just static, into a final image that resembles the images in the training dataset.

Sohl-Dickstein recalls the first output of his diffusion model. "You squint and say, 'I think that colored blob looks like a truck,'" he said. "I spent many months staring at different pixel patterns, trying to see a structure that I liked, [and this is more organized than I've ever gotten before.] I'm super excited."

Looking ahead

Sohl-Dickstein published his diffusion model algorithm in 2015, but it still lagged far behind what GANs could do. While diffusion models could sample the entire distribution and never got stuck spitting out only a subset of images, the images looked worse and the process was far too slow. "I don't think it was exciting at the time," Sohl-Dickstein said.

Paper address: https://doi.org/10.48550/arXiv.1503.03585

It took two students who knew neither Sohl-Dickstein nor each other to connect the dots from the original work to modern diffusion models such as DALL·E 2. The first was Song, then a doctoral student at Stanford University. In 2019, he and his mentor published a new method for building generative models that does not estimate probability distributions of data (high-dimensional surfaces). Instead, it estimates the gradient of the distribution (think of it as the slope of a high-dimensional surface).
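To see why knowing only the slope can be enough to generate samples, here is a minimal sketch. It assumes the "gradient of the distribution" means the gradient of the log-probability (often called the score) and uses Langevin dynamics, which repeatedly steps uphill along that gradient while adding fresh noise. In Song's method a neural network estimates the gradient; in this sketch it is written out analytically for a hypothetical one-dimensional Gaussian, and `MU`, `SIGMA`, the step size, and the function names are illustrative assumptions.

```python
import math
import random

MU, SIGMA = 3.0, 0.5  # hypothetical 1D data distribution N(MU, SIGMA^2)

def score(x):
    """Slope of the log-density of N(MU, SIGMA^2) at x; it points toward likelier values."""
    return (MU - x) / SIGMA ** 2

def langevin_sample(rng, steps=500, step_size=0.01):
    """Start from simple noise, then follow the slope uphill while injecting fresh noise."""
    x = rng.gauss(0.0, 1.0)
    for _ in range(steps):
        x += step_size * score(x) + math.sqrt(2.0 * step_size) * rng.gauss(0.0, 1.0)
    return x

rng = random.Random(0)
samples = [langevin_sample(rng) for _ in range(1000)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((x - mean) ** 2 for x in samples) / len(samples))
```

The samples should end up with a mean close to `MU` and a spread close to `SIGMA`, even though the code never evaluates the probability density itself, only its slope.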

Yang Song helped propose a new technique for generating images by training a network to efficiently interpret noisy images.

Song found that his technique worked best if he first perturbed each image in the training dataset with increasing levels of noise and then asked his neural network to predict the original image using the gradient of the distribution, effectively denoising it. Once trained, his neural network could take a noisy image sampled from a simple distribution and gradually turn it back into an image representative of the training dataset. The image quality was great, but his machine learning model was painfully slow to sample from. And he did all this without knowing anything about Sohl-Dickstein's work. "I didn't know anything about diffusion models," Song said. "After our 2019 paper was published, I received an email from Jascha. He pointed out to me that [our models] were very closely related."

In 2020, the second student saw these connections and realized that Song's work could improve Sohl-Dickstein's diffusion model. Jonathan Ho had recently completed his PhD research on generative modeling at the University of California, Berkeley, but he kept working on it. "I think this is the most mathematically beautiful subdiscipline of machine learning," he said.

Ho redesigned and updated Sohl-Dickstein's diffusion model using some of Song's ideas and other advances in the field of neural networks. “I knew that in order to get the community’s attention, I needed the model to generate beautiful samples,” he said. "I was convinced it was the most important thing I could do at that time."

His intuition was correct. Ho and colleagues announced this new and improved diffusion model in a 2020 paper titled "Denoising Diffusion Probabilistic Models." It quickly became such a landmark that researchers now refer to it simply as DDPM. On an image quality benchmark that compares the distribution of generated images to the distribution of training images, these models matched or exceeded all competing generative models, including GANs. It didn't take long for big companies to take notice. Today, DALL·E 2, Stable Diffusion, Imagen, and other commercial models all use some variation of DDPM.

Jonathan Ho and colleagues combined the methods of Sohl-Dickstein and Song to make modern diffusion models such as DALL·E 2 possible.

Modern diffusion models have one more key ingredient: large language models (LLMs), such as GPT-3. These are generative models trained on text from the internet to learn probability distributions over words rather than images. In 2021, Ho (now a research scientist at a stealth company) and his colleague Tim Salimans at Google Research, along with other groups elsewhere, showed how to combine information from an LLM and an image-generating diffusion model so that text (e.g., "a goldfish sipping Coca-Cola on the beach") guides the diffusion process and thus the image generation. This "guided diffusion" process is behind the success of text-to-image models such as DALL·E 2.
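The article does not give a formula for guidance. One widely used formulation, known as classifier-free guidance, blends two noise predictions from the same denoising network at every step: one made with the text prompt and one made without it. The sketch below shows only that blending step under this assumption; the function name `guided_noise`, the example numbers, and the weight values are hypothetical, and the two input lists stand in for outputs of a real network.

```python
def guided_noise(eps_cond, eps_uncond, guidance_weight):
    """Push the noise prediction further in the direction suggested by the text prompt.

    eps_cond:   noise predicted when the network is given the text prompt
    eps_uncond: noise predicted when the network is given no prompt
    A weight of 0 returns the text-conditioned prediction unchanged.
    """
    return [
        (1.0 + guidance_weight) * c - guidance_weight * u
        for c, u in zip(eps_cond, eps_uncond)
    ]

# Hypothetical per-pixel noise predictions for a three-pixel image.
eps_with_text = [0.2, -0.1, 0.4]
eps_without_text = [0.1, 0.0, 0.1]

plain = guided_noise(eps_with_text, eps_without_text, 0.0)
pushed = guided_noise(eps_with_text, eps_without_text, 2.0)
```

Larger weights exaggerate the difference that the prompt makes, which tends to produce images that match the text more closely at some cost to variety.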

"They far exceeded my wildest expectations," Ho said. "I'm not going to pretend I saw all this coming."

The images from DALL·E 2 and its peers are still far from perfect. Large language models can reflect cultural and social biases, such as racism and sexism, in the text they generate. That's because they are trained on text taken from the internet, which often contains racist and sexist language. LLMs that learn probability distributions over such text inherit the same biases. Diffusion models are also trained on uncurated images taken from the internet, which may contain similarly biased data. It's no wonder that combining LLMs with today's diffusion models sometimes produces images that reflect society's ills.

Anandkumar has personal experience of this. She was shocked when she tried generating stylized avatars of herself using an application based on diffusion models. "So [many] of the images were highly sexualized," she said, "whereas the things it was presenting to men weren't." She's not alone.

These biases can be reduced by curating and filtering the data (an extremely difficult task given the sheer size of the datasets) or by checking both the input prompts and the outputs of these models. "Of course, there's no substitute for careful and extensive safety testing" of a model, Ho said. "This is an important challenge for the field."

Despite these concerns, Anandkumar still believes in the power of generative modeling. “I really like Richard Feynman’s quote: ‘What I can’t create, I don’t understand,’” she says. The increased understanding allows her team to develop generative models that, for example, generate synthetic training data for underrepresented classes for prediction tasks, such as darker skin tones for facial recognition, helping to improve fairness. Generative models can also give us insights into how our brains process noisy inputs, or how they evoke mental images and consider future actions. Building more complex models could give AI similar capabilities.

Anandkumar said: "I think we are just beginning to explore the possibilities of generative artificial intelligence."

source:51cto.com