


The physics principles that inspired modern AI art: the exploration of generative AI's possibilities has only just begun
Ask DALL·E 2, the image generation system created by OpenAI, to draw a picture of "a goldfish sipping Coca-Cola on a beach" and it will spit out a surreal image. The program encountered images of beaches, goldfish, and Coca-Cola during training, but it almost certainly never saw an image containing all three at once. Yet DALL·E 2 can combine these concepts into something that might have made Dalí proud.
DALL·E 2 is a generative model - a system that attempts to use training data to generate new things that rival the data in quality and diversity. This is one of the most difficult problems in machine learning, and getting to this point has been a tough journey.
The first important image generation models used an artificial intelligence approach called a neural network - a program composed of many layers of computational units called artificial neurons. But even as their image quality improved, the models proved unreliable and hard to train. Meanwhile, a powerful generative model - created by a postdoctoral researcher with a passion for physics - lay dormant, until two graduate students made technical breakthroughs that brought the beast back to life.
DALL·E 2 is such a beast. The key insights that make DALL·E 2’s images possible, as well as those of its competitors Stable Diffusion and Imagen, come from the world of physics. The systems that underpin them are called diffusion models and are heavily inspired by non-equilibrium thermodynamics, which governs phenomena such as fluid and gas diffusion. “There are a lot of techniques originally invented by physicists that are now very important in machine learning,” said Yang Song, a machine learning researcher at OpenAI.
The power of these models shocked the industry and users. “This is an exciting time for generative models,” said Anima Anandkumar, a computer scientist at the California Institute of Technology and senior director of machine learning research at Nvidia.
While the realistic images created by diffusion models sometimes perpetuate social and cultural biases, she said, "we have shown that generative models are useful for downstream tasks, [which] improve the fairness of predictive artificial intelligence models."
High Probabilities
To understand how generating data for images works, let's start with a simple image made up of just two adjacent grayscale pixels. We can fully describe this image with two values, based on each pixel's shade (from 0 for completely black to 255 for completely white). You can use these two values to plot the image as a point in 2D space.
If we plot multiple images as points, clustering may occur - some images and their corresponding pixel values appear more frequently than others. Now imagine that there is a curved surface above the plane, with the height of the surface corresponding to the density of the clusters. This surface plots a probability distribution. You are most likely to find a single data point below the highest part of the surface, and rarely below the lowest part of the surface.
DALL·E 2 created these images of "goldfish sipping Coca-Cola on the beach". This program, created by OpenAI, may have never encountered similar images, but can still generate them on its own.
Now you can use this probability distribution to generate new images. All you need to do is randomly generate new data points while honoring the constraint that more probable data is generated more often - a process called "sampling" the distribution. Each new point is a new image.
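To make the two-pixel picture concrete, here is a minimal sketch in Python of that whole pipeline: estimate the distribution of a toy dataset as a 2D histogram (the "surface"), then sample new two-pixel "images" from it. The cluster locations, bin count, and all names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "training set": 1,000 two-pixel grayscale images drawn from two
# invented clusters (mostly-dark images and mostly-light images).
dark = rng.normal(loc=[60, 70], scale=15, size=(500, 2))
light = rng.normal(loc=[190, 180], scale=20, size=(500, 2))
images = np.clip(np.vstack([dark, light]), 0, 255)

# Estimate the probability distribution as a 2D histogram: the "surface"
# whose height over each (pixel 1, pixel 2) cell is the density of images.
hist, x_edges, y_edges = np.histogram2d(
    images[:, 0], images[:, 1], bins=32, range=[[0, 255], [0, 255]]
)
probs = hist.flatten() / hist.sum()

# "Sampling" the distribution: pick cells in proportion to their probability,
# so likelier two-pixel images are generated more often.
cells = rng.choice(len(probs), size=5, p=probs)
rows, cols = np.unravel_index(cells, hist.shape)
new_images = np.column_stack([
    rng.uniform(x_edges[rows], x_edges[rows + 1]),  # new pixel 1 values
    rng.uniform(y_edges[cols], y_edges[cols + 1]),  # new pixel 2 values
])
print(new_images)  # each row is a freshly generated two-pixel image
```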
The same analysis applies to more realistic grayscale photos, say of a million pixels each. Only now, plotting each image requires not two axes but a million. The probability distribution over such images will be some complicated million-plus-one-dimensional surface. If you sample this distribution, you will produce a million pixel values. Print these pixels on a piece of paper and the image will most likely look like a photo from the original dataset.
The challenge of generative modeling is to learn this complicated probability distribution for whatever set of images makes up the training data. The distribution is useful partly because it captures broad information about the data, and partly because researchers can combine probability distributions over different types of data, such as text and images, to compose novel outputs, such as a goldfish sipping Coca-Cola on a beach. "You can mix and match different concepts... to create entirely new scenarios that were never seen in the training data," Anandkumar said.
In 2014, a model called a generative adversarial network (GAN) became the first to generate realistic images. "It's so exciting," Anandkumar said. But GANs are difficult to train: they may not learn the full probability distribution, and may only generate images from a subset of the distribution. For example, a GAN trained on images of various animals might only generate images of dogs.
Machine learning needed a more robust model. Jascha Sohl-Dickstein, whose work was inspired by physics, would provide one.
Jascha Sohl-Dickstein.
Blobs of Excitement
Around the time GANs were invented, Sohl-Dickstein was a postdoc at Stanford University, studying generative models and also interested in non-equilibrium thermodynamics. This branch of physics studies systems that are not in thermal equilibrium - those that exchange matter and energy internally and with their environment.
An illustrative example is a drop of blue ink diffusing through a container of water. At first, it forms a dark blot in one spot. At this point, if you want to calculate the probability of finding an ink molecule in some small volume of the container, you need a probability distribution that cleanly models the initial state, before the ink begins to spread. But this distribution is complex, making it hard to sample from.
Eventually, however, the ink spreads throughout the water, turning it a pale blue. This leads to a simpler, more uniform probability distribution of molecules, described by a simple mathematical expression. Non-equilibrium thermodynamics describes the probability distribution at each step in the diffusion process. Crucially, each step is reversible - with small enough steps, you can get from a simple distribution back to a complex one.
Jascha Sohl-Dickstein created a new generative modeling approach based on diffusion principles. - Asako Miyakawa
Sohl-Dickstein developed an algorithm for generative modeling using the principles of diffusion. The idea is simple: the algorithm first converts the complex images in the training dataset into simple noise - much like going from a drop of ink to diffuse pale-blue water - and then teaches the system to reverse the process, turning noise into images.
Here's how it works. First, the algorithm takes an image from the training set. As before, assuming each of its million pixels has some value, we can plot the image as a point in million-dimensional space. At every time step, the algorithm adds some noise to each pixel, equivalent to the ink spreading a little after one small time step. As this process continues, the pixel values bear less and less relation to their values in the original image, and the pixels look more like a simple noise distribution. (The algorithm also nudges each pixel value a little toward the origin, the zero value on all those axes, at every time step. This nudge keeps the pixel values from growing too large for the computer to handle easily.)
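As a rough sketch of what this forward process can look like in code - a toy NumPy version with an invented 8x8 "image" and a standard linearly increasing noise schedule, as is common in diffusion-model implementations - the scaling factor sqrt(1 - beta) is exactly the "nudge toward the origin" described above:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_diffusion(x0, betas):
    """Noise an image step by step, like ink spreading through water.

    At every step, sqrt(1 - beta) shrinks pixel values slightly toward the
    origin (keeping them from growing too large), while sqrt(beta) mixes in
    fresh Gaussian noise.
    """
    x = x0.copy()
    trajectory = [x]
    for beta in betas:
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * rng.standard_normal(x.shape)
        trajectory.append(x)
    return trajectory

# A tiny stand-in "image" (pixel values rescaled to [-1, 1]) and a schedule
# of 1,000 small noise steps.
x0 = rng.uniform(-1, 1, size=(8, 8))
betas = np.linspace(1e-4, 0.02, 1000)
steps = forward_diffusion(x0, betas)
print(steps[-1].mean(), steps[-1].std())  # end state is close to a standard normal
```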
Do this for every image in the dataset, and the initially complex distribution of points in million-dimensional space (which cannot be easily described and sampled) becomes a simple normal distribution of points around the origin.
"Very slowly, the sequence of transformations turns your data distribution into a big ball of noise," Sohl-Dickstein said. This "forward process" leaves you with a distribution you can sample with ease.
Next comes the machine learning part: feed the neural network the noisy images obtained from the forward pass and train it to predict the less noisy images that came one step earlier. It makes mistakes at first, so you tweak the network's parameters to make it do better. Eventually, the neural network can reliably turn a noisy image representing a sample from the simple distribution all the way into an image representing a sample from the complex distribution.
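In practice, implementations in the DDPM family usually train the network to predict the noise that was added, which is equivalent to predicting the less noisy image. A minimal PyTorch-style sketch of such a training loop, with a tiny stand-in network and invented sizes (a real model would use a large U-Net), might look like this:

```python
import torch
from torch import nn

# Toy denoiser: inputs are a flattened 8x8 "image" plus the (scaled) step
# number; the output is a guess at the noise that was added.
model = nn.Sequential(nn.Linear(64 + 1, 128), nn.ReLU(), nn.Linear(128, 64))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)  # fraction of signal left at step t

for step in range(2000):
    x0 = torch.rand(32, 64) * 2 - 1              # stand-in batch of "images"
    t = torch.randint(0, T, (32,))               # a random step for each one
    eps = torch.randn_like(x0)                   # the noise we will add
    ab = alpha_bars[t].unsqueeze(1)
    xt = ab.sqrt() * x0 + (1 - ab).sqrt() * eps  # jump straight to step t
    pred = model(torch.cat([xt, t.unsqueeze(1) / T], dim=1))
    loss = ((pred - eps) ** 2).mean()            # penalize wrong guesses...
    opt.zero_grad()
    loss.backward()                              # ...and adjust the network's
    opt.step()                                   # parameters to do better
```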
The trained network is a full-blown generative model. Now you don't even need an original image for the forward pass: you have a complete mathematical description of the simple distribution, so you can sample from it directly. The neural network can turn this sample - essentially just static - into a final image resembling those in the training dataset.
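Continuing the sketch above, the reverse loop might look like the following; the update rule is the standard DDPM sampling step, and everything else is a toy stand-in:

```python
@torch.no_grad()
def sample(model, betas):
    """Reverse process: start from pure static and denoise step by step."""
    T = len(betas)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(1, 64)  # a draw from the simple, known distribution
    for t in reversed(range(T)):
        eps = model(torch.cat([x, torch.full((1, 1), t / T)], dim=1))
        # Subtract the predicted noise to recover a slightly cleaner image.
        x = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)  # keep a bit of noise
    return x  # a brand-new "image" shaped like the training data

generated = sample(model, betas)
```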
Sohl-Dickstein recalls the first outputs of his diffusion model. "You squint and say, 'I think that colored blob looks like a truck,'" he said. "I spent many months staring at different pixel patterns, trying to see structure, [and this was more organized than anything I had ever gotten before]. I was super excited."
Looking ahead
Sohl-Dickstein published his diffusion model algorithm in 2015, but it still lagged far behind what GANs could do. While diffusion models could sample over the entire distribution and never got stuck spitting out only a subset of images, the images looked worse and the process was much too slow. "I don't think at the time this was seen as exciting," Sohl-Dickstein said.
Paper: https://doi.org/10.48550/arXiv.1503.03585
It took two students, who knew neither Sohl-Dickstein nor each other, to connect the dots from his original work to modern diffusion models such as DALL·E 2. The first was Song, then a doctoral student at Stanford. In 2019, he and his advisor published a novel method for building generative models that did not estimate the probability distribution of the data (the high-dimensional surface). Instead, it estimated the gradient of the distribution (think of it as the slope of the high-dimensional surface).
Yang Song helped propose a new technique for generating images by training a network to efficiently interpret noisy images.
Song found that his technique worked best if he first perturbed each image in the training dataset with increasing levels of noise and then had his neural network use the gradients of the distribution to denoise the images. Once trained, his neural network could take a noisy image drawn from a simple distribution and progressively turn it back into an image representative of the training dataset. The image quality was great, but his model was very slow to sample from. And he did all this without knowing about Sohl-Dickstein's work. "I wasn't aware of diffusion models at all," Song said. "After our 2019 paper was published, I received an email from Jascha. He pointed out to me that [our models] were very closely related."
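The idea can be illustrated with a toy example. For a known distribution, the gradient of the log-density (the "score") can be written down exactly; a sampler then follows that gradient uphill, plus a little noise, from coarse to fine noise levels - annealed Langevin dynamics, the kind of procedure Song's approach used. The two-mode mixture below and all step sizes are invented for illustration; in the real method, a neural network learns the score from data:

```python
import numpy as np

rng = np.random.default_rng(0)

def score(x, sigma):
    """Gradient of the log-density ("score") of a toy 2D two-mode mixture,
    blurred by noise level sigma. Song trains a neural network to learn this
    function from data; here the exact formula stands in for that network."""
    mus = np.array([[-2.0, -2.0], [2.0, 2.0]])  # the two modes
    var = 1.0 + sigma**2                        # mode width after blurring
    d = x - mus
    w = np.exp(-0.5 * (d**2).sum(axis=1) / var)
    w /= w.sum()                                # responsibility of each mode
    return -(w[:, None] * d).sum(axis=0) / var

# Annealed Langevin dynamics: climb the (noisy) gradient toward regions of
# high probability, shrinking the noise level from coarse to fine.
x = rng.standard_normal(2) * 4                  # start from a random point
for sigma in [3.0, 1.0, 0.3, 0.1]:
    step = 0.1 * sigma**2
    for _ in range(100):
        x = x + step * score(x, sigma) + np.sqrt(2 * step) * rng.standard_normal(2)
print(x)  # lands near one of the two modes
```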
In 2020, a second student saw these connections and realized that Song's work could improve Sohl-Dickstein's diffusion models. Jonathan Ho had recently completed his doctoral research on generative modeling at the University of California, Berkeley, but was still working on the topic. "I thought it was the most mathematically beautiful subfield of machine learning," he said.
Ho redesigned and updated Sohl-Dickstein's diffusion model using some of Song's ideas and other advances in the field of neural networks. “I knew that in order to get the community’s attention, I needed the model to generate beautiful samples,” he said. "I was convinced it was the most important thing I could do at that time."
His intuition was correct. Ho and his colleagues announced this new and improved diffusion model in a 2020 paper titled "Denoising Diffusion Probabilistic Models." It quickly became such a landmark that researchers now refer to it simply as DDPM. On one image-quality benchmark, which compares the distribution of generated images to the distribution of training images, these models matched or surpassed all competing generative models, including GANs. It didn't take long for the big players to take notice. Today, DALL·E 2, Stable Diffusion, Imagen, and other commercial models all use some variant of DDPM.
Jonathan Ho and colleagues combined the methods of Sohl-Dickstein and Song to enable modern diffusion models such as DALL·E 2.
Modern diffusion models have one more key ingredient: large language models (LLMs), such as GPT-3. These are generative models trained on text from the internet to learn probability distributions over words rather than images. In 2021, Ho (now a research scientist at a stealth company) and his colleague Tim Salimans at Google Research, along with other groups elsewhere, showed how to combine information from an LLM with an image-generating diffusion model, using text (such as "a goldfish sipping Coca-Cola on a beach") to steer the diffusion process, and thus the image generation. This "guided diffusion" process is behind the success of text-to-image models such as DALL·E 2.
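The exact mechanics vary between systems, but one widely used recipe for steering the denoiser with text is classifier-free guidance, which Ho and Salimans also helped develop. A sketch of a single guided step, where `model` and its signature are hypothetical stand-ins for a text-conditioned denoiser:

```python
import torch

def guided_noise(model, xt, t, text_embedding, guidance_scale=7.5):
    """One classifier-free-guidance step. `model` is a hypothetical
    text-conditioned denoiser with signature model(xt, t, condition)."""
    eps_cond = model(xt, t, text_embedding)  # noise guess, given the caption
    eps_uncond = model(xt, t, None)          # noise guess, ignoring the caption
    # Exaggerate the difference to pull each denoising step toward images
    # that match the text; larger scales follow the prompt more literally.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```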
"They far exceeded my wildest expectations," Ho said. "I'm not going to pretend I saw it all." Images of its peers are still far from perfect. Large language models can reflect cultural and social biases, such as racism and sexism, in the text they generate. That's because they're trained on texts lifted from the internet, often containing racist and sexist language. LLMs that learn probability distributions on such texts are fraught with the same biases. Diffusion models are also trained on uncurated images taken from the internet, which may contain similarly biased data. It’s no wonder that combining an LL.M. with today’s communication models sometimes produces images that reflect social ills.
Anandkumar has firsthand experience. When she tried to generate stylized avatars of herself using an app based on diffusion models, she was shocked. "So [many] of the images were highly sexualized," she said, "while what it presented to men was not." She's not alone.
These biases can be reduced by curating and filtering the data (an extremely difficult task given the immense size of the datasets) or by checking the input prompts and outputs of these models. "Of course, there's no substitute for careful and extensive safety testing" of a model, Ho said. "This is an important challenge for the field."
Despite such concerns, Anandkumar still believes in the power of generative modeling. "I really like Richard Feynman's quote: 'What I cannot create, I do not understand,'" she said. That increased understanding has allowed her team to develop generative models that, for example, produce synthetic training data for underrepresented classes in prediction tasks, such as darker skin tones for facial recognition, helping to improve fairness. Generative models might also give us insight into how our brains handle noisy inputs, or how they conjure mental images and contemplate future actions. Building more sophisticated models could endow AI with similar capabilities.
Anandkumar said: "I think we are just beginning to explore the possibilities of generative artificial intelligence."