AI painting has recently taken off, sparking a wave of enthusiasm both in China and abroad, and images generated by AI painting models are now common across social media. Last month, "Space Opera", a work created by a game designer with the AI drawing tool Midjourney, won first prize in the Colorado State Fair's art competition.
Inspired by this, Professor Lu Zhiwu's team at Renmin University of China combined Wenlan, the multimodal pre-training model it developed in-house, with the latest image generation technology, creating an AI painting model that the team describes as the best at understanding traditional Chinese culture.
Wenlan is a large-scale Chinese multimodal pre-training model whose training was led by Professor Wen Jirong, Executive Dean of the Gaoling School of Artificial Intelligence at Renmin University of China, together with Professor Lu Zhiwu and tenured Associate Professor Song Ruihua. Pre-trained on 650 million weakly correlated Chinese image-text pairs, Wenlan has learned a distinctive capacity for Chinese semantic understanding and can link Chinese semantics with visual information; it is especially good at reading the implicit semantics unique to Chinese and the abstract concepts in images.
In June this year, the related paper, "Towards artificial general intelligence via a multimodal foundation model", was published in Nature Communications.
Paper link: https://www.nature.com/articles/s41467-022-30761-2
The combination of Wenlan and generative models
The research team explored the potential of the Wenlan model by combining it with the latest generative techniques. Pairing Wenlan's ability to understand abstract semantics with the strong generative power of a generative model means the resulting model can interpret the semantics of the input text well and generate images with matching semantics.
The team focused on exploring Wenlan's potential for traditional Chinese culture. Borrowing the latest generative model architecture and training on a collected dataset of traditional Chinese paintings, they obtained a model that generates images in the corresponding style from input text. The detailed architecture diagram is shown below.
Specifically, the team trained an unconditional generation model on the Chinese painting dataset and used the Wenlan model to guide its iterative generation process. The method first randomly initializes a noise image. At each generation step, the model adjusts the content of the generated image toward the input text, so that at every step the image content and the input text become more consistent in the latent space of the Wenlan model. This step can be described as:
where x and y denote the image and the text respectively, and IE and TE denote Wenlan's image encoder and text encoder respectively. Through repeated iteration, the model can generate high-quality images that match the semantics of the text.
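The description above leaves the exact update rule open, so the following is only a minimal sketch of one common way to implement this kind of text guidance, not the team's actual method. It assumes that "adjusting the image toward the text" means a gradient-ascent step on the cosine similarity between IE(x) and TE(y). Everything here is hypothetical: a random linear map with orthonormal rows stands in for IE, a random vector stands in for TE(y), and the unconditional painting model's own generation step is omitted.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def similarity_grad(x: np.ndarray, W: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Gradient w.r.t. x of cos(W @ x, t).

    W is a hypothetical linear stand-in for the image encoder IE,
    and t is a hypothetical stand-in for the text embedding TE(y).
    """
    z = W @ x  # latent embedding of the current image
    z_norm = np.linalg.norm(z)
    t_norm = np.linalg.norm(t)
    # d/dz cos(z, t) = t / (|z||t|) - (z.t) z / (|z|^3 |t|)
    dz = t / (z_norm * t_norm) - (z @ t) * z / (z_norm**3 * t_norm)
    # Chain rule back through the linear encoder.
    return W.T @ dz


def guided_generation(W, t, steps=300, step_size=1.0, seed=0):
    """Start from random noise and repeatedly nudge x toward the text in latent space."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(W.shape[1])  # random noise "image"
    history = [cosine_similarity(W @ x, t)]
    for _ in range(steps):
        # A real pipeline would also apply the unconditional model's own
        # generation step here; this sketch keeps only the guidance term.
        x = x + step_size * similarity_grad(x, W, t)
        history.append(cosine_similarity(W @ x, t))
    return x, history


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Hypothetical encoder: 16-dim latent space, 64-dim "image", orthonormal rows.
    W = np.linalg.qr(rng.standard_normal((64, 16)))[0].T
    t = rng.standard_normal(16)  # hypothetical text embedding TE(y)
    x, history = guided_generation(W, t)
    print(f"image-text similarity: {history[0]:.3f} -> {history[-1]:.3f}")
```

Giving the toy encoder orthonormal rows keeps each gradient step well conditioned, so the similarity rises steadily toward 1. With real neural encoders, the gradient would come from automatic differentiation instead of the closed-form expression used here.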
Evaluation results of the Wenlan painting model
Owing to the characteristics of the Wenlan model itself, the Wenlan painting model can generate images from input ancient Chinese poems. As the examples below show, the generated images closely match the content and artistic conception of the poems.
The team also found that the Wenlan painting model even offers its own interpretation of obscure Confucian, Buddhist, and Taoist thought. To better demonstrate how the Wenlan painting model interprets Confucianism, Buddhism, and Taoism, the team compared it with the most popular AI painting models in China and abroad: Dream Stealer, Wenxin, Disco Diffusion, Midjourney, and Stable Diffusion. For Disco Diffusion, Midjourney, and Stable Diffusion, the Chinese text was first translated with Baidu Translate.
First, for the Confucian text, the results in the figure below show that Dream Stealer, Disco Diffusion, Midjourney, and Stable Diffusion tend either to generate concrete objects mentioned in the sentence or to produce images that look good but have little to do with the sentence. Wenxin tends to generate images with human figures, and even renders "light" literally as lit candles. The Wenlan painting model better grasps the meaning of the whole sentence and the Confucian thought it contains, and therefore generates images more in keeping with that thought.
Second, for text containing Buddhist thought, the most popular painting models can only pick out some of the objects in it and generate those, and some models even misread the underlying idea. As the results in the figure below show, Wenxin interpreted "Those who see the Tao and forget the mountains find tranquility even in the human world; those who see the mountains and forget the Tao find noise even in the mountains" as Taoist thought (it generated an image of a Taoist priest). The Wenlan painting model interprets the Buddhist thought in the input text well and reflects it in the generated images.
Finally, for Taoist thought, the team selected three of the most central sentences of the Tao Te Ching.
Compared with Dream Stealer, Disco Diffusion, Midjourney, and Stable Diffusion, Wenxin interprets the Tao Te Ching better. Overall, however, the Wenlan painting model interprets Taoist thought more accurately, and its images carry a stronger Taoist artistic conception.
Summary
The Wenlan team combined the recently popular AI painting generation technology with the Chinese multimodal pre-training model Wenlan to explore the model's potential in the domain of traditional Chinese culture. By presenting that culture as images through generative models, the work aims to give the general public a more intuitive understanding of some profound ideas in traditional Chinese thought.