The traditional GAN can be interpreted after modification, and ensures the interpretability of the convolution kernel and the authenticity of the generated images.-AI-php.cn

The traditional GAN can be interpreted after modification, and ensures the interpretability of the convolution kernel and the authenticity of the generated images.

Paper address: https://www.aaai.org/AAAI22Papers/AAAI-7931.LiC.pdf
Author units: Institute of Computing Technology, Chinese Academy of Sciences, Shanghai Jiao Tong University, Zhijiang Laboratory

Research background and research tasks

Generative Adversarial Network ( GANs) have achieved great success in generating high-resolution images, and research on their interpretability has also attracted widespread attention in recent years.

In this field, how to make GAN learn a decoupled representation is still a major challenge. The so-called decoupled representation of GAN means that each part of the representation only affects specific aspects of the generated image. Previous research on decoupled representation of GANs focused on different perspectives.

For example, in Figure 1 below, Method 1 decouples the structure and style of the image. Method 2 learns the features of local objects in the image. Method 3 learns decoupled features of attributes in images, such as age attributes and gender attributes of face images. However, these studies failed to provide a clear and symbolic representation in GANs for different visual concepts (such as parts of the face such as eyes, nose, and mouth).

The traditional GAN can be interpreted after modification, and ensures the interpretability of the convolution kernel and the authenticity of the generated images.

Figure 1: Visual comparison with other GAN decoupled representation methods

To this end, the researcher proposed a general method to modify traditional GAN into interpretable GAN, which ensures The convolution kernels in the middle layer of the generator can learn decoupled local visual concepts. Specifically, as shown in Figure 2 below, compared with traditional GAN, each convolution kernel in the middle layer of interpretable GAN always represents a specific visual concept when generating different images, and different convolution kernels represent different visions. concept.

The traditional GAN can be interpreted after modification, and ensures the interpretability of the convolution kernel and the authenticity of the generated images.

##Figure 2: Visual comparison of interpretable GAN and traditional GAN encoding representation

Modeling method

The learning of interpretable GAN should meet the following two goals: Interpretability of the convolution kerneland Authenticity of generated images.

Interpretability of convolution kernel: Researchers hope that the convolution kernel of the middle layer can automatically learn meaningful visual concepts without manual annotation of any visual concepts. . Specifically, each convolution kernel should stably generate image regions corresponding to the same visual concept when generating different images. Different convolution kernels should generate image areas corresponding to different visual concepts;
Authenticity of generated images: The interpretable GAN generator can still generate realistic images.

In order to ensure the interpretability of the convolution kernel in the target layer, the researchers noticed that when multiple convolution kernels generate similar areas corresponding to a certain visual concept, They often jointly represent this visual concept.

Therefore, they use a set of convolution kernels to jointly represent a specific visual concept, and use different sets of convolution kernels to represent different visual concepts respectively.

In order to ensure the authenticity of the generated images at the same time, the researchers designed the following loss function to modify the traditional GAN into an interpretable GAN.

Loss of traditional GAN: This loss is used to ensure the authenticity of the generated image;
Convolution kernel division loss: Given a generator, this loss is used to find a way to divide the convolution kernel such that convolution kernels in the same group generate similar image area. Specifically, they use a Gaussian mixture model (GMM) to learn how the convolution kernels are divided to ensure that the feature maps of the convolution kernels in each group have similar neural activations;
Energy model authenticity loss: Given how the target layer kernels are divided, forcing each kernel in the same group to generate the same visual concept may reduce the quality of the generated image. . In order to further ensure the authenticity of the generated images, they use the energy model to output the authenticity probability of the feature map in the target layer, and use maximum likelihood estimation to learn the parameters of the energy model;
Convolution kernel interpretability loss: Given the convolution kernel division method of the target layer, this loss is used to further improve the interpretability of the convolution kernel. Specifically, this loss will cause each convolution kernel in the same group to uniquely generate the same image area, while convolution kernels in different groups are responsible for generating different image areas.

Experimental results

In the experiment, the researchers evaluated their interpretable GAN qualitatively and quantitatively.

Forqualitative analysis, they visualized the feature map of each convolution kernel to evaluate the performance of the convolution kernel on different images. Consistency of visual concepts represented. As shown in Figure 3 below, in interpretable GAN, each convolution kernel always generates image areas corresponding to the same visual concept when generating different images, while different convolution kernels generate image areas corresponding to different visual concepts.

The traditional GAN can be interpreted after modification, and ensures the interpretability of the convolution kernel and the authenticity of the generated images.

Figure 3: Visualization of feature maps in interpretable GAN

In the experiment, the difference between the group center of each group of convolution kernels and the receptive fields between the convolution kernels was also compared, as shown in Figure 4(a) below. Figure 4(b) shows the proportion of the number of convolution kernels corresponding to different visual concepts in interpretable GAN. Figure 4(c) shows that when the number of convolution kernel groups selected for division is different, the more groups, the more detailed the visual concepts learned by the interpretable GAN.

The traditional GAN can be interpreted after modification, and ensures the interpretability of the convolution kernel and the authenticity of the generated images.

Figure 4: Qualitative evaluation of interpretable GAN

Interpretable GAN alsosupports modifying specific visual concepts on the generated image. For example, the interaction of specific visual concepts between images can be achieved by exchanging the corresponding feature maps in the interpretable layer, that is, local/global face swapping is completed.

Figure 5 below gives the results of swapping mouth, hair and nose between pairs of images. The last column gives the difference between the modified image and the original image. This result shows that the researchers' method only modified the local visual concept without changing other irrelevant areas.

The traditional GAN can be interpreted after modification, and ensures the interpretability of the convolution kernel and the authenticity of the generated images.

Figure 5: Exchanging specific visual concepts to generate images

In addition, Figure 6 below also shows the effect of their method when exchanging the entire face .

The traditional GAN can be interpreted after modification, and ensures the interpretability of the convolution kernel and the authenticity of the generated images.

Figure 6: Swap the entire face of the generated image

ForQuantitative analysis , researchers used face verification experiments to evaluate the accuracy of face exchange results. Specifically, given a pair of face images, the face of the original image is replaced with the face of the source image to generate a modified image. Then, test whether the face in the modified image and the face in the source image have the same identity.

Table 1 below shows the accuracy of face verification results## of different methods. Their methods are It is better than other face swapping methods in terms of identity preservation.

The traditional GAN can be interpreted after modification, and ensures the interpretability of the convolution kernel and the authenticity of the generated images.

Table 1: Accuracy evaluation of face-swapping identity

In addition, the locality of the method in modifying specific visual concepts is also evaluated in the experiment. Specifically, the researchers calculated the mean square error (MSE) between the original image and the modified image in RGB space, and used the ratio of the out-of-region MSE and the in-region MSE of a specific visual concept as an experimental index for locality evaluation. .

The results are shown in Table 2 below. The researcher’s modification method has better locality, that is Areas of the image outside of the modified visual concept changed less.

The traditional GAN can be interpreted after modification, and ensures the interpretability of the convolution kernel and the authenticity of the generated images.

Table 2: Locality evaluation of modified visual concepts

For more experimental results, please see the paper.

Summary

This work proposes a general method that can modify traditional GANs into interpretable GANs without any manual annotation of visual concepts. In an interpretable GAN, each convolution kernel in the middle layer of the generator can stably generate the same visual concept when generating different images.

Experiments show that interpretable GAN also enables people to modify specific visual concepts on the generated images, providing a new perspective on the controllable editing method of GAN-generated images.

The above is the detailed content of The traditional GAN can be interpreted after modification, and ensures the interpretability of the convolution kernel and the authenticity of the generated images.. For more information, please follow other related articles on the PHP Chinese website!