Has a research team at Beihang University used a diffusion model to “replicate” the Earth?
For any location in the world, the model can generate remote sensing images at multiple resolutions, creating rich and diverse "parallel scenes."
Moreover, complex geographical features such as terrain, climate, and vegetation have all been taken into consideration.
Inspired by Google Earth, Beihang's research team "loaded" satellite remote sensing images of the entire Earth into a deep neural network from an overhead perspective.
Based on such a network, the team built MetaEarth, a global top-down visual generation model.
MetaEarth has 600 million parameters and can generate unbounded, multi-resolution remote sensing images covering any geographical location in the world.
Compared with previous research, building a foundation visual generation model at global scale is more challenging, and the team had to overcome three main difficulties.
The first is model capacity: the Earth has a wide range of geographical features, such as cities, forests, deserts, oceans, glaciers, and snowfields, all of which the model must understand and represent.
Even man-made features of the same type differ greatly across latitudes, climates, and cultural environments, which places high demands on the capacity of the generative model.
MetaEarth addresses this difficulty and achieves high-resolution, large-scale scene generation across different locations and landforms.
The second challenge is generating remote sensing images with controllable resolution.
In overhead imaging, the appearance of ground features depends heavily on the resolution, and the same features look noticeably different at different image resolutions. This makes it difficult to generate images accurately at a specified resolution (meters per pixel).
When MetaEarth generates images of different resolutions, it can accurately and reasonably present surface features, and the correlations between different resolutions are also accurately mapped.
Finally, there is the challenge of unbounded image generation. Unlike everyday natural images, remote sensing images are extremely large, with side lengths that may reach tens of thousands of pixels. Previous methods have difficulty generating continuous, unbounded images of arbitrary size.
But the continuous unbounded scene generated by MetaEarth avoids this defect, and you can see that the image moves very smoothly as the "lens" is translated.
In addition, MetaEarth has strong generalization performance and can generate multi-resolution images in cascade with unknown scenes as conditional input.
For example, if the "Pandora Planet" generated by GPT4-V is input into the model as the initial condition, MetaEarth is still able to generate images with reasonable distribution of ground objects and realistic details.
Validation on downstream tasks shows that MetaEarth, as a new kind of data engine, is expected to provide virtual environments and training data for a variety of downstream tasks in the field of Earth observation.
In their experiments, the authors chose remote sensing image classification, a basic task, for validation. The results show that with the help of high-quality images generated by MetaEarth, the classification accuracy of the downstream task improved significantly.
The authors believe that MetaEarth is expected to provide a realistic virtual environment for unmanned aerial system platforms such as satellites, and that it can be widely used in fields such as urban planning, environmental monitoring, disaster management, and agricultural optimization.
In addition to serving as a data engine, MetaEarth also has great potential for building generative world models, opening up new possibilities for future research.
So how does MetaEarth achieve this?
MetaEarth is built based on the probabilistic diffusion model and has a parameter scale of more than 600 million.
To support model training, the team collected a large remote sensing image dataset containing images at multiple spatial resolutions, covering most regions of the world, together with their geographic information (latitude, longitude, and resolution).
In this study, the authors propose a resolution-guided self-cascading generation framework.
Under this framework, a single model is enough to generate multi-resolution images for a given geographical location and to create rich and diverse "parallel scenes" at each resolution level.
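The cascading idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the 2x step between levels, and above all `generate_stage`, which is only a stand-in that upsamples its input and adds random detail so that the control flow can run. It is not MetaEarth's network.

```python
import numpy as np

def generate_stage(condition, resolution, rng, factor=2):
    """Stand-in for one pass of the shared model (hypothetical).

    A real model would embed `resolution` (meters per pixel) and denoise;
    this placeholder ignores it, upsamples the condition, and adds noise.
    """
    upsampled = np.repeat(np.repeat(condition, factor, axis=0), factor, axis=1)
    detail = 0.05 * rng.standard_normal(upsampled.shape)
    return np.clip(upsampled + detail, 0.0, 1.0)

def self_cascade(seed_image, start_resolution, stages, seed=0, factor=2):
    """Call the same stage function repeatedly, feeding each output back in
    as the low-resolution condition for the next, finer level."""
    rng = np.random.default_rng(seed)
    image, resolution = seed_image, start_resolution
    outputs = [(resolution, image)]
    for _ in range(stages):
        resolution = resolution / factor  # finer ground sampling at each level
        image = generate_stage(image, resolution, rng, factor)
        outputs.append((resolution, image))
    return outputs

# Two different seeds give two different "parallel scenes" for the same input.
coarse = np.full((4, 4), 0.5)
scene_a = self_cascade(coarse, 64.0, stages=2, seed=0)
scene_b = self_cascade(coarse, 64.0, stages=2, seed=1)
```

The point of the sketch is that one function is reused at every level, with the resolution passed in as an input, rather than a separate model being trained per level.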
Specifically, the framework uses an encoder-decoder denoising network. The network combines the low-resolution conditional image and an encoding of the spatial resolution with the time-step embedding of the denoising process, and predicts the noise at each time step in order to generate the image.
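As a rough illustration of how such conditioning inputs might be assembled, here is a minimal Python sketch. The article does not say how the encodings are computed or combined, so the sinusoidal embedding, the element-wise addition, the channel-wise concatenation, and all function names below are assumptions.

```python
import numpy as np

def sinusoidal_embedding(value, dim=8):
    """Map a scalar to sines and cosines at geometrically spaced frequencies."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = value * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

def conditioning_vector(timestep, resolution_m_per_px, dim=8):
    """Assumption: time-step and resolution embeddings are simply added."""
    return (sinusoidal_embedding(timestep, dim)
            + sinusoidal_embedding(resolution_m_per_px, dim))

def denoiser_input(noisy_image, low_res_condition):
    """Assumption: the low-resolution condition is upsampled (nearest
    neighbor) and stacked with the noisy image along the channel axis."""
    scale = noisy_image.shape[0] // low_res_condition.shape[0]
    upsampled = np.repeat(np.repeat(low_res_condition, scale, axis=0),
                          scale, axis=1)
    return np.concatenate([noisy_image, upsampled], axis=-1)

# Example: a 3-channel noisy image conditioned on a 4x coarser image.
vec = conditioning_vector(timestep=10, resolution_m_per_px=16.0)
net_in = denoiser_input(np.zeros((8, 8, 3)), np.ones((2, 2, 3)))
```

In a full diffusion model, `net_in` and `vec` would be fed to the denoising network, which would output a noise estimate with the same shape as the noisy image.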
To generate unbounded images of any size, the authors also designed a memory-efficient sliding-window generation method and a noise sampling strategy.
The method divides the conditional image into overlapping blocks and uses a specific noise sampling strategy so that similar content is generated in the regions shared by adjacent blocks, which avoids visible seams.
This approach also allows the model to use less GPU memory when generating unbounded images of any size.
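One simple way to get matching content in overlapping regions is to draw a single noise field for the whole canvas and let each window crop its noise from that field. The Python sketch below illustrates that idea only. The article does not describe the authors' actual sampling strategy, so the shared noise field, the averaging of overlaps, the tile sizes, and the `denoise` stand-in (a pointwise `tanh`, not a neural network) are all assumptions.

```python
import numpy as np

def tile_origins(size, tile, stride):
    """Offsets of overlapping tiles covering [0, size); requires size >= tile."""
    origins = list(range(0, size - tile + 1, stride))
    if origins[-1] != size - tile:
        origins.append(size - tile)  # final tile sits flush with the border
    return origins

def generate_canvas(height, width, tile=64, stride=48, seed=0, denoise=np.tanh):
    """Generate a large image one tile at a time from a shared noise field."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((height, width))  # one field for all tiles
    canvas = np.zeros((height, width))
    weight = np.zeros((height, width))
    for y in tile_origins(height, tile, stride):
        for x in tile_origins(width, tile, stride):
            # Neighboring tiles read identical noise where they overlap.
            patch = denoise(noise[y:y + tile, x:x + tile])
            canvas[y:y + tile, x:x + tile] += patch
            weight[y:y + tile, x:x + tile] += 1.0
    return canvas / weight  # average contributions in overlapping regions

canvas = generate_canvas(128, 160)
```

Because the stand-in `denoise` is pointwise, overlapping tiles agree exactly here. A real network also looks at surrounding context, so overlaps would be similar rather than identical, which is why the sketch averages them. Only one tile is processed at a time, so peak memory for the model's activations depends on the tile size rather than on the canvas size.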
The authors of this study are from the LEarning, VIsion and Remote sensing laboratory (LEVIR Lab) at Beihang University. The laboratory is led by Professor Shi Zhenwei, a recipient of China's National Science Fund for Distinguished Young Scholars.
Professor Zou Zhengxia, a current member of the laboratory who was formerly a doctoral student of Professor Shi Zhenwei and a postdoctoral fellow at the University of Michigan, is the corresponding author of the paper.
Paper address: https://www.php.cn/link/31bb2feb402ac789507479daf9713b00
Project homepage: https://www.php.cn/link/a0098fd07db7692267fca4f4169c9ba2