Putting the entire Earth into a neural network: the Beihang University team launched a global remote sensing image generation model

PHPz

Jun 09, 2024 pm 09:56 PM

Has a research team at Beihang University used a diffusion model to "replicate" the Earth?

At any location around the world, the model can generate remote sensing images of multiple resolutions, creating rich and diverse "parallel scenes."

Moreover, complex geographical features such as terrain, climate, and vegetation have all been taken into consideration.

Inspired by Google Earth, Beihang's research team "loaded" satellite remote sensing images of the entire Earth into a deep neural network from an overhead perspective.

Based on such a network, the team built MetaEarth, a global top-down visual generation model.

MetaEarth has 600 million parameters and can generate unbounded, multi-resolution remote sensing images covering any geographical location in the world.

Remote sensing image generation model with global coverage

Compared with previous research, building a foundation visual generative model at global scale is considerably more challenging, and the team had to overcome several difficulties.

The first is model capacity: the Earth has a wide range of geographical features, such as cities, forests, deserts, oceans, glaciers, and snowfields, all of which the model must understand and represent.

Even man-made features of the same type differ greatly across latitudes, climates, and cultural environments, which places high demands on the capacity of the generative model.

MetaEarth addresses this difficulty and achieves high-resolution, large-scale scene generation across different locations and landforms.

Generating remote sensing images at a controllable resolution is a second challenge.

In overhead imaging, the appearance of ground features depends heavily on resolution and differs markedly from one resolution to another, which makes it difficult to generate images accurately at a specified resolution (meters per pixel).

When MetaEarth generates images at different resolutions, it presents surface features accurately and plausibly, and the correspondence between resolutions is also preserved.
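
To make the unit concrete, here is a small arithmetic sketch of how a resolution in meters per pixel relates to ground coverage. The 256-pixel tile size and the three resolution values are illustrative assumptions, not figures from the article.

```python
def ground_footprint_km(side_pixels: int, meters_per_pixel: float) -> float:
    """Side length on the ground, in kilometers, of a square image tile."""
    return side_pixels * meters_per_pixel / 1000.0


# Hypothetical values: the same 256-pixel tile at coarser and finer resolutions.
for res in (64.0, 16.0, 4.0):
    print(f"{res:>5} m/pixel -> {ground_footprint_km(256, res):.3f} km per side")
```

The same pixel grid therefore shows a very different extent of ground at each resolution, which is why the appearance of ground features changes so much between resolution levels.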

Finally, there is the challenge of unbounded image generation. Unlike everyday natural images, remote sensing images are extremely large, with side lengths that may reach tens of thousands of pixels, and previous methods have difficulty generating continuous, unbounded images of arbitrary size.

The continuous, unbounded scenes generated by MetaEarth avoid this limitation: the image moves smoothly as the "lens" pans across it.

In addition, MetaEarth generalizes well: it can take an unseen scene as conditional input and generate multi-resolution images from it in cascade.

For example, when a "Pandora Planet" image generated by GPT4-V is fed into the model as the initial condition, MetaEarth still generates images with a plausible distribution of ground objects and realistic details.

Validation on downstream tasks shows that MetaEarth, as a new kind of data engine, is expected to provide virtual environments and training data for a variety of downstream tasks in Earth observation.

In their experiments, the authors chose remote sensing image classification, a basic task, for validation. The results show that adding high-quality images generated by MetaEarth significantly improves classification accuracy on the downstream task.

The authors believe that MetaEarth could provide realistic virtual environments for unmanned platforms such as satellites, and could be widely used in urban planning, environmental monitoring, disaster management, agricultural optimization, and other fields.

Beyond serving as a data engine, MetaEarth also has great potential for building generative world models, opening new possibilities for future research.

So, how does MetaEarth achieve this?

600 million parameter diffusion model "replicates" the earth

MetaEarth is built on a probabilistic diffusion model and has more than 600 million parameters.

To support model training, the team collected a large remote sensing image dataset containing images at multiple spatial resolutions that cover most regions of the world, together with their geographic information (latitude, longitude, and resolution).

In this study, the authors propose a resolution-guided self-cascading generation framework.

△The overall framework of MetaEarth

Under this framework, a single model is enough to generate multi-resolution images for a given geographical location and to create rich and diverse "parallel scenes" at each resolution level.
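
The self-cascading idea can be sketched as a loop in which one generator is called at every level and each output becomes the condition for the next, finer level. This is only an illustration under stated assumptions: `self_cascade`, `upsample_stub`, the refinement `factor` of 2, and the starting resolution of 64 m/pixel are all hypothetical, and the stub merely upsamples where the real model would run diffusion sampling.

```python
from typing import Callable, List, Tuple

import numpy as np

# A stage maps (condition image, target meters per pixel) to a finer image.
Stage = Callable[[np.ndarray, float], np.ndarray]


def self_cascade(start: np.ndarray, start_res_m: float, stages: int,
                 generate: Stage, factor: int = 2) -> List[Tuple[float, np.ndarray]]:
    """Reuse one generator at every level: each output conditions the next stage."""
    image, res = start, start_res_m
    outputs = [(res, image)]
    for _ in range(stages):
        res = res / factor            # hypothetical refinement factor per stage
        image = generate(image, res)  # same model, different target resolution
        outputs.append((res, image))
    return outputs


def upsample_stub(cond: np.ndarray, target_res_m: float) -> np.ndarray:
    """Stand-in for the diffusion model: plain 2x nearest-neighbour upsampling."""
    return cond.repeat(2, axis=0).repeat(2, axis=1)


levels = self_cascade(np.zeros((4, 4, 3)), 64.0, stages=2, generate=upsample_stub)
for res, img in levels:
    print(res, img.shape)
```

Because the target resolution is passed in as an argument, the same function serves every level of the cascade, which mirrors the single-model design described above.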

Specifically, the denoising network has an encoder-decoder structure. It combines the low-resolution conditional image and the spatial resolution encoding with the time-step embedding of the denoising process to predict the noise at each time step, and thereby generates the image.
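
One common way to feed such conditions to a denoising network is sketched below. The article does not say how MetaEarth encodes them, so everything here is an assumption: the sinusoidal encoding, the choice to add the two embeddings, the channel-wise stacking of the upsampled condition image, and the helper names `sinusoidal_embedding`, `conditioning_vector`, and `stack_condition`.

```python
import numpy as np


def sinusoidal_embedding(value: float, dim: int = 8,
                         max_period: float = 10000.0) -> np.ndarray:
    """Encode a scalar as sines and cosines at geometrically spaced frequencies."""
    half = dim // 2  # dim is assumed to be even
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    angles = value * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])


def conditioning_vector(timestep: int, resolution_m: float, dim: int = 8) -> np.ndarray:
    """Assumed fusion: add the time-step and resolution embeddings."""
    return sinusoidal_embedding(timestep, dim) + sinusoidal_embedding(resolution_m, dim)


def stack_condition(noisy: np.ndarray, low_res: np.ndarray) -> np.ndarray:
    """Assumed fusion: upsample the low-resolution image and stack it as extra channels."""
    scale = noisy.shape[0] // low_res.shape[0]
    upsampled = low_res.repeat(scale, axis=0).repeat(scale, axis=1)
    return np.concatenate([noisy, upsampled], axis=-1)


vec = conditioning_vector(timestep=500, resolution_m=16.0)
net_in = stack_condition(np.zeros((8, 8, 3)), np.ones((4, 4, 3)))
print(vec.shape, net_in.shape)
```

In this sketch the vector would modulate the network's layers while the stacked tensor would be its spatial input; a real implementation may fuse the conditions differently.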

To generate unbounded images of any size, the authors also designed a memory-efficient sliding window generation method and a noise sampling strategy.

The method divides the generated image into overlapping image blocks that serve as conditions, and the noise sampling strategy makes the model generate similar content in the region shared by adjacent blocks, which avoids visible seams.

This noise sampling strategy also lets the model consume less GPU memory when generating unbounded images of any size.
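
The article does not spell out the noise sampling strategy, so the following is only one possible way to give overlapping windows identical noise in their shared region without holding a full-size noise field in memory. It is an assumed scheme, not the authors' method: noise is drawn per fixed grid cell from a seed derived from the cell index, and each window assembles its noise from the cells it touches. The names `cell_noise` and `window_noise` and the cell size of 32 are hypothetical.

```python
import numpy as np


def cell_noise(cy: int, cx: int, cell: int, channels: int, seed: int) -> np.ndarray:
    """Noise for one grid cell, reproducible from its (row, column) index."""
    rng = np.random.default_rng([seed, cy, cx])
    return rng.standard_normal((cell, cell, channels))


def window_noise(top: int, left: int, size: int, cell: int = 32,
                 channels: int = 3, seed: int = 0) -> np.ndarray:
    """Assemble the noise for a window at a non-negative global pixel offset."""
    out = np.empty((size, size, channels))
    for cy in range(top // cell, (top + size - 1) // cell + 1):
        for cx in range(left // cell, (left + size - 1) // cell + 1):
            tile = cell_noise(cy, cx, cell, channels, seed)
            y0, x0 = cy * cell, cx * cell
            # Intersection of this grid cell with the window, in global coordinates.
            ya, yb = max(top, y0), min(top + size, y0 + cell)
            xa, xb = max(left, x0), min(left + size, x0 + cell)
            out[ya - top:yb - top, xa - left:xb - left] = tile[ya - y0:yb - y0, xa - x0:xb - x0]
    return out


# Two 64-pixel windows whose columns overlap by 32 pixels.
a = window_noise(top=10, left=20, size=64)
b = window_noise(top=10, left=52, size=64)
print(np.array_equal(a[:, 32:], b[:, :32]))
```

Only the cells under the current window are ever materialized, so memory use depends on the window size rather than on the size of the full scene.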

Team Profile

The authors of this study are from the Learning, Vision and Remote Sensing Laboratory (LEarning, VIsion and Remote sensing laboratory, LEVIR Lab) of Beihang University. The laboratory is led by Professor Shi Zhenwei, a recipient of China's National Science Fund for Distinguished Young Scholars.

Professor Zou Zhengxia, the corresponding author of the paper, is a former doctoral student of Professor Shi Zhenwei, was a postdoctoral fellow at the University of Michigan, and is now a member of the laboratory.

Paper address: https://www.php.cn/link/31bb2feb402ac789507479daf9713b00
Project homepage: https://www.php.cn/link/a0098fd07db7692267fca4f4169c9ba2
