ICML 2024 | Complex compositional 3D scene generation: an LLM-driven conversational framework for controllable 3D generation and editing

Jul 31, 2024, 08:12 PM

The AIxiv column is a column where this site publishes academic and technical content. In the past few years, the AIxiv column of this site has received more than 2,000 reports, covering top laboratories from major universities and companies around the world, effectively promoting academic exchanges and dissemination. If you have excellent work that you want to share, please feel free to contribute or contact us for reporting. Submission email: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com

The first author and the corresponding author of this paper are both from the VDIG (Visual Data Interpreting and Generation) Laboratory at the Wangxuan Institute of Computer Technology, Peking University. The first author is doctoral student Zhou Xiaoyu, and the corresponding author is doctoral supervisor Wang Yongtao. In recent years, the VDIG laboratory has published a number of representative results at top venues such as IJCV, CVPR, AAAI, ICCV, ICML, and ECCV, has repeatedly won first- and second-place awards in major computer vision competitions in China and abroad, and cooperates extensively with well-known universities and research institutions in China and abroad.

In recent years, Text-to-3D methods for single objects have made a series of breakthroughs, but generating controllable, high-quality complex multi-object 3D scenes from text still faces huge challenges. Previous methods have major flaws in the complexity, geometric quality, texture consistency, multi-object interaction, controllability and editability of the generated scene.

Recently, the VDIG research team from the Wangxuan Institute of Computer Technology at Peking University and its collaborators announced their latest work, GALA3D. Targeting multi-object complex 3D scenes, the work proposes an LLM-guided controllable generation framework that produces high-quality, highly consistent 3D scenes with multiple objects and complex interactions, and supports controllable editing through conversational interaction. The paper has been accepted by ICML 2024.

  • Paper title: GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting

  • Paper link: https://arxiv.org/pdf/2402.07207

  • Paper code: https://github.com/VDIGPKU/GALA3D

  • Project website: https://gala3d.github.io/

GALA3D is a high-quality Text-to-3D framework for generating and controllably editing complex compositional scenes. Given a text description, GALA3D generates the corresponding 3D scene, with multiple objects and complex interactions, in a zero-shot manner. While keeping the generated 3D scene closely aligned with the text, GALA3D performs well in scene quality, complex multi-object interaction, and geometric consistency. In addition, GALA3D supports user-friendly end-to-end generation and controllable editing, so ordinary users can customize and edit 3D scenes through conversation. Based on the dialogue with the user, GALA3D can carry out a range of controllable edits on complex 3D scenes, such as changing the layout, embedding digital assets, and changing the decoration style.

Method introduction

The overall architecture of GALA3D is shown in the figure below:

[Figure: overall architecture of GALA3D]

GALA3D uses large language models (LLMs) to generate an initial layout and proposes a layout-guided generative 3D Gaussian representation to construct complex 3D scenes. It optimizes the shape and distribution of the 3D Gaussians through adaptive geometry control, producing 3D scenes with consistent geometry, texture, and scale and with precise interactions. GALA3D also proposes a compositional optimization mechanism that combines conditional diffusion priors with text-to-image models to collaboratively generate multi-object 3D scenes in a consistent style, while iteratively optimizing the initial layout prior extracted from the LLM to obtain a spatial layout that is closer to a real scene. Extensive quantitative experiments and qualitative studies show that GALA3D achieves strong results in text-to-complex-3D-scene generation and surpasses existing text-to-3D scene methods.

a. Scene layout prior based on LLMs

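Large language models demonstrate strong natural language understanding and reasoning capabilities. This paper further explores the reasoning and layout generation capabilities of LLMs for complex 3D scenes. Obtaining a reasonably good layout prior without manual design helps reduce the cost of scene modeling and generation. To this end, we use an LLM (such as GPT-3.5) to extract the instances in the text input and their spatial relationships, and to generate the corresponding layout prior. However, there is a gap between the 3D spatial layout and layout prior as interpreted by the LLM and the actual scene, which typically leads to floating or interpenetrating objects and to combinations of objects with badly mismatched proportions. We therefore propose a layout refinement module that adjusts and optimizes this coarse layout using vision-based diffusion priors and layout-guided generative 3D Gaussians.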
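To make the failure modes of an LLM layout prior concrete, here is a minimal Python sketch. It assumes the LLM is prompted to answer with a JSON list of axis-aligned boxes; the schema (`name`, `center`, `size`), the helper names, and the sample reply are all hypothetical and are not taken from the GALA3D code. It only shows how floating and interpenetrating objects could be detected in such a layout.

```python
import json
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned layout box for one instance (hypothetical schema)."""
    name: str
    center: tuple  # (x, y, z), with z pointing up and the ground at z = 0
    size: tuple    # extent along x, y, z

    def bounds(self):
        """Return (min_corner, max_corner) of the box."""
        lo = tuple(c - s / 2 for c, s in zip(self.center, self.size))
        hi = tuple(c + s / 2 for c, s in zip(self.center, self.size))
        return lo, hi


def parse_layout(llm_reply: str):
    """Parse a JSON layout reply into Box objects."""
    return [Box(o["name"], tuple(o["center"]), tuple(o["size"]))
            for o in json.loads(llm_reply)]


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes interpenetrate along every axis."""
    (alo, ahi), (blo, bhi) = a.bounds(), b.bounds()
    return all(alo[i] < bhi[i] and blo[i] < ahi[i] for i in range(3))


def is_floating(box: Box, others, eps=1e-6) -> bool:
    """Height-only check: the box bottom is at neither the ground nor another box's top."""
    bottom = box.bounds()[0][2]
    if abs(bottom) <= eps:
        return False
    return not any(abs(bottom - o.bounds()[1][2]) <= eps
                   for o in others if o is not box)


# Hypothetical LLM reply: a table on the ground and a lamp hovering above it.
reply = '''[
  {"name": "table", "center": [0, 0, 0.4], "size": [1.2, 0.8, 0.8]},
  {"name": "lamp",  "center": [0, 0, 1.3], "size": [0.2, 0.2, 0.4]}
]'''
layout = parse_layout(reply)
```

In this sample the lamp's bottom face is above the table top, so `is_floating` flags it. This is the kind of defect a refinement stage has to correct.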

b. Layout refinement

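GALA3D uses a layout refinement module based on diffusion priors to optimize the layout prior generated by the LLM. Specifically, we add gradient optimization of the spatial layout of the layout-guided 3D Gaussians to the 3D generation process, and use ControlNet to adjust the spatial position, rotation angle, and size ratio of the LLM-generated layout. The figure shows the 3D scene and layout before and after optimization. The optimized layout has more accurate spatial positions and scales, and makes the interactions between multiple objects in the 3D scene more plausible.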

[Figure: 3D scene and layout before and after layout refinement]
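The core idea of the refinement step, treating each object's position, rotation, and scale as variables that are updated by gradients, can be illustrated with a toy Python sketch. In GALA3D the gradients come from diffusion priors via ControlNet; here a hand-written quadratic loss stands in for them purely for illustration. The parameter vector `[x, y, z, yaw, scale]`, the constants, and all function names are assumptions, not the paper's formulation.

```python
def numerical_grad(loss, params, h=1e-6):
    """Central-difference gradient of loss at params (a list of floats)."""
    grads = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + h] + params[i + 1:]
        down = params[:i] + [params[i] - h] + params[i + 1:]
        grads.append((loss(up) - loss(down)) / (2 * h))
    return grads


def refine_layout(loss, params, lr=0.1, steps=200):
    """Plain gradient descent on the layout parameters of one object."""
    params = list(params)
    for _ in range(steps):
        grads = numerical_grad(loss, params)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params


TABLE_TOP = 0.8    # hypothetical height of the supporting surface
LAMP_HEIGHT = 0.4  # hypothetical unscaled height of the object


def toy_loss(p):
    """Stand-in loss (NOT a diffusion prior): rest on the table, stay centered,
    face forward (yaw = 0), and shrink toward half size."""
    x, y, z, yaw, scale = p
    bottom = z - LAMP_HEIGHT * scale / 2
    return ((bottom - TABLE_TOP) ** 2 + x ** 2 + y ** 2
            + yaw ** 2 + (scale - 0.5) ** 2)


# Start from a coarse layout in which the lamp floats and is too large.
refined = refine_layout(toy_loss, [0.1, -0.05, 1.3, 0.2, 1.0])
```

Swapping `toy_loss` for a learned, image-based objective changes where the gradients come from, but the layout parameters are updated in the same way.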

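c. Layout-guided generative 3D Gaussian representation

We introduce 3D layout constraints into the 3D Gaussian representation for the first time, and propose layout-guided generative 3D Gaussians for complex text-to-3D scenes. The layout-guided 3D Gaussian representation contains multiple semantically extracted instance objects, and the layout prior of each instance object can be parameterized as:

[Equation image not preserved in this copy: parameterization of the per-instance layout prior]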

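Here, N denotes the total number of instance objects in the scene. Specifically, the 3D Gaussians of each instance are optimized through adaptive geometry control to obtain an instance-level 3D Gaussian representation of the object. We then combine the Gaussians of the individual objects into the full scene according to their relative positions, producing layout-guided global 3D Gaussians, and render the whole scene through global Gaussian splatting.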
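The composition step can be sketched in a few lines of Python. The sketch assumes each instance stores its Gaussian centers in a local frame together with a layout made of a position, a rotation about the vertical axis (`yaw`), and a uniform `scale`. These field names are hypothetical. A full implementation would also transform each Gaussian's covariance and carry over its opacity and color, which this sketch omits.

```python
import math


def to_scene(local_centers, position, yaw, scale):
    """Map Gaussian centers from an instance's local frame to the scene frame."""
    c, s = math.cos(yaw), math.sin(yaw)
    out = []
    for x, y, z in local_centers:
        # Scale first, then rotate about the z axis, then translate.
        x, y, z = x * scale, y * scale, z * scale
        out.append((c * x - s * y + position[0],
                    s * x + c * y + position[1],
                    z + position[2]))
    return out


def compose_scene(instances):
    """Concatenate the transformed centers of all instances into one global list."""
    scene = []
    for inst in instances:
        scene.extend(to_scene(inst["centers"], inst["position"],
                              inst["yaw"], inst["scale"]))
    return scene


# Two hypothetical instances, each with a single Gaussian center.
instances = [
    {"centers": [(1.0, 0.0, 0.0)], "position": (0.0, 0.0, 0.5),
     "yaw": math.pi / 2, "scale": 2.0},
    {"centers": [(0.0, 0.0, 0.0)], "position": (3.0, 1.0, 0.0),
     "yaw": 0.0, "scale": 1.0},
]
scene = compose_scene(instances)
```

Keeping the per-instance Gaussians in local coordinates means a layout edit only changes the transform, not the object itself.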

d. Adaptive geometry control

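To better control the spatial distribution and geometric shape of the 3D Gaussians during generation, we propose an adaptive geometry control method for generative 3D Gaussians. First, given a set of initial Gaussians, GALA3D uses a set of density distribution functions to constrain the spatial positions of the Gaussian ellipsoids so that the 3D Gaussians stay within the layout range. Next, Gaussians near the layout surface are sampled to fit the distribution function. We then propose shape regularization to control the geometry of the 3D Gaussians. During 3D generation, adaptive geometry control continuously optimizes the distribution and geometry of the Gaussians, producing multi-object 3D scenes with more detailed textures and more regular geometry. Adaptive geometry control also improves the controllability and consistency of the layout-guided generative 3D Gaussians.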
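As a rough illustration of the two ingredients described above, a positional constraint that keeps Gaussians inside the layout box and a regularizer on ellipsoid shape, the following Python sketch defines two simple penalty terms. Both formulas (a squared distance outside the box, and a penalty on the ratio of the longest to the shortest axis) are stand-ins chosen for clarity. They are not the density distribution functions or the shape regularization defined in the paper, and the weights and `max_ratio` are arbitrary assumptions.

```python
def outside_penalty(center, box_min, box_max):
    """Squared distance from a Gaussian center to the layout box (0 if inside)."""
    return sum(max(lo - c, 0.0, c - hi) ** 2
               for c, lo, hi in zip(center, box_min, box_max))


def shape_penalty(scales, max_ratio=4.0):
    """Penalize needle-like ellipsoids; assumes all axis scales are positive."""
    ratio = max(scales) / min(scales)
    return max(0.0, ratio - max_ratio) ** 2


def geometry_loss(gaussians, box_min, box_max, w_pos=1.0, w_shape=0.1):
    """Weighted sum of both penalties over a list of (center, scales) pairs."""
    return sum(w_pos * outside_penalty(center, box_min, box_max)
               + w_shape * shape_penalty(scales)
               for center, scales in gaussians)


# One well-behaved Gaussian, and one that is outside the box and elongated.
box_min, box_max = (-1.0, -1.0, -1.0), (1.0, 1.0, 1.0)
gaussians = [
    ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
    ((2.0, 0.0, 0.0), (6.0, 1.0, 1.0)),
]
loss = geometry_loss(gaussians, box_min, box_max)
```

Only the second Gaussian contributes to `loss`. Adding a term like this to the generation objective pushes stray or degenerate Gaussians back toward the layout.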

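Experimental results

Compared with existing Text-to-3D generation methods, GALA3D shows better 3D scene generation quality and consistency in the quantitative comparison. We also conducted a user study with 125 valid participants (39.2% of whom are experts and practitioners in related fields), who evaluated the scenes generated by our method and by existing methods from multiple perspectives. The results are shown in the table below:

[Table image not preserved in this copy: comparison of GALA3D with existing methods]

The experimental results show that GALA3D surpasses existing methods on multi-dimensional evaluation metrics such as scene quality, geometric fidelity, text consistency, and scene consistency, achieving the best generation quality.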

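As the qualitative results in the figure below show, GALA3D can generate complex multi-object compositional 3D scenes in a zero-shot manner with good consistency.

[Figure: qualitative results for complex multi-object 3D scenes generated by GALA3D]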

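The figure below shows that GALA3D supports user-friendly, conversational controllable generation and editing:

[Figure: conversational controllable generation and editing with GALA3D]

For more details of the research, please refer to the original paper.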
