
Natural language is integrated into NeRF: LERF, which generates 3D relevancy maps from just a few words, is here

Apr 13, 2023, 7:31 PM

NeRF (Neural Radiance Fields) has quickly become one of the most popular research directions since it was proposed, and its results are impressive. However, NeRF's direct output is only a colored density field, which gives researchers little to work with; this lack of context is a real problem, because it directly hinders the construction of interactive interfaces to 3D scenes.

Natural language, by contrast, is a very intuitive way to interact with a 3D scene. Take the kitchen scene in Figure 1: objects can be located by asking where the cutlery is, or where the tool used for stirring is. Completing such a task requires not only the ability to handle the query itself, but also the ability to incorporate semantics at multiple scales.

In this paper, researchers from UC Berkeley propose a novel method named LERF (Language Embedded Radiance Fields), which embeds CLIP (Contrastive Language-Image Pre-training) features into NeRF, making these kinds of open-ended 3D language queries possible. LERF uses CLIP directly, without fine-tuning on datasets such as COCO and without relying on masked region proposals. LERF preserves the integrity of CLIP embeddings at multiple scales and can handle a wide variety of language queries, including visual attributes (e.g., yellow), abstract concepts (e.g., electric current), text, and more, as shown in Figure 1.


Paper address: https://arxiv.org/pdf/2303.09553v1.pdf

Project homepage: https://www.lerf.io/

LERF can interactively extract 3D relevancy maps from language prompts in real time. For example, for a table with a lamb figurine and a water cup, entering the prompt "lamb" or "water cup" makes LERF produce the corresponding 3D relevancy map.
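To make this concrete, here is a minimal sketch (an illustration, not the authors' code) of how such a prompt could be scored against a single 3D point: the prompt is encoded with CLIP's text encoder and compared to the point's language embedding by cosine similarity. The CLIP package, model choice, and the random stand-in for the point embedding are all assumptions.

# Minimal illustration, not LERF's actual code: score text prompts against a
# 3D point's language embedding with cosine similarity.
import torch
import clip  # assumes OpenAI's CLIP package (pip install git+https://github.com/openai/CLIP.git)

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

prompts = ["lamb", "water cup"]
with torch.no_grad():
    text_emb = model.encode_text(clip.tokenize(prompts).to(device)).float()
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

# Stand-in for the language embedding LERF's field would predict at one 3D point.
point_emb = torch.randn(512, device=device)
point_emb = point_emb / point_emb.norm()

relevancy = text_emb @ point_emb  # one cosine score per prompt
print(dict(zip(prompts, relevancy.tolist())))

Higher scores mark the parts of the scene that match the prompt, which is what the rendered relevancy maps visualize.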


For a complex bouquet, LERF can also pinpoint individual flowers.


It can likewise locate different objects in a kitchen scene.

Method

The study constructs LERF by jointly optimizing a language field together with NeRF. LERF takes a position and a physical scale as input and outputs a single CLIP vector. During training, the field is supervised with a multi-scale feature pyramid of CLIP embeddings computed from image crops of the training views. This lets the CLIP encoder capture image context at different scales and associates the same 3D location with language embeddings at different scales. At test time, LERF can query the language field at any scale to obtain a 3D relevancy map.
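The multi-scale supervision can be pictured roughly as follows: for each training view, crops of several sizes centered on sampled pixels are encoded with CLIP's image encoder, giving one embedding per pixel-and-scale pair. This is a simplified sketch with assumed crop sizes and file name, not the authors' pipeline.

# Simplified sketch (crop sizes and file name are assumptions, not the official
# pipeline): build a pyramid of CLIP embeddings from multi-scale image crops.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def crop_embedding_pyramid(image, centers, crop_sizes=(64, 128, 256)):
    """Return {crop_size: (N, D) tensor} of normalized CLIP embeddings."""
    pyramid = {}
    for size in crop_sizes:
        crops = []
        for cx, cy in centers:
            box = (cx - size // 2, cy - size // 2, cx + size // 2, cy + size // 2)
            crops.append(preprocess(image.crop(box)))
        batch = torch.stack(crops).to(device)
        with torch.no_grad():
            emb = model.encode_image(batch).float()
        pyramid[size] = emb / emb.norm(dim=-1, keepdim=True)
    return pyramid

view = Image.open("training_view.png").convert("RGB")   # hypothetical training view
pyramid = crop_embedding_pyramid(view, centers=[(200, 150), (320, 240)])

Each entry of such a pyramid then serves as a supervision target for rays that pass through the corresponding pixel at the corresponding scale.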


Because the CLIP embeddings are extracted from multiple views at multiple scales, the relevancy maps that LERF's 3D CLIP embeddings produce for a text query are more localized and more 3D-consistent than those obtained from 2D CLIP embeddings, and they can be queried directly in the 3D field without rendering multiple views.
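Because the field is conditioned on scale, one simple way to query it at test time is to sweep over a set of scales and keep, for each 3D point, the scale that responds most strongly to the text embedding. The sketch below is a hedged illustration; language_field stands in for the learned network and is not an actual LERF API.

# Hedged illustration: query a scale-conditioned language field over several
# scales and keep the best response per point. `language_field` is a placeholder.
import torch

def relevancy_over_scales(language_field, points, text_emb, scales):
    """points: (N, 3); text_emb: (D,) unit-normalized; returns (N,) relevancy."""
    best = torch.full((points.shape[0],), -1.0)
    for s in scales:
        emb = language_field(points, scale=s)          # assumed to return (N, D)
        emb = emb / emb.norm(dim=-1, keepdim=True)
        best = torch.maximum(best, emb @ text_emb)     # cosine similarity
    return best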


LERF learns a language embedding field over volumes centered at sample points. Specifically, the output of the field is the average CLIP embedding, across all training views, of the image crops that contain the specified volume. By shifting the query from points to volumes, LERF can effectively supervise a dense field from coarse crops of the input images, and the field can be rendered in a pixel-aligned way by conditioning on a given volumetric scale.
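One way to picture the pixel-aligned rendering is that a ray's language embedding is a weighted average of the embeddings at its sample points, using the same weights as NeRF's volume rendering, and is then compared with the CLIP embedding of the crop covering that pixel. The following is a rough sketch with toy shapes and a stand-in loss, not the official implementation.

# Rough sketch (toy shapes, stand-in loss): render a ray's language embedding as
# a weighted average of per-sample embeddings, then compare it to the CLIP
# embedding of the image crop supervising that pixel.
import torch
import torch.nn.functional as F

def render_language_embedding(sample_embs, weights):
    """sample_embs: (S, D) field outputs along a ray; weights: (S,) rendering weights."""
    emb = (weights[:, None] * sample_embs).sum(dim=0)
    return emb / (emb.norm() + 1e-8)

S, D = 32, 512
sample_embs = torch.randn(S, D)
weights = torch.softmax(torch.randn(S), dim=0)     # stand-in for volume-rendering weights
rendered = render_language_embedding(sample_embs, weights)

crop_target = F.normalize(torch.randn(D), dim=0)   # CLIP embedding of the matching crop
loss = 1.0 - torch.dot(rendered, crop_target)      # cosine-style training signal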


LERF itself produces coherent results, but the resulting relevancy maps can sometimes be incomplete and contain outliers, as shown in Figure 5.


To regularize the optimized language field, the study introduces self-supervised DINO features through a shared bottleneck.

Architecturally, optimizing the language embeddings in 3D should not affect the density distribution of the underlying scene representation, so the study captures this inductive bias by training two independent networks: one for the feature vectors (DINO, CLIP) and one for the standard NeRF outputs (color, density).
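A minimal sketch of this two-network split is below; layer sizes, the bottleneck width, and the feature dimensions are assumptions for illustration. One MLP produces density and color as in a standard NeRF, while a second, independent MLP maps position and scale through a shared trunk to separate CLIP and DINO heads, so gradients from the feature losses never reach the density network.

# Minimal sketch (layer sizes and feature dimensions are assumptions): separate
# networks so feature supervision cannot alter the scene's density field.
import torch
import torch.nn as nn

class SceneNet(nn.Module):
    """Standard NeRF outputs: density and color."""
    def __init__(self, in_dim=3, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.density = nn.Linear(hidden, 1)
        self.color = nn.Linear(hidden, 3)

    def forward(self, x):
        h = self.trunk(x)
        return self.density(h), torch.sigmoid(self.color(h))

class FeatureNet(nn.Module):
    """CLIP and DINO features from a shared bottleneck, independent of SceneNet."""
    def __init__(self, in_dim=4, hidden=256, clip_dim=512, dino_dim=384):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.clip_head = nn.Linear(hidden, clip_dim)
        self.dino_head = nn.Linear(hidden, dino_dim)

    def forward(self, x_and_scale):        # position (3) concatenated with scale (1)
        h = self.trunk(x_and_scale)
        return self.clip_head(h), self.dino_head(h)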

Experiment

To demonstrate LERF's ability to handle real-world data, the study collected 13 scenes, including grocery stores, kitchens, bookstores, figurines, and more. Figure 3 shows 5 representative scenes that demonstrate LERF's ability to handle natural language.

Figure 3

Figure 7 shows a 3D visual comparison between LERF and LSeg; when localizing the eggs in the bowl, LSeg performs worse than LERF.


Figure 8 shows that LSeg, trained on a limited segmentation dataset, lacks the ability to represent natural language effectively; it performs well only on common objects within its training distribution, as Figure 7 also illustrates.


However, the LERF method is not yet perfect, and there are failure cases. For example, when localizing zucchini, other vegetables are also highlighted.
