


From a single sentence, a 3D model can be given a realistic appearance style, down to photo-level detail.
Creating 3D content from a given input (e.g., text prompts, images, or 3D shapes) has important applications in computer vision and graphics, but it is a challenging problem. In practice, it usually requires professional technical artists to spend considerable time and money creating 3D content. Meanwhile, the assets in many online 3D model libraries are often bare meshes without any materials; to use them in a modern rendering engine, a technical artist must author high-quality materials, lighting, and normal maps for them. A method for automated, diverse, and realistic generation of 3D model assets would therefore be highly valuable.
To this end, research teams from South China University of Technology, The Hong Kong Polytechnic University, Cross-dimensional Intelligence, Pengcheng Laboratory, and other institutions have proposed TANGO, a text-driven 3D model stylization method. Given a 3D model and a text prompt, TANGO automatically generates realistic SVBRDF materials, normal maps, and lighting, and it is notably robust to low-quality 3D models. The work has been accepted at NeurIPS 2022.
Project homepage: https://cyw-3d.github.io/tango/
Model Effect
For a given text input and 3D model, TANGO produces fine, photorealistic details without causing self-intersections on the surface of the 3D model. As shown in Figure 1 below, TANGO not only renders realistic reflections on smooth materials (such as gold and silver) but can also estimate per-point normals for rough materials (such as brick) to render a bumpy appearance.
Figure 1. Stylized results of TANGO
The key to TANGO's realistic rendering results is that it accurately separates the components of the shading model (SVBRDF, normal map, and lighting) and learns each of them individually. These separated components are combined by a differentiable spherical Gaussian renderer, and the rendered image is sent, together with the input text, to CLIP to compute the loss. To demonstrate the rationale for decoupling the components, the study visualizes each of them. Figure 2 (a) shows the stylized result for "a pair of shoes made of bricks"; (b) shows the original normals of the 3D model; (c) shows the normals TANGO predicts for each point on the surface; (d), (e), and (f) show the diffuse, roughness, and specular parameters of the SVBRDF, respectively; and (g) shows the environment lighting predicted by TANGO, represented with spherical Gaussian functions.
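To give a feel for the spherical Gaussian lighting representation mentioned above, here is a minimal numpy sketch of evaluating an environment light expressed as a sum of spherical Gaussian lobes. The lobe parameters are made up for illustration; this is not TANGO's actual renderer.

```python
import numpy as np

def eval_sg(v, xi, lam, mu):
    """Evaluate one spherical Gaussian G(v) = mu * exp(lam * (dot(v, xi) - 1)).

    v, xi : unit direction vectors, shape (3,)
    lam   : sharpness (scalar); larger means a narrower lobe
    mu    : amplitude, e.g. an RGB color, shape (3,)
    """
    return mu * np.exp(lam * (np.dot(v, xi) - 1.0))

def eval_sg_mixture(v, lobes):
    """Environment radiance along direction v as a sum of SG lobes."""
    return sum(eval_sg(v, xi, lam, mu) for xi, lam, mu in lobes)

# Hypothetical two-lobe environment light: a warm lobe overhead,
# a dim cool lobe from the side.
lobes = [
    (np.array([0.0, 0.0, 1.0]), 8.0, np.array([1.0, 0.9, 0.8])),
    (np.array([0.0, 1.0, 0.0]), 2.0, np.array([0.2, 0.3, 0.5])),
]
radiance = eval_sg_mixture(np.array([0.0, 0.0, 1.0]), lobes)
```

Because SGs are closed under products and have closed-form integrals, representing the lighting (and BRDF lobes) this way is what makes the renderer differentiable and cheap to evaluate.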
Figure 2 Visualization of decoupled rendering components
The outputs of TANGO can also be edited. For example, in Figure 3 the researchers relight the TANGO result with other environment maps, and in Figure 4 they edit the roughness and specular parameters to change the degree of reflection on the object's surface.
Figure 3 Re-lighting the TANGO stylized result
Figure 4 Editing the material of the object
In addition, because TANGO uses predicted normal maps to add surface detail, it is also very robust to 3D models with few vertices. As shown in Figure 5, the original lamp and alien models have 41,160 and 68,430 faces, respectively. The researchers downsampled the original models to versions with only 5,000 faces. TANGO performs essentially the same on the original and downsampled models, whereas Text2Mesh exhibits severe self-intersections on the low-quality models.
Figure 5 Robustness Test
Principle and Method
TANGO focuses on text-guided stylization of 3D objects. The most closely related work is Text2Mesh, which uses the pre-trained CLIP model as guidance to predict colors and position offsets for the surface vertices of a 3D model. However, simply predicting vertex colors often produces unrealistic renderings, and irregular vertex offsets can cause severe self-intersections. This research therefore draws on the traditional physically based rendering pipeline, decoupling the rendering process into the prediction of SVBRDF materials, normal maps, and lighting, and representing each decoupled element with spherical Gaussian functions. This physics-based decoupling allows TANGO to produce realistic renderings with good robustness.
Figure 6 TANGO flow chart
Figure 6 shows the TANGO pipeline. Given a 3D model and a text prompt (such as "a shoe made of gold" in the figure), the study first scales the 3D model into a unit sphere and then samples camera positions near the model. Rays are cast from each camera position to find the intersection point x_p with the 3D model and the surface normal n_p at that point. Next, x_p and n_p are fed to the SVBRDF network and the normal network to predict the material parameters and normal direction at that point, while the scene lighting is represented by a set of spherical Gaussian functions. In each training iteration, the study renders an image with the differentiable spherical Gaussian renderer, encodes augmented copies of the image with CLIP's image encoder, and backpropagates the CLIP loss to update all learnable parameters.
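The ray-casting step above can be sketched for the simplest possible surface, the unit sphere itself (a real mesh would need a ray-mesh intersector; this toy only shows how x_p and n_p arise from a camera ray):

```python
import numpy as np

def ray_unit_sphere(origin, direction):
    """Intersect a ray with the unit sphere centered at the origin.

    Returns (x_p, n_p): the first hit point and its surface normal,
    or None if the ray misses. For a unit sphere, the normal at x_p
    is simply x_p itself.
    """
    d = direction / np.linalg.norm(direction)
    # Solve |origin + t*d|^2 = 1 for the smallest positive t.
    b = np.dot(origin, d)
    c = np.dot(origin, origin) - 1.0
    disc = b * b - c
    if disc < 0:
        return None          # ray misses the sphere
    t = -b - np.sqrt(disc)
    if t < 0:
        return None          # sphere is behind the ray origin
    x_p = origin + t * d
    return x_p, x_p          # n_p == x_p on the unit sphere

# Camera at (0, 0, 3) looking down -z hits the sphere at (0, 0, 1).
x_p, n_p = ray_unit_sphere(np.array([0.0, 0.0, 3.0]),
                           np.array([0.0, 0.0, -1.0]))
```

In TANGO the per-hit pair (x_p, n_p) is exactly what the SVBRDF and normal networks consume, one query per ray.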
Summary
This paper proposes TANGO, a new method that generates realistic appearance styles for 3D models from input text and is robust to low-quality models. By decoupling the appearance style into SVBRDF materials, local geometric variation (point-wise normals), and lighting conditions, and representing and rendering these components with spherical Gaussian functions, the method can be learned end to end with CLIP as loss supervision.
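The CLIP supervision amounts to pulling the rendered image's embedding toward the text embedding. A minimal stand-in, assuming the embeddings already come from CLIP's image and text encoders, is a cosine-distance loss:

```python
import numpy as np

def clip_style_loss(img_emb, txt_emb):
    """Cosine distance between image and text embeddings, as a
    stand-in for CLIP guidance (embeddings are assumed to come from
    CLIP's encoders). Zero for aligned embeddings, up to 2 for
    opposite ones."""
    img = img_emb / np.linalg.norm(img_emb)
    txt = txt_emb / np.linalg.norm(txt_emb)
    return 1.0 - np.dot(img, txt)
```

Minimizing this with respect to the SVBRDF, normal, and lighting parameters (through the differentiable renderer) is what drives the stylization.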
Compared with existing methods, TANGO remains robust even on low-quality 3D models. However, providing geometric detail through point-wise normals, while it avoids self-intersections, somewhat limits the degree of surface relief that can be expressed. The study views TANGO and the vertex-offset-based Text2Mesh as good preliminary attempts in their respective directions that should inspire further research.