
Digital humans light the main torch of the Asian Games, and this ICCV paper reveals Ant's cutting-edge generative AI

Sep 29, 2023 pm 11:57 PM

Look inside a digital human and you will find generative AI throughout.

On the evening of September 23, at the opening ceremony of the Hangzhou Asian Games, the "little flames" of hundreds of millions of online digital torchbearers gathered over the Qiantang River and formed the figure of a digital human. The digital torchbearer and the sixth on-site torchbearer then walked to the torch stage together and lit the main torch.


As the centerpiece of the opening ceremony, the torch lighting that brought the digital and the physical together became a trending topic and drew wide attention.

The digital torch lighting was an unprecedented undertaking, with hundreds of millions of participants and a large number of advanced, complex technologies behind it. One of the most important questions is how to make digital humans "move". With the rapid development of generative AI and large models, digital human research is clearly seeing new changes.

We noticed that a study on generating 3D digital human motion has been accepted to ICCV 2023, the global computer vision conference taking place in early October. The paper is titled "Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models" and was jointly published by Zhejiang University and Ant Group.


According to the authors, this research goes some way toward solving the problem of synthesizing complex, long-range motion for digital humans, and achieves results that the original models or path planning alone could not. Technology related to digital human driving was also used in the online relay involving some 100 million digital humans at the Asian Games.

Generative AI drives digital humans to move

We often need to synthesize 3D human motion in a given 3D scene so that virtual humans can walk around the scene naturally and interact with objects. This capability has many applications in AR/VR, film production, and video games.

Traditional character-control methods aim to generate short-term or repetitive motions guided by the user's control signals. The new research instead focuses on generating longer human-object interactions, given a starting position and a target object model.

This setting is more useful, but it is also clearly more challenging. First, human-object interactions should be coherent, which requires the ability to model long-range interactions between humans and objects. Second, in the context of content generation, generative models should be able to synthesize diverse motions, since there are many ways for real people to approach and interact with a target object.

Figure 1. Generating human-object interactions. Given an object, the new method first predicts a set of milestones, where the rings mark positions and the pink figures show the local poses. The algorithm then fills in the motion between milestones. The figure shows the method generating different milestones and motions for the same object. The flow of time is shown with a color code, with darker blue indicating later frames.

Existing methods for synthesizing digital human motion can be roughly divided into online and offline generation. Most online methods focus on real-time control of the character. Given a target object, they typically use autoregressive models that generate future motion recurrently by feeding their own predictions back in. Although this approach is widely used in interactive scenarios such as video games, its quality is still unsatisfactory for long-term generation.


To improve motion quality, some recent offline methods have adopted a multi-level framework that first generates trajectories and then synthesizes motion. Although this strategy can produce reasonable paths, the diversity of those paths is limited.

In this new study, the authors propose a new offline method for synthesizing long-term, diverse human-object interactions. The innovation lies in a hierarchical generation strategy: the method first predicts a series of milestones and then generates the human motion between those milestones.

Specifically, given a starting position and a target object, the authors design a milestone generation module that synthesizes a set of nodes along the motion trajectory. Each milestone encodes a local pose and marks a transition point in the human motion. Based on these milestones, the algorithm uses a motion generation module to produce the complete motion sequence. Thanks to the milestones, the generation of a long sequence reduces to the synthesis of several short motion sequences.

The local pose at each milestone is generated by a transformer model that takes global dependencies into account, producing temporally consistent results and further promoting coherent motion.

In addition to the hierarchical generation framework, the researchers use diffusion models to synthesize human-object interactions. Some previous diffusion models for motion synthesis combined transformers with denoising diffusion probabilistic models (DDPM).

Because the motion sequences here are long, applying those models directly to the new setting would require a great deal of computation and could exhaust GPU memory. Since the new hierarchical framework converts long-term generation into the synthesis of multiple short sequences, the GPU memory required drops to the same level as short-term motion generation.

As a result, the researchers can use a Transformer DDPM effectively to synthesize long-term motion sequences, improving generation quality.

To this end, the researchers designed a hierarchical generation framework, as shown in the figure below.
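The milestone-then-infill idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: `place_milestones` and `infill` are hypothetical stand-ins that use straight-line interpolation where the paper uses learned modules, and milestones are reduced to 2D positions with no pose attached.

```python
def place_milestones(start, goal, n):
    """Hypothetical stand-in for the learned milestone module:
    put n points (n >= 2) on the straight line from start to goal."""
    return [
        (start[0] + (goal[0] - start[0]) * i / (n - 1),
         start[1] + (goal[1] - start[1]) * i / (n - 1))
        for i in range(n)
    ]


def infill(a, b, frames):
    """Hypothetical stand-in for the motion module: produce `frames`
    positions from a (inclusive) toward b (exclusive)."""
    return [
        (a[0] + (b[0] - a[0]) * k / frames,
         a[1] + (b[1] - a[1]) * k / frames)
        for k in range(frames)
    ]


def generate_motion(start, goal, n_milestones, frames_per_segment):
    """Long sequence = concatenation of short segments between milestones."""
    milestones = place_milestones(start, goal, n_milestones)
    motion = []
    for a, b in zip(milestones, milestones[1:]):
        motion.extend(infill(a, b, frames_per_segment))
    motion.append(milestones[-1])  # close the sequence on the final milestone
    return milestones, motion


milestones, motion = generate_motion(
    (0.0, 0.0), (4.0, 2.0), n_milestones=5, frames_per_segment=10
)
print(len(milestones), len(motion))
```

Each call to `infill` only ever sees one short segment, which is the property the article attributes to the hierarchical design.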
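A rough back-of-the-envelope calculation shows why splitting helps. The sketch below assumes standard full self-attention, whose score matrix has one entry per pair of frames, and assumes segments are synthesized one at a time; the frame counts are hypothetical and the paper's actual memory profile is not reported in this article.

```python
def attention_entries(seq_len):
    """Entries in a single full self-attention score matrix."""
    return seq_len * seq_len


def peak_entries_hierarchical(total_frames, segment_len):
    """Peak score-matrix size when only one segment is processed at a time."""
    return attention_entries(min(total_frames, segment_len))


total_frames, segment_len = 1200, 60  # hypothetical numbers, for illustration only
direct = attention_entries(total_frames)
hierarchical = peak_entries_hierarchical(total_frames, segment_len)
ratio = direct // hierarchical
print(direct, hierarchical, ratio)  # 1440000 3600 400
```

Under these assumptions the peak cost depends on the segment length rather than the total sequence length, which matches the article's claim that memory falls to the level of short-term generation.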


First, they use GoalNet to predict the interaction target on the object, and then generate a target pose to model the human-object interaction explicitly. Next, the milestone generation module estimates the number of milestones, generates the milestone trajectory from the starting point to the target, and places a local pose at each milestone.

In this way, long-range motion generation is decomposed into a combination of several short-range motion generation problems. Finally, the authors design a motion generation module that synthesizes the trajectories between milestones and fills in the motion.

Target pose generation

The researchers call the pose in which a person interacts with an object and remains stationary the target pose. Most previous methods used cVAE models to generate human poses, but the researchers found that this approach performed poorly in their experiments.

To address this, they adopted a VQ-VAE model to capture the data distribution. This model uses a discrete representation to cluster the data into a finite set of points. The researchers also observed that different human poses can share similar properties. For example, when a person is sitting, the hand positions may vary while the leg positions stay the same. They therefore divide the joints into L (L = 5) non-overlapping groups.

As shown in Figure 3, the target pose is divided into independent joint groups.
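The grouping idea can be sketched as the quantization step of a VQ-VAE applied to each joint group separately. This is only an illustration under stated assumptions: the group names, the two-entry codebooks, and the one-number-per-joint features are all hypothetical, and the encoder, decoder, and training losses of a real VQ-VAE are omitted.

```python
def nearest_code(vector, codebook):
    """Index of the codebook entry closest to vector (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


def quantize_pose(pose, groups, codebooks):
    """Quantize each joint group independently; one code index per group."""
    return [
        nearest_code([pose[j] for j in joint_ids], codebooks[name])
        for name, joint_ids in groups.items()
    ]


# Hypothetical split into L = 5 non-overlapping groups (names are illustrative).
groups = {
    "torso": [0, 1],
    "left_arm": [2, 3],
    "right_arm": [4, 5],
    "left_leg": [6, 7],
    "right_leg": [8, 9],
}
# Toy codebooks: every group gets the same two entries.
codebooks = {name: [[0.0, 0.0], [1.0, 1.0]] for name in groups}

pose = [0.1, 0.2, 0.9, 0.8, 0.0, 0.1, 1.0, 0.9, 0.2, 0.1]
codes = quantize_pose(pose, groups, codebooks)
print(codes)  # [0, 1, 0, 1, 0]
```

Because each group has its own code, two poses can share the leg codes while differing in the arm codes, which is the sitting example given above.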


Given the starting pose and the target pose, the algorithm can generate the milestone trajectory and synthesize the local pose at each milestone. Since the length of the motion is unknown and can be arbitrary (for example, a person may walk quickly to a chair and sit down, or walk slowly around the chair and then sit down), the method must predict the number of milestones, denoted N. It then synthesizes N milestone points and places local poses on them.


The last step is motion generation. Rather than predicting motion frame by frame, the researchers synthesize the entire sequence hierarchically based on the generated milestones. They first generate trajectories and then synthesize motion. Specifically, between two consecutive milestones, they first complete the trajectory and then fill in the motion, guided by the poses at the two milestones. The two steps are carried out by two separate Transformer DDPMs.

The researchers carefully design the conditioning of the DDPM at each step so that it produces the target output.

Results ahead of other methods

The researchers compared the results of different methods on the SAMP dataset. The method proposed in the paper has a lower FD, a higher user study score, and a higher APD. Their method also achieves higher trajectory diversity than SAMP.
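For readers unfamiliar with DDPMs, the sketch below shows the standard forward noising step that such a model is trained to invert. It is generic DDPM, not the authors' networks: the linear beta schedule, the step count, and the toy three-number "motion" vector are assumptions, and the Transformer denoiser and its milestone conditioning are not shown.

```python
import math
import random


def alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        bars.append(prod)
    return bars


def add_noise(x0, t, bars, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    scale = math.sqrt(bars[t])
    sigma = math.sqrt(1.0 - bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [scale * x + sigma * e for x, e in zip(x0, noise)]
    return x_t, noise


bars = alpha_bars(1000)
rng = random.Random(0)
x0 = [0.5, -1.0, 2.0]  # toy stand-in for one frame of motion features
x_t, eps = add_noise(x0, 500, bars, rng)
```

A denoiser trained to predict `eps` from `x_t` (plus conditions such as the neighbouring milestones) can then run the process in reverse to generate a short motion segment.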
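The article does not define APD. Assuming it follows the common definition of average pairwise distance, a diversity measure computed over several motions generated for the same input, it could be sketched as follows, with each sample flattened to a vector of numbers:

```python
import math
from itertools import combinations


def apd(samples):
    """Average Euclidean distance over all unordered pairs of samples.
    Each sample is a flat sequence of floats of equal length."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Toy samples; real inputs would be flattened motion sequences.
score = apd([(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)])
print(round(score, 3))  # 6.667
```

Under this definition a higher APD means the generated motions differ more from one another, which is how the comparison above should be read.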


The new method also produces satisfactory results in complex scenes. The percentage of frames with penetration is 3.8% for this method, compared with 4.9% for SAMP.
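The exact penetration test used in the paper is not described here. As a hedged illustration, the sketch below counts a frame as penetrating if any body point lies strictly inside an axis-aligned box standing in for the object; both the box approximation and the function names are assumptions.

```python
def inside_box(point, box_min, box_max):
    """True if point lies strictly inside the axis-aligned box."""
    return all(lo < p < hi for p, lo, hi in zip(point, box_min, box_max))


def penetration_percentage(frames, box_min, box_max):
    """Percentage of frames in which at least one body point is inside the box."""
    hits = sum(
        any(inside_box(p, box_min, box_max) for p in frame) for frame in frames
    )
    return 100.0 * hits / len(frames)


# Four toy frames with two body points each; only the third enters the box.
frames = [
    [(2.0, 2.0, 0.0), (2.0, 2.0, 1.0)],
    [(1.5, 1.5, 0.0), (1.5, 1.5, 1.0)],
    [(0.5, 0.5, 0.5), (1.5, 1.5, 1.0)],
    [(1.2, 0.5, 0.5), (1.2, 0.5, 1.5)],
]
pct = penetration_percentage(frames, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
print(pct)  # 25.0
```

A lower percentage means fewer frames in which the body passes through the object.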


On SAMP, COUCH, and other datasets, the method proposed in the study achieves better results than the baseline methods.


A complete end-to-end technology stack

A digital human combines multimodal technologies spanning voice, semantics, vision, and more. With the recent breakthroughs in generative AI, the digital human field is developing by leaps and bounds. Modeling, generation, interaction, rendering, and other stages that used to require manual work are now increasingly handled by AI. As engineers continue to optimize the technology, the experience on mobile devices keeps improving as well. The just-concluded online Asian Games torch relay is a good example: to become a torchbearer, all we had to do was open the mini program in the Alipay app.

To ensure the opening ceremony project ran smoothly, Ant Group's engineers reportedly ran more than 100,000 tests on hundreds of different phone models and wrote more than 200,000 lines of code. They combined the self-developed Web3D interactive engine Galacean with AI digital humans, cloud services, blockchain, and other technologies so that everyone could become a digital torchbearer and take part in the relay. The Asian Games digital torchbearer platform can reach hundreds of millions of users and supports 97% of common smartphone devices.

To make participation feel realistic, Ant's technical team developed 58 face-customization controllers. Using facial recognition and AI algorithms, the system builds a digital torchbearer's face from each person's facial features. Users can also freely adjust face shape, hairstyle, nose, mouth, eyebrows, and other features, and dress their avatar as they like. The technology can offer 2 trillion different digital avatars.

In addition, after the torch lighting at the opening ceremony, each digital torchbearer could receive an exclusive digital torch-lighting certificate showing that torchbearer's unique avatar. The certificate is stored on the blockchain through distributed technology.

Both the research paper and the Asian Games project point to a complete digital human technology system behind them. Ant Group is understood to be actively exploring digital human technology and to have completed a self-developed stack covering the core technologies of the full digital human pipeline.

Unlike most companies on the market, Ant Group develops its digital human technology in-house and has chosen a direction that combines it with generative AI. Its technical deployment covers the entire life cycle of a digital human: modeling, rendering, driving, and interaction. Combining AIGC and large models significantly reduces the end-to-end production cost of digital humans. The platform currently supports 2D and 3D digital humans and provides a variety of solutions, including broadcast and interactive types.

According to public information, the Ant Digital Human Platform currently has four technical advantages and features:

  • Low-cost modeling: in cooperation with Tsinghua University, a 3D parametric model of Asian faces was launched. It reconstructs 3D faces from photos and better matches the characteristics of Asian faces.
  • Generative driving: combining generated motion with motion capture reduces costs and increases the richness of movements compared with the traditional motion production process.
  • Highly adaptable rendering: the self-developed Web3D rendering engine Galacean covers 97% of common mobile phones. For neural rendering, a NeRF framework that decouples dynamic driving from static modeling has been built and applied to dynamic digital human video scenes.
  • Intelligent interaction: based on pre-trained timbre cloning, the platform supports generating a personalized digital human voice from minute-level audio input, and it is building digital human interaction based on large models.

Before the opening ceremony of the Asian Games, the China Academy of Information and Communications Technology released its latest compliance verification results for digital human standards. Ant Group's Lingjing Digital Human Platform became the first product in the industry to pass the financial digital human evaluation, earning the highest rating of "Excellent Level (L4)".

In addition to the Asian Games, the Ant Digital Human Platform also supports Ant Group's Alipay, digital finance, government affairs, Wufu, and other businesses. This year it began providing basic services to partners through short videos, live broadcasts, mini programs, and other channels.

In the near future, as digital humans powered by generative AI continue to improve, we can expect better interactions in more scenarios and a smart life that truly merges the digital and the physical.
