Tencent's Hunyuan large model upgraded again: text-to-image capability released, with a comprehensive hands-on test and analysis


王林

Oct 26, 2023 pm 09:13 PM


In 2023, the launch of large models shifted into high gear, and text-to-image generation is one of the hottest application directions.

Since the birth of Stable Diffusion, text-to-image models have been emerging at home and abroad, and for a while it felt like a "battle of the gods". Each technology iteration brings rapid improvements in generation quality and speed.

Just today, Tencent's Hunyuan large model also announced its latest progress: its text-to-image capability is officially launched.

As soon as we tried it out, we saw the Hunyuan model's understanding of China's broad and profound food culture. Here we chose "Ants Climbing a Tree", a dish name that trips up many large models, but Hunyuan generated it easily:


The question is, with so many text-to-image models now available, does the Hunyuan large model have any special advantages?

According to the official introduction, in terms of algorithms and models, current text-to-image models still face some challenges, such as insufficient semantic understanding, unreasonable image structure, insufficient picture detail and low quality.

Tencent began exploring AI-generated images in advertising scenarios long ago and has built up considerable experience. The text-to-image capability in this Hunyuan upgrade aims to solve precisely these three problems of "semantics, content, and texture".

According to reports, compared with other large models, Tencent Hunyuan's text-to-image capability has obvious advantages in the realism of portraits and scenes. It also performs well in generating scenes such as Chinese-style landscapes, animation and games.

Hands-on test: what is different about Hunyuan text-to-image?

To do "text-to-image" well, a full understanding of the "text" is crucial.

In terms of semantic understanding, the Hunyuan text-to-image model adopts fine-grained Chinese-English bilingual modeling, achieving bilingual understanding of Chinese and English, and improves the model's perception of details and its generation quality through optimized algorithms.

Before this, although popular models like Stable Diffusion supported Chinese to a certain extent, their core dataset LAION-5B still consisted mainly of Western content, so they did not understand Chinese language, food, culture and customs well enough.

The Hunyuan text-to-image model is a native Chinese text-to-image model. Users can enter Chinese poems or idioms directly and have it create paintings from them.

In terms of content rationality, Hunyuan text-to-image strengthens the algorithm model's perception of two-dimensional spatial position in images and introduces prior information such as human skeleton and hand structure into the generation process. This makes the generated image structure more reasonable and alleviates the problem of implausible human bodies and hands in AI-generated images.

In terms of picture texture, Hunyuan text-to-image uses a multi-model fusion method to improve the generated texture. After optimization, the effect of its portrait model (hair, wrinkles, etc.) has improved by 30%, and the effect of its scene model (vegetation, ripples, etc.) has improved by 25%.

The technical advantages in these three aspects have clearly improved the product experience of Hunyuan text-to-image.

To verify the above capabilities, this website set some questions and thoroughly tested the Hunyuan large model right away.

Since Hunyuan is a native Chinese model, it naturally understands classical Chinese better than other similar products. We first had it draw based on ancient poems.

We selected a highly evocative line of ancient poetry, "When you are drunk, you don't know the sky is in the water, and the boat is full of clear dreams and the stars are overwhelming", to see whether the Hunyuan large model can generate an equally picturesque image.


In the poem "Boat at Guazhou", the line "The spring breeze turns green again on the south bank of the river, when will the bright moon shine back on me?" expresses the homesickness of countless wanderers. In Hunyuan's output, images such as "spring light", "water bank" and "bright moon" are extracted and combined organically, making viewers feel as if they were inside the poem:

Next comes the fun "Chinese food painting" session. Let's run the classic test, "Fish-Flavored Shredded Pork":

From the Chinese food paintings that once drove people crazy to today's images that look good enough to eat, we can also feel the continuous evolution of text-to-image technology.

Let's see how Hunyuan handles "realistic portraits", which the industry recognizes as a hard problem:

We know that Midjourney first became popular because of the photo of a couple below, which people could not tell was generated by AI.

Next, let's examine the Hunyuan large model's ability to generate images that pass for real. The prompt used is:

How is the realism? In our opinion, the details mentioned in the prompt are rendered adequately.

This is what Tencent emphasizes: the Hunyuan large model improves its perception of details and its generation quality through optimized algorithms. This ability only shows across many specific scenes.

For example, in an animation scene, the prompt is "A deer is running in the forest, causing fallen leaves to fly up; the moon is very bright and big, and birds are flying in the sky, creating a sense of atmosphere. CG style, side view".

Does it look like a scene from the animation you watched as a kid?

In addition, text-to-image generation has huge application potential in animation creation.

The prompt we gave to the Hunyuan large model is "Generate 3D, anime style, 1 girl, blond hair, smile, short hair, city background":

What do you think of the generation effect? Can it be used directly as wallpaper?

What self-developed technologies are behind the text-to-image capability?

If a worker wants to do his job well, he must first sharpen his tools, and the same is true for large models.

We learned that, in addition to innovative model algorithms, the Tencent Hunyuan large model's ability to produce text-to-image results that fit the Chinese context is also inseparable from high-quality image-text matching data, a self-developed machine learning framework and powerful computing infrastructure.

The Tencent Hunyuan large model has formed a full-link self-developed technology path from model algorithm to machine learning framework to AI infrastructure. This multi-level technological accumulation means that large models must evolve one step at a time, starting from practice and improving through practice.

First let’s look at the data engineering that supports model training.

For any AI, especially large models, data is one of the three indispensable elements. The same is true for the text-to-image function of large models. Image and text data, especially image-text matching data, has a decisive impact on generation quality.

However, not all existing data on the Internet can be used directly. The big problem is that the text description of a picture may be inaccurate, so most image-text matching data is of relatively poor quality. If such data is used, the model's generation quality will fall short of expectations even after very long training, which also affects the stability of generation quality and the efficiency of subsequent iterations.

Therefore, improving the quality of image-text data has become the "first hurdle" to ensuring text-to-image quality. This often requires engineering methods to improve data quality, support model training, optimization and upgrades, and build a moat for the algorithm model.

Faced with the problem of image-text matching data, the Tencent Hunyuan text-to-image team responds as follows: first, refine the Chinese prompts at a fine-grained level to improve image-text correlation and maximize data quality; then adopt a strategy of layering and grading training data to optimize the model step by step and maximize data effect; and finally build a data flywheel, which is the key to rapid iteration of large models. Based on feedback from online users of the model, the team automatically builds training data to speed up model iteration and maximize data efficiency.
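The three steps above can be pictured with a small sketch. It is only an illustration under stated assumptions, not Tencent's pipeline: the `Pair` record, the relevance `score` in [0, 1], the thresholds 0.8 and 0.5, the two-stage ordering and the fixed feedback step of 0.1 are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Pair:
    """One image-text pair with an assumed relevance score in [0, 1]."""
    image_id: str
    caption: str
    score: float


def grade(pairs, thresholds=(0.8, 0.5)):
    """Split pairs into 'high', 'medium' and 'low' tiers by relevance score."""
    high_t, mid_t = thresholds
    tiers = {"high": [], "medium": [], "low": []}
    for p in pairs:
        if p.score >= high_t:
            tiers["high"].append(p)
        elif p.score >= mid_t:
            tiers["medium"].append(p)
        else:
            tiers["low"].append(p)
    return tiers


def training_stages(tiers):
    """Order data for staged training: a broad pool first, the cleanest pairs last.

    The 'low' tier is dropped entirely in this sketch.
    """
    return [
        tiers["medium"] + tiers["high"],  # stage 1: larger, noisier pool
        tiers["high"],                    # stage 2: refine on the best pairs
    ]


def flywheel_update(pairs, feedback, step=0.1):
    """Fold user feedback (image_id -> liked?) back into the scores.

    Liked pairs move up by `step`, disliked pairs move down, clamped to [0, 1].
    Pairs without feedback are returned unchanged.
    """
    updated = []
    for p in pairs:
        if p.image_id in feedback:
            delta = step if feedback[p.image_id] else -step
            p = replace(p, score=min(1.0, max(0.0, p.score + delta)))
        updated.append(p)
    return updated
```

Calling `grade` again on the output of `flywheel_update` re-tiers the data, so pairs that users liked can move into the final training stage and disliked pairs can drop out, which is the loop a data flywheel relies on.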

With data quality, effect and efficiency improved, the foundation is laid for good text-to-image results. The machine learning framework discussed next is equally important.

A powerful machine learning framework or platform will greatly improve the speed and efficiency of developers in building, training and deploying models. Tencent has developed its own Angel machine learning platform for large model training and inference scenarios, which mainly includes AngelPTM for training and AngelHCF for inference.

AngelPTM adopts the ZeRO-Cache optimization strategy, making it a powerful tool for training very large models. It expands single-machine model capacity through storage management, improves resource utilization through multi-stream asynchronous execution, and improves memory efficiency through GPU memory management. In addition, 4D parallelism raises the upper limit of available GPU memory, reduces communication pressure across thousand-card (GPU) clusters and releases computing potential. An automatic training-resumption mechanism supports automatic fault tolerance for failures in thousand-card clusters and reduces interruption time. Model training is also monitored in real time, and the training direction is optimized in coordination with the algorithms.

Currently, AngelPTM achieves high-speed training of the hundred-billion-parameter Hunyuan base model, based on the industry's first ZeRO-Cache mechanism combined with 4D parallelism. Training speed is twice that of the mainstream open-source framework (DeepSpeed-Chat).

Figure: ZeRO-Cache overview.

AngelHCF improves the inference performance of large models at five levels: customized and diversified service strategies, parallel strategies, framework acceleration (covering common GPU acceleration methods), model compression (supporting compression methods commonly used in the industry) and efficient model debugging. Its inference speed is 1.3 times higher than that of the industry's mainstream framework (FasterTransformer).

Tencent said that its Angel machine learning platform offers leading performance and provides a better infrastructure system that helps large models run at high speed. This allows the Hunyuan large model to generate high-quality images while also greatly improving generation speed.

Even with high-quality data and an efficient machine learning framework, the continuous operation of large models still faces the test of computing power. After all, in the era of large models, computing power is king.

The Tencent Hunyuan text-to-image function is inseparable from the powerful computing infrastructure provided by Tencent Cloud. In April 2023, Tencent Cloud released a new generation of HCC high-performance computing clusters, which use the latest generation of self-developed Xinghai servers and, based on a self-developed network and storage architecture, achieve 3.2T ultra-high interconnect bandwidth, TB-level throughput and ten-million-level IOPS. The computing performance of the new-generation cluster is 3 times that of the previous generation and more than 12 times that of traditional computing cluster solutions.

While the underlying hardware is strengthened, upper-layer software capabilities must keep pace. The new-generation HCC cluster integrates Tencent Cloud's self-developed TACO training acceleration engine and includes many system-level optimizations at the network protocol, communication strategy, AI framework and model compilation levels. This comprehensive set of training acceleration solutions not only helps customers lower the threshold for AI optimization and improve AI training performance, but also greatly reduces training, tuning and computing costs.

It seems that the three major constraints on large models, namely algorithms, data and computing power, are no longer a problem for the Tencent Hunyuan large model. Naturally, the quality of its text-to-image output is also assured.

Realistic enough to pass for real: text-to-image is already embedded in Tencent advertising scenarios

The Hunyuan text-to-image capability we saw today was not achieved overnight but is the result of real, step-by-step evolution.

At the 2023 Tencent Global Digital Ecosystem Conference held last month, Tencent’s Hunyuan large model was officially unveiled. Jiang Jie, vice president of Tencent Group, said at the time that Hunyuan is always on the road. Tencent will continue to evolve Hunyuan’s capabilities and hopes to bring surprises to everyone every month.

Currently, Tencent has 180 internal businesses connected to the Hunyuan large model, including Tencent Meeting, Tencent Docs, WeCom, Tencent Advertising and WeChat Search. At the same time, customers from industries such as retail, education, finance, medical care, media, transportation and government affairs also call the Tencent Hunyuan API through Tencent Cloud. Application areas include intelligent question answering, content creation, data analysis, code assistance and other scenarios.

The newly opened text-to-image capability is the biggest surprise that Tencent's Hunyuan model has brought us, demonstrating its leading capabilities in automatic image generation. Of course, Tencent Hunyuan text-to-image is still evolving, and more text-to-image and related functions will be developed in the future. They are worth looking forward to.

Currently, Hunyuan's text-to-image capabilities have been embedded in Tencent's advertising scenarios, such as generating product advertisements or advertising images. In multiple rounds of evaluation within the advertising business, the case excellence rate and advertiser adoption rate of Tencent Hunyuan text-to-image reached 86% and 26% respectively, both higher than those of similar models.

Let's first look at the following example, which asks the Hunyuan large model to generate a hotel room. Judging from the results, Hunyuan's text-to-image output is clearly better after the upgrade: design and quality are greatly improved, and the details are richer. Even compared with Midjourney, the results are comparable.

Character generation shows a similar improvement. After the upgrade, the portraits generated by Hunyuan are more realistic in details such as facial skin tone and wrinkles.

In addition to advertising scenarios, Tencent is also constantly exploring other uses for text-to-image generation, such as generating game elements and game characters in game scenarios, generating accompanying pictures and illustrations for novels in content scenarios, and opening Hunyuan capabilities to customers in different industries in cloud business scenarios.

No matter how powerful the model is, it must be used by more people and continue to receive feedback, so that it can make further progress.

It can be foreseen that Hunyuan text-to-image capabilities will spread rapidly across Tencent products in the future, and users will experience more of the charm of AIGC.
