
My ears aren't fooling me, this voice is too real: the Seed-TTS technology behind ByteDance Doubao's speech synthesis, revealed

Jun 26, 2024, 08:37 PM
ByteDance · Industry · Doubao model

Seed-TTS is a large speech generation model recently released by the ByteDance Doubao model team. The speech it generates is almost indistinguishable from a real person's: it can even reproduce pronunciation imperfections, and it is especially strong at imitating human speech, with excellent fidelity and fluency.

For example, given a piece of reference speech, Seed-TTS can generate new speech from any text while carrying over the voice characteristics of the original material.

Original material (prompt): [audio sample]
Seed-TTS generated Chinese speech: [audio sample]

Suddenly, there was laughter around me. I looked at them, puffed out my chest with high spirits, shook my fleshy arms, and chuckled: "The flesh on my body is there to cover up my overwhelming charm; otherwise, wouldn't I scare you all?"

Seed-TTS can also generate English speech, and it still "reproduces" the characteristics of the Chinese speaker's voice.

Seed-TTS generated English speech: [audio sample]

Suddenly, there was a burst of laughter beside me. I looked at them, stood up straight with high spirit, shook the slightly fleshy arms, and smiled lightly, saying, "The flesh on my body is to hide my bursting charm. Otherwise, wouldn't it scare you?"

Seed-TTS can also grasp the emotion of the text and bring out a character's "feeling" in the voice:

Hey, do you also want a sweet romance? "A Little Smile Is Lovely" is your best choice. The male and female protagonists are both campus stars. They got to know each other through a game, and once they met there were no misunderstandings at all; it's so sweet that just thinking about it makes me break into an "auntie smile"~

Little fool, well... it's a very cute and friendly name, a bit "unique", but I'm a little curious: why did you choose this nickname for me? [audio samples]
Seed-TTS can generate more than a "single" voice: based on a novel's plot and different character traits, it can even deliver a "storyteller" performance matched to each character and emotion.

"Is this pill... a drug or an aphrodisiac or something like that? Why does my scent smell so similar to what the two sisters said? Well, don't you think... Are you plotting against me?" Han Li was stunned for a long time after hearing this. He suddenly felt like he was vomiting blood. This girl's thoughts were too elusive. She could associate Yingxiang Pills with aphrodisiacs. Alas, Han Li didn't know whether to admire the other party's caution or to scream three times because he had been wronged for no reason. "It seems like what you said is true. However, I still have to take it to my second sister for testing before using it. After all, our daughter's family must be careful." "Cough, cough, uh, it's up to you. " Han Li was speechless and could only cough a few times to cover up the embarrassment on his face. He now felt that he had better stay away from this little goblin, otherwise, he would be depressed to death by her at some point. "Humph, but if this medicine is as effective as you say, then you have passed the test! If senior brother has any difficulties in Mo Mansion from now on, you can come to Caihuan for help. I just need to collect some small As a reward, I will definitely be able to help you solve the problem completely. "Okay, junior sister, if my senior brother has something to do, I will definitely ask you for help." Han Li returned to his normal state and responded to this with a smile on his face, but in his heart. Then he thought viciously: "It's strange that I'm looking for a little money fan like you.”

For more demos and technical details, see the original paper and the demo page:

  • Paper: https://arxiv.org/abs/2406.02430
  • Demo page: https://bytedancespeech.github.io/seedtts_tech_report/

Before the technical report was released, parts of the Seed-TTS technology had already been live in consumer-facing products for some time, earning genuine praise from users and wide acclaim from outside observers. The speech synthesis model and the Doubao voice replication model are also offered as commercial technology services.

Want to hear the team's own account of the technical highlights, research value, and challenges they overcame? Read on.

A base model for speech generation

Q: Seed-TTS has been noticed by industry insiders. Which piece of recognition impressed you most?

A: There is a professor who worked in speech recognition and later moved to industry; he is an insider I admire very much. At an academic conference not long ago, we showed the Seed-TTS demo. After seeing it, his feedback was that he had recently been looking at what could still be done in speech generation, and that now he felt there was nothing left for him to do in this direction. Although I think there is still room for improvement, I was very happy to hear that.

Q: Why were you happy?

A: Usually people just say you are doing well. But this professor was actively looking for related research topics at the time; after seeing our results he gave us positive comments and felt our work was already so good that he needed to find other problems. For us, that is very high recognition.

Q: How does Seed-TTS differ from previous work?

A: It is a base model for speech generation, which makes it somewhat different from most speech generation models. Traditional TTS is a single-task model; for a base model, we want it to handle any task, produce any voice, and give us control over many dimensions at once, such as dialects, real people's speaking habits, and even speech imperfections like swallowed words.

Any way humans speak, whether English, Japanese, or Chinese, including dialects within a language, such as the Shaanxi and Henan dialects of Chinese; happy, sad, crying, or angry: as long as humans can produce it, we want the model to produce it too.
Q: Have all of these ideas been achieved?

A: A large part has been achieved. Of course, some things cannot be done yet, but technology keeps moving forward. Today's text language models act as a base with deep understanding at the text level; we likewise hope to make Seed-TTS a true "base" for speech.

Q: Where do the challenges of building a "base model" lie?

A: The first is fine-grained detail modeling. In the past it was easy to build TTS as a broadcast-style system, but it sounded like a machine voice. Making it sound human requires modeling a great deal of detail. Humans are especially sensitive to their own voices: if a synthesized bark or meow is slightly unnatural, people may not notice, but any flaw in synthesized human speech immediately sounds "mechanical".
Second, it demands both high naturalness and high stability. Most mainstream TTS of the past two years relied on prior knowledge and duration models, which define a duration for each phone but limit expressiveness at the foundation. If you remove these components, stability and naturalness problems appear, which is another challenge.
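For readers unfamiliar with duration models, here is a minimal illustrative sketch, assuming a FastSpeech-style "length regulator" rather than anything Seed-TTS-specific: a duration model predicts how many acoustic frames each phone should span, and the encoder output is repeated to match.

```python
# Illustrative sketch of a phone-level duration model / length regulator
# (an assumption about the classic design, not Seed-TTS internals).
import numpy as np

def length_regulate(phone_feats: np.ndarray, durations: np.ndarray) -> np.ndarray:
    """Expand per-phone features into a frame-level sequence.

    phone_feats: (num_phones, dim) encoder output, one row per phone.
    durations:   (num_phones,) predicted frame count for each phone.
    """
    # np.repeat copies row i of phone_feats exactly durations[i] times,
    # hard-wiring the text-to-audio alignment.
    return np.repeat(phone_feats, durations, axis=0)

phones = np.random.randn(3, 8)          # 3 phones, 8-dim encoder features
durs = np.array([4, 10, 6])             # frames per phone, from a duration model
frames = length_regulate(phones, durs)  # (20, 8) frame-level features
print(frames.shape)
```

The hard-wired alignment is exactly what makes such systems stable yet flat-sounding; dropping it, as the answer describes, means the model must learn alignment implicitly, which is where the stability challenge comes from.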

The third is that the required data coverage is very large. We want to replicate anyone's voice across languages and dialects, including imperfections in human pronunciation such as swallowed words and non-standard pronunciation. To reconstruct these features and restore the "imperfections", data coverage must be high. Previously, industry systems used data on the order of hundreds or thousands of hours, with some models at tens of thousands of hours; Seed-TTS used far more data than that. Data at this scale also raises the problem of balancing quality against quantity, which is another difficulty.

Fourth, model design. At such a scale, designing a model that performs well in every respect is also a big challenge.

Finally, there is the engineering challenge. As noted above, our data volume is large and the model is complex, which naturally brings engineering problems that few people have solved before.

Q: From a technical perspective, what is the value of solving these challenges?

A:
  • Compared with text and images, speech has attributes of both. Which of the two paradigms is better suited to modeling speech is a question we had to answer.
  • Speech and text share many similarities. How to design a speech representation that suits language-model-style modeling is another problem to be solved.
  • How to use reinforcement learning to integrate various subjective and objective preference signals into the generation system is also one of the problems.
  • There are many other highlights, including the stability of autoregressive speech generation models. Through this study, we are also trying to look at TTS problems from perspectives outside the TTS field.
Q: You mentioned research on language models and diffusion models. What conclusions did you draw?

A:
Seed-TTS provides not only a language-model-based technical solution, but also a diffusion-based solution that is completely decoupled from the duration model, an industry first.
In addition, after extensive comparison of the two systems, we found that the language model is friendlier to streaming processing, while the diffusion model is better suited to editing. I believe the two will continue to converge in the future.
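As a toy illustration of the streaming point, consider this sketch (stand-in functions and assumed names, nothing from Seed-TTS's actual pipeline): an autoregressive model emits speech tokens one at a time, so audio can be decoded and played back chunk by chunk, while a diffusion model denoises a whole utterance jointly, which instead lends itself to in-place editing.

```python
# Toy sketch of why autoregressive token generation suits streaming.
# lm_generate_tokens and decode_chunk are stand-ins, not real APIs.
from typing import Iterator, List

def lm_generate_tokens(text: str) -> Iterator[int]:
    """Stand-in for an autoregressive speech-token LM: yields tokens one by one."""
    for ch in text:
        yield ord(ch) % 1024            # pretend each character becomes a token

def decode_chunk(tokens: List[int]) -> bytes:
    """Stand-in for a token-to-waveform decoder working on a small chunk."""
    return bytes(t % 256 for t in tokens)

def stream_tts(text: str, chunk_size: int = 8) -> Iterator[bytes]:
    buf: List[int] = []
    for tok in lm_generate_tokens(text):
        buf.append(tok)
        if len(buf) == chunk_size:      # playback can start after one chunk,
            yield decode_chunk(buf)     # long before the utterance is finished
            buf = []
    if buf:
        yield decode_chunk(buf)

for audio_chunk in stream_tts("hello, streaming speech"):
    print(len(audio_chunk), "bytes ready to play")
```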
Q: For these two systems, what technical difficulties did Seed-TTS specifically solve?


A:
For the language-model system, the main problems to solve are the speech tokenizer and stability.
For language-model-based modeling, speech tokenization is a core component. Both continuous and discrete tokenizers exist today, and the team explored them extensively. We found that the design of the information contained in each token has a critical impact on the performance and stability of the whole model. This covers not only what information a token carries and the frame rate, but also how to tokenize speech and how to turn tokens back into sound. So far, these questions have not been explored much in the industry.
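As a minimal sketch of the discrete-tokenizer idea, here is a toy vector-quantization tokenizer; the codebook size, frame rate, and feature dimensions are arbitrary assumptions, and Seed-TTS's actual tokenizer design is not disclosed in this article:

```python
# Toy vector-quantization speech tokenizer (assumed design, not Seed-TTS's).
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.standard_normal((256, 16))   # 256 codes, 16-dim each (arbitrary)

def tokenize(frames: np.ndarray) -> np.ndarray:
    """Map (T, 16) acoustic frames to (T,) discrete token ids by nearest code."""
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def detokenize(ids: np.ndarray) -> np.ndarray:
    """Look the codes back up; a neural decoder would refine this into audio."""
    return codebook[ids]

frames = rng.standard_normal((50, 16))      # e.g. one second at a 50 Hz frame rate
ids = tokenize(frames)                      # the "speech tokens" an LM would model
print(ids[:10], detokenize(ids).shape)
```

What information survives this bottleneck, how many tokens per second there are, and how faithfully the detokenizer can reconstruct sound are precisely the design choices the answer says are decisive.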
On the stability of the language model, we explored tokens, model design, decoding strategies, and data preparation, and truly met the requirements of industry applications.

For the pure diffusion system, since the extra duration model is removed, the difficulty is again concentrated on stability. After many attempts, we achieved very good metrics on this front as well.

Q: Regarding the point that "speech and text models have many similarities", what does it teach us?


A:
From the perspective of large text models, speech generation models can likewise be divided into pretraining, instruction fine-tuning, and post-training.
Pretraining improves the model's basic capabilities, reflected concretely in in-context learning abilities such as timbre continuation and voice cloning.
Instruction fine-tuning mainly uses instructions to make the generation process more controllable, much like a director giving an actor requests: speak faster or slower, how to move the listener. We integrate all of this into the model.
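As a purely hypothetical sketch of what such "director-style" control could look like at the interface level (every field name here is our assumption; no Seed-TTS API is described in this article):

```python
# Hypothetical request shape for instruction-controlled synthesis.
from dataclasses import dataclass

@dataclass
class TTSRequest:
    text: str                  # what to say
    reference_audio: str       # in-context voice prompt to clone (assumed field)
    instruction: str = ""      # free-form direction, like notes to an actor

req = TTSRequest(
    text="Suddenly, there was laughter around me.",
    reference_audio="speaker_prompt.wav",
    instruction="slightly faster, amused, with a light chuckle at the end",
)
print(req)
```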

Finally, we found that reinforcement learning can improve the model along many dimensions, integrating various subjective and objective preference signals into the generation system, including stability, controllability, expressiveness, and naturalness. Not many people in the industry are exploring this.

On top of the above, we also explored self-distillation with synthetic data and obtained very good gains. This is used relatively often for text LLMs but had previously been explored little in speech.
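Schematically, self-distillation with synthetic data might look like the loop below; this is our reading of the general technique with stand-in functions, not the team's published recipe:

```python
# Toy self-distillation loop: the model synthesizes speech for unlabeled
# text, a quality filter keeps the best pairs, and the model is fine-tuned
# on its own filtered outputs. All functions are stand-ins.
import random

def synthesize(model: dict, text: str) -> str:
    return f"audio<{text}|v{model['version']}>"       # pretend waveform

def quality_score(audio: str, text: str) -> float:
    return random.random()                            # stand-in for WER/MOS checks

def fine_tune(model: dict, pairs: list) -> dict:
    return {"version": model["version"] + 1}          # pretend training step

model = {"version": 1}
texts = [f"sentence {i}" for i in range(100)]
for _round in range(3):                               # a few distillation rounds
    kept = []
    for t in texts:
        audio = synthesize(model, t)
        if quality_score(audio, t) >= 0.8:            # the filter is what makes
            kept.append((t, audio))                   # self-distillation work
    model = fine_tune(model, kept)
print(model)
```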

Q: You mentioned three times that "some issues are less explored in the industry". What causes this?

A: On the one hand, previous research in speech generation was relatively self-contained, and the field accumulated many traditional practices that may no longer apply in this AIGC wave. Viewed more broadly, speech generation has a lot in common with text and image generation, and the rapid progress of large text models and image generation has given us much new thinking. Since it takes time for new ideas to spread, exploration in the industry is still relatively scarce.

On the other hand, many researchers work in universities and lack the necessary resources. There is a great deal of systematic engineering involved here. We not only could do this work, we explored it in detail and found models that balance stability, expressiveness, and computational cost. But is this the best that can be done? It may still need further exploration.

Q: Are there any milestone moments in the entire research process?

A: The basic capability landed last year. Since then we have iterated heavily on real cases. The work included finding real cases, various kinds of post-training, and solving deployment problems in each scenario, such as stability, first-packet latency, concurrency, and compute cost. Compared with then, quality has improved a great deal.

How far have large speech generation models come?

Q: Looking back now, what is the value of the entire study?

A: In terms of Seed-TTS's own value, voice is not merely a tool; it is the most direct form of human interaction. The shift from silent films to talkies, for example, looks like a small change but was a huge leap for the industry. Emotional connection between people relies heavily on voice: when a child calls you "daddy", the emotional connection is completely different from reading the word as text.

If we want to move toward true AI, natural speech is a key component. In the past, the machines we imagined all spoke in machine voices, like MOSS in "The Wandering Earth". If AI is really to be your assistant and partner, the emotional connection that voice brings is essential. Jarvis in "Iron Man" is remembered by so many people precisely because he was voiced by a real person.

In terms of applications, there are many scenarios for voice: novels and e-books, character design, video translation, virtual characters, broadcasting, and actors' vocal performances all have uses. People who stutter or cannot speak can also express themselves with the help of voice technology. As long as a scenario treats voice as more than a pure information medium, there is room for application. This is also our motivation to build the base model well.

Q: Some practitioners treat scaling laws as an article of faith. For speech generation models, what happens when you scale up the data and the model?

A: Even at very large scale, we keep seeing gains as we continue to scale up. In general, as the scale increases we are pleasantly surprised to watch the model keep acquiring new capabilities.

Q: From your observations, where is the limit?

A: So far we still see gains at every step, and we definitely need to keep exploring. What we have proven is that, with the right model design, we can break with traditional TTS thinking: in the past the field relied on small amounts of high-quality data, but now continuing to increase the scale yields ever higher returns.

Q: What lessons does GPT-4o hold for us?

A: It is a unified model for generation and understanding. It raises the bar for speech technology, requiring a single model to listen, speak, and think at the same time. That poses many new requirements for our work.

Q: What is the current development stage of large models in the speech field?

A: On the one hand, we want the model to have the expressiveness and control of a professional actor. Most of the time, the speech the model generates is not much different from a real person's. But in film and television, actors express emotion very intensely and the information density is high; we are not fully aligned with that yet, and we want to cover those corner cases.

On the other hand is the handling of details, including processing bad cases and optimizing for uncommon long-tail situations.

Large model work requires many outstanding people

Q: Colleagues from all over took part in this Seed-TTS release. Why did so many people participate?

A: As the industry develops, multi-person collaboration is inevitable. To push a large model to the limit while meeting the demands of industrialization, one or two ideas are not enough; many people must take part. Every participant was highly professional. For example, our data needs specialists to process it, and deployment involves many details requiring colleagues who specialize in evaluation and engineering support. They all made great contributions.

Among the leading players in cutting-edge AI research, you can see that a single project has a very large number of participants, with specialists responsible for each link. Such high-density, high-complexity collaboration and precise coordination also place very high demands on organizational ability.

Q: What is the team atmosphere in your opinion?

A: I would sum it up as "drive" and "attention to detail". The drive shows in everyone taking the initiative to get things done; the project itself was self-driven, born of curiosity and the desire to change the industry. The atmosphere feels more like a start-up than a big company.

Q: You also mentioned that the team "picks at details". How should we understand that?

A: It means picking at details in real scenarios. For generation work, it is easy to produce one beautiful demo, but in actual applications the system faces all kinds of detailed problems. To ensure the model always generates high-quality output that meets user needs, we set very strict requirements on stability and robustness, which takes repeated polishing so that every detail holds up. By contrast, we did not do much optimization just for the demo.

Q: Was there any internal debate about "not over-optimizing the demo"?

A: Yes, especially among the younger colleagues; everyone wants to show their best side. But we want results that can actually ship, so that users do not discover a big gap between the product and the demo when they use it. That is what truly changes the industry.

Q: Is the relevant technology currently applied in the Doubao app?

A: Some of the related technologies have been in use for some time. We only show things externally after users have validated them in real scenarios. Some technologies are also going through the final stages of deployment.

Q: What keywords can summarize our team?

A: The first is professionalism. It shows in many aspects, including data, infrastructure, and model design. We attend to the details of every link very professionally, and strive for ultimate performance from the standpoint of industrial deployment.

The second is focus and drive. Reaching our goals takes both, so everyone is deeply invested, and when results actually land, everyone feels a sense of accomplishment and gains confidence.

The third is unity. Within the team nobody is territorial and cooperation is very smooth, which makes the work a pleasure; that is rare in a large company.

Q: What qualities does the team look for in the people it hopes to attract?

A: First, whether our values align. Ability certainly matters, but more importantly we hope to find partners in the same boat, so that everyone can achieve self-realization. Cooperation built on shared values is naturally smooth.

The second is diversity of backgrounds. The methods used across AI fields are now similar and gradually converging, so experience in reinforcement learning, visual recognition, audio recognition, and other fields plays a crucial role in generation. We hope people from different professional backgrounds will join. I myself came from speech understanding and switched to TTS.

Finally, initiative, learning ability, and high standards for one's work. Generative tasks have many unique characteristics, and we hope candidates can find the intersection between the task and their own experience; the ability to learn actively is essential. At the same time, we want to build the best technology and products in the industry, which requires keeping that vision in mind and pushing forward every day.



The above is what the Seed-TTS team members shared. The team is still recruiting outstanding talent.

If you share our ideals and enthusiasm for large model technology and identify with the Doubao model team's atmosphere, please visit the team's official website at team.doubao.com or follow the team's official public account to learn more about technical progress, team stories, and recruitment information.
The ByteDance Top Seed Talent Plan is recruiting. We hope to keep attracting top talent with the ambition to "change the world with technology". Join us, and you will work with the best scientists and engineers on the industry's toughest technical challenges.

Welcome to press and hold the QR code below or click "Read the original text" to submit your resume.

[QR code]

Click this link to submit your resume with one click!
