ByteDance's large-model simultaneous interpretation agent reaches a level comparable to human interpreters at launch

Jul 25, 2024 pm 05:53 PM
ByteDance industry simultaneous interpretation

Whether the input is a fast, hard-to-pronounce tongue twister, refined classical Chinese, or an improvised casual chat, the model produces accurate, idiomatic translations smoothly and naturally.

In recent years, artificial intelligence (AI), and large language models (LLMs) in particular, has developed at a remarkable pace. These models have demonstrated outstanding abilities across a variety of natural language processing tasks. Yet despite breakthroughs in many fields, simultaneous interpretation (SI), often regarded as one of the most demanding human language skills, remains a problem that has not been fully solved.

Traditional simultaneous interpretation software on the market usually takes a cascaded approach: automatic speech recognition (ASR) runs first, and machine translation (MT) then runs on its output. This approach has a significant problem: error propagation. Errors made during ASR directly degrade the subsequent translation, so errors accumulate. In addition, because of low-latency requirements, traditional systems usually rely on small models with weaker performance, which becomes a bottleneck in complex and varied real-world scenarios.
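To make the error-propagation argument concrete, here is a minimal arithmetic sketch. It assumes, purely for illustration, that ASR and MT errors are independent and that MT cannot recover from a wrong transcript; the function name and the 90% figures are hypothetical and do not come from the CLASI report.

```python
def cascaded_accuracy(asr_accuracy: float, mt_accuracy: float) -> float:
    """Segment-level accuracy of an ASR -> MT cascade.

    Assumption (illustrative only): a segment is translated correctly
    only if ASR transcribes it correctly AND MT then translates the
    correct transcript correctly, with the two events independent.
    """
    if not (0.0 <= asr_accuracy <= 1.0 and 0.0 <= mt_accuracy <= 1.0):
        raise ValueError("accuracies must lie in [0, 1]")
    return asr_accuracy * mt_accuracy


# Two stages that are each 90% accurate yield a cascade that is only
# about 81% accurate under these assumptions.
print(round(cascaded_accuracy(0.9, 0.9), 2))
```

Under these assumptions the cascade can never be better than its weaker stage, which is the motivation the article gives for an end-to-end design.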

Researchers from the ByteDance Research team have launched an end-to-end simultaneous interpretation agent: Cross Language Agent - Simultaneous Interpretation (CLASI). Its quality is close to that of professional human interpreters. CLASI adopts an end-to-end architecture, which avoids the error propagation of cascaded models. It relies on the speech understanding capabilities of the Doubao foundation model and the Doubao large-model speech team, and it can also acquire knowledge from external sources. The result is a simultaneous interpretation system comparable to human performance.

  • Paper address: https://byteresearchcla.github.io/clasi/technical_report.pdf
  • Display page: https://byteresearchcla.github.io/clasi/

Demonstration

Video demo: a few impromptu videos give a first impression of CLASI. All subtitles were recorded as they were output in real time. Whether the input is a fast, hard-to-pronounce tongue twister, refined classical Chinese, or an improvised casual chat, the model produces accurate, idiomatic translations smoothly and naturally. CLASI also performs well in its core scenario: translating conference speech.

[Videos: Impromptu conversation (Constellation); Reading (Chibi Fu); Tongue twisters]

Quantitative comparison: the researchers invited professional simultaneous interpreters to manually evaluate Chinese-to-English and English-to-Chinese translation in four different domains, using a metric consistent with how human interpretation is assessed: the proportion of valid information (on a percentage scale). As the figure shows, CLASI is well ahead of all commercial systems and open-source SOTA systems, and on some test sets it reaches or even exceeds the level of human interpreters (the average level of human simultaneous interpretation is generally taken to be about 80%).

System Architecture

In terms of system architecture, CLASI adopts an LLM agent-based architecture (left in the figure below) that defines simultaneous interpretation as a series of simple, coordinated operations: reading the audio stream, (optionally) retrieving, reading memory, updating memory, producing output, and so on. The whole process is controlled autonomously by a large language model, which achieves an effective balance between real-time performance and translation quality. The system can flexibly adjust the processing strategy of each step according to actual needs, so that the translated content stays accurate and coherent while information is conveyed efficiently. The underlying model of CLASI is an encoder-conditioned LLM pre-trained on massive amounts of unsupervised and supervised data. The system architecture of CLASI is shown in the figure below.

Figure 1: Diagram of CLASI's overall workflow. In step 1, CLASI processes the current incoming audio. The retriever is then (optionally) activated to fetch relevant information from the user-defined knowledge base. In this example, the translation pair for "Ising model" in the knowledge base helps the model produce the correct translation. In step 3, CLASI loads the transcription (optional) and translation from the previous round's memory. Next (steps 4 and 5), CLASI may use chain-of-thought (CoT) to output the transcription (optional) and the translation, and then updates its memory. Finally, it returns to step 1 to process the next round of speech.
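The loop in Figure 1 can be sketched as follows. This is a toy illustration of the control flow only, not CLASI's implementation: audio chunks are stood in for by text strings, the retriever is a plain substring lookup, and every name (`Memory`, `retrieve`, `run_agent`, `toy_model`, `SRC_TERM`) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Memory:
    """Transcriptions and translations carried over from earlier rounds."""
    transcripts: List[str] = field(default_factory=list)
    translations: List[str] = field(default_factory=list)


# Hypothetical model interface:
# (current chunk, memory, retrieved term pairs) -> (transcript, translation)
Model = Callable[[str, Memory, Dict[str, str]], Tuple[str, str]]


def retrieve(chunk: str, knowledge_base: Dict[str, str]) -> Dict[str, str]:
    """Step 2 (optional): pick the term pairs relevant to the current chunk.

    A real retriever would work on audio; substring matching on a text
    stand-in is an assumption made to keep the sketch runnable.
    """
    return {src: tgt for src, tgt in knowledge_base.items() if src in chunk}


def run_agent(chunks: List[str], model: Model,
              knowledge_base: Optional[Dict[str, str]] = None) -> Memory:
    memory = Memory()
    for chunk in chunks:                                   # step 1: read audio
        terms = retrieve(chunk, knowledge_base) if knowledge_base else {}
        # Step 3: the previous rounds' memory is handed to the model.
        transcript, translation = model(chunk, memory, terms)  # step 4: output
        memory.transcripts.append(transcript)              # step 5: update memory
        memory.translations.append(translation)
    return memory


def toy_model(chunk: str, memory: Memory, terms: Dict[str, str]) -> Tuple[str, str]:
    """Stand-in for the LLM: copies the input and substitutes retrieved terms."""
    translation = chunk
    for src, tgt in terms.items():
        translation = translation.replace(src, tgt)
    return chunk, translation


result = run_agent(["the SRC_TERM is studied", "in physics"],
                   toy_model, {"SRC_TERM": "Ising model"})
print(result.translations)
```

Passing the model in as a callable keeps the loop itself independent of any particular LLM, which mirrors the article's description of simple operations coordinated by the model.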

Figure 2: Structural diagram of CLASI. In round r, CLASI takes as input the current audio stream, the memory from the previous round (r-1), and the retrieved knowledge (if any). CLASI generates a response according to the given instructions and then updates the memory. CLASI also outputs the cut-off timestamp of the last semantic chunk. In the example given, the content before the phrase "just before" is considered a complete semantic chunk, so the cut-off timestamp falls right before that phrase.
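One plausible use of the cut-off timestamp is to decide which audio has been fully handled and which must be carried into the next round. The sketch below assumes exactly that; the article does not describe how CLASI manages its audio buffer, and `trim_buffer` and its parameters are hypothetical.

```python
from typing import List, Tuple


def trim_buffer(samples: List[float], buffer_start: float,
                cutoff: float, sample_rate: int) -> Tuple[List[float], float]:
    """Drop the audio before `cutoff` and keep the rest for the next round.

    samples      -- audio samples currently buffered
    buffer_start -- timestamp (seconds) of the first buffered sample
    cutoff       -- cut-off timestamp (seconds) returned for the last
                    complete semantic chunk
    Returns the remaining samples and the timestamp of their first sample.
    """
    consumed = int(round((cutoff - buffer_start) * sample_rate))
    consumed = max(0, min(consumed, len(samples)))  # clamp to the buffer
    return samples[consumed:], buffer_start + consumed / sample_rate


# One second of audio at a toy rate of 10 samples per second; the model
# reports that the last complete semantic chunk ends at 0.4 s.
remaining, new_start = trim_buffer([0.0] * 10, 0.0, 0.4, 10)
print(len(remaining), new_start)
```

Audio after the cut-off belongs to an unfinished semantic chunk, so under this assumption it is re-read together with the new audio in the following round.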

Experimental Results

Table 1: In the human evaluation of Valid Information Proportion (VIP), CLASI substantially outperformed all competing systems, achieving an accuracy of more than 78% in both language directions. Generally speaking, the accuracy of human simultaneous interpretation can be considered to be above 70% and ideally up to 95%; the researchers use 80% as the average standard for high-level human interpreters.
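As a rough illustration of what a percentage-scale "proportion of valid information" means, the sketch below scores a translation from per-fragment human judgements. It assumes each semantic fragment is simply marked valid or not; the actual VIP annotation protocol is defined in the researchers' guidelines, and the function name and sample judgements here are hypothetical.

```python
from typing import Sequence


def valid_information_proportion(judgements: Sequence[bool]) -> float:
    """Percentage of semantic fragments a human evaluator marked as valid.

    Assumption: one boolean per fragment, True when the fragment's
    information was conveyed correctly in the translation.
    """
    if not judgements:
        raise ValueError("at least one judged fragment is required")
    return 100.0 * sum(judgements) / len(judgements)


# Three of four fragments judged valid.
print(valid_information_proportion([True, True, True, False]))
```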

Example Analysis

Chinese to English:

English to Chinese:

As these examples show, CLASI's translations are clearly better than those of commercial systems in many respects.

Summary

Researchers from the ByteDance Research team have proposed CLASI, a simultaneous interpretation agent based on the Doubao large model. Thanks to large-scale pre-training and imitation learning, CLASI significantly outperforms existing automatic simultaneous interpretation systems in human evaluation, nearly reaching the level of human interpreters.

1. The researchers propose a data-driven read-write policy that imitates professional human interpreters. The policy balances translation quality and latency without requiring complex hand-crafted design. Unlike most commercial systems, which frequently rewrite their output during translation to improve quality, this policy guarantees that all output is deterministic while maintaining high quality.

2. Human interpreters usually need to prepare for the content of a session in advance. Inspired by this, the researchers introduce a multimodal retrieval-augmented generation (MM-RAG) process that gives the LLM domain-specific knowledge in real time. The proposed module further improves translation quality with minimal computational overhead at inference time.

3. The researchers worked closely with professional human simultaneous interpreters to develop a new human evaluation strategy, Valid Information Proportion (VIP), and published detailed guidelines. They also released a multi-domain, human-annotated test set for long-form speech translation that is closer to real-world scenarios.
