Table of Contents
The newly proposed method
Results
The value of a single model supporting a thousand languages

Meta uses the Bible to train a super multi-language model: recognize 1107 languages and identify 4017 languages

May 29, 2023, 3:45 PM

There is a story about the Tower of Babel in the Bible: humanity united to build a tower tall enough to reach heaven, but God confounded human language and the plan failed. Today, AI technology is expected to tear down the barriers between human languages and help humanity build a Tower of Babel for civilization.

Recently, a study by Meta has taken an important step in this direction. They call the newly proposed method Massively Multilingual Speech (MMS); it uses the Bible as part of its training data and achieves the following results:

  • Training wav2vec 2.0 on 1107 languages produced a multilingual speech recognition model with 1 billion parameters. Compared with OpenAI's Whisper model, its error rate is reduced by more than 50%.
  • A single speech synthesis model supports text-to-speech (TTS) for these 1107 languages.
  • A language identification classifier capable of identifying 4017 languages.

How does Meta solve the problem of data scarcity for so many low-resource languages? Their approach is interesting: they use religious corpora, because texts like the Bible come with the most "aligned" speech data available. Although this dataset is skewed toward religious content and features mostly male voices, the paper shows that the model also performs well on female voices and in other domains. This is emergent behavior of the base model, and it is quite striking. What is even more remarkable is that Meta has released all of the newly developed models (speech recognition, TTS, and language identification) for free!

  • Model download: https://github.com/facebookresearch/fairseq/tree/main/examples/mms
  • Paper address: https://research.facebook.com/publications/scaling-speech-technology-to-1000-languages/

The newly proposed method

To create a speech model that can recognize thousands of languages, the first challenge is collecting audio data across many languages: the largest existing speech datasets cover at most around 100 languages. To overcome this, the Meta researchers turned to religious texts such as the Bible, which have been translated into many different languages and whose translations have been studied extensively. These translations come with publicly available audio recordings of people reading them in different languages. Using these recordings, the researchers created a dataset containing audio of people reading the New Testament in 1,100 languages, with an average of 32 hours of audio per language.

They then included unannotated recordings of many other Christian readings, increasing the number of available languages to more than 4,000. Although this dataset covers a single domain and consists mostly of male voices, the analysis shows that Meta's newly developed model performs equally well on female voices and is not particularly biased toward producing religious language. The researchers state in the blog that this is mainly due to the Connectionist Temporal Classification (CTC) approach they used, which is far more constrained than large language models (LLMs) or sequence-to-sequence speech recognition models.


Analysis of potential gender bias. On the FLEURS benchmark, automatic speech recognition models trained on the Massively Multilingual Speech (MMS) dataset have similar error rates for male and female voices.
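For readers unfamiliar with the CTC objective mentioned above, the sketch below shows how a CTC loss is typically attached to frame-level encoder outputs in PyTorch. It is a generic illustration with made-up tensor shapes, not Meta's actual training code.

```python
import torch
import torch.nn as nn

# Toy dimensions for illustration only (not the real MMS configuration).
frames, batch, vocab = 200, 4, 32   # vocab includes the CTC blank at index 0

# Stand-in for frame-level log-probabilities from a wav2vec 2.0-style encoder.
log_probs = torch.randn(frames, batch, vocab, requires_grad=True).log_softmax(dim=-1)

# Character targets for each utterance (the blank index 0 is never a target).
targets = torch.randint(low=1, high=vocab, size=(batch, 50))
input_lengths = torch.full((batch,), frames, dtype=torch.long)
target_lengths = torch.full((batch,), 50, dtype=torch.long)

# CTC marginalizes over all monotonic alignments between frames and characters,
# so every emitted character must be anchored to the audio itself.
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()
print(loss.item())
```

Because CTC ties every output character to specific input frames, the decoder has far less freedom to produce text that is not in the audio than an autoregressive or sequence-to-sequence decoder, which is the constraint the researchers credit for the lack of religious-language bias.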

To improve the data quality enough for machine learning, they also applied several preprocessing steps. First, they trained an alignment model on existing data from more than 100 languages, and paired it with an efficient forced-alignment algorithm that can handle very long recordings of more than 20 minutes. After multiple rounds of alignment, a final cross-validation filtering step removes potentially misaligned data based on model accuracy. To make it easier for other researchers to create new speech datasets, Meta added the alignment algorithm to PyTorch and released the alignment model.
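The alignment tooling described here corresponds to the forced-alignment API contributed to torchaudio. The sketch below shows the general shape of such a call on placeholder tensors; it assumes a recent torchaudio (2.1 or later, which provides forced_align and merge_tokens), and the shapes and token IDs are invented for illustration, so consult the torchaudio documentation for authoritative usage.

```python
import torch
import torchaudio.functional as F

# Placeholder inputs: frame-level log-probabilities from a CTC acoustic model for
# one long recording (batch of 1), plus the token IDs of its reference transcript.
log_probs = torch.randn(1, 1000, 32).log_softmax(dim=-1)
targets = torch.randint(1, 32, (1, 120), dtype=torch.int32)  # blank (0) never appears

# forced_align returns, for every frame, the aligned token ID and its score.
aligned_tokens, scores = F.forced_align(log_probs, targets, blank=0)

# Collapse frame-level labels into token spans with start/end frame indices.
spans = F.merge_tokens(aligned_tokens[0], scores[0])
for span in spans[:5]:
    print(span.token, span.start, span.end, span.score)
```

In a pipeline like the one described above, spans whose alignment scores fall below a threshold could then be discarded, which roughly corresponds to the cross-validation filtering step.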

Just 32 hours of data per language is not enough to train a conventional supervised speech recognition model. Their models are therefore built on wav2vec 2.0, Meta's earlier work on self-supervised speech representation learning, which greatly reduces the amount of labeled data required for training. Specifically, the researchers trained a self-supervised model on roughly 500,000 hours of speech in more than 1,400 languages, over five times more languages than any previous study. They then fine-tuned the resulting model for specific speech tasks such as multilingual speech recognition or language identification.
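As a concrete illustration of using such a fine-tuned model, the sketch below loads a released multilingual ASR checkpoint through Hugging Face transformers. The checkpoint name and the per-language adapter API shown here are assumptions about how the MMS release is packaged downstream, not details from this article; the fairseq repository linked above is the authoritative source.

```python
import torch
from transformers import Wav2Vec2ForCTC, AutoProcessor

# Assumed checkpoint name for the released 1B-parameter multilingual ASR model.
model_id = "facebook/mms-1b-all"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# MMS uses small per-language adapters: switch the tokenizer vocabulary and the
# adapter weights to the target language via its ISO 639-3 code (French here).
processor.tokenizer.set_target_lang("fra")
model.load_adapter("fra")

# `speech` is assumed to be a 16 kHz mono waveform; one second of silence is a placeholder.
speech = torch.zeros(16000)
inputs = processor(speech.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # (batch, frames, vocab)
ids = torch.argmax(logits, dim=-1)[0]        # greedy CTC decoding
print(processor.decode(ids))
```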

Results

The researchers evaluated the newly developed models on several existing benchmarks.

The multilingual speech recognition models were trained by fine-tuning a wav2vec 2.0 model with 1 billion parameters on a dataset covering more than 1,100 languages. Model performance does decrease as the number of languages increases, but the decrease is very small: going from 61 to 1107 languages increases the character error rate by only about 0.4%, while language coverage grows by more than 18 times.


Character error rate on the 61-language FLEURS benchmark as the number of training languages increases; a higher error rate means a worse model.
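For reference, the character error rate used here is simply the Levenshtein (edit) distance between hypothesis and reference characters, normalized by the reference length; word error rate is the same computation over word tokens. A minimal pure-Python sketch:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, using a single DP row."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev_diag, dp[0] = dp[0], i
        for j, h in enumerate(hyp, start=1):
            prev_diag, dp[j] = dp[j], min(
                dp[j] + 1,                 # deletion
                dp[j - 1] + 1,             # insertion
                prev_diag + (r != h),      # substitution or match
            )
    return dp[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: the same computation over word tokens."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)

print(cer("hello world", "helo world"))  # 1 edit over 11 characters, about 0.09
print(wer("hello world", "helo world"))  # 1 wrong word over 2 words = 0.5
```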

Comparing against OpenAI's Whisper model, the researchers found that their model's word error rate was only half that of Whisper, while the new model supports 11 times as many languages. This result demonstrates the strength of the new method.


Comparison of word error rates between OpenAI Whisper and MMS on the 54 FLEURS languages that can be compared directly.

Next, using previously existing datasets (such as FLEURS and CommonVoice) together with the new data, the Meta researchers also trained a language identification (LID) model and evaluated it on the FLEURS LID task. The results show that the new model not only performs well but also supports 40 times as many languages.

Previous work covered just over 100 languages on the VoxLingua-107 benchmark, while MMS supports more than 4000 languages.
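The sketch below illustrates how such a language identification model can be run as an audio classifier on top of wav2vec 2.0. The checkpoint name and the Hugging Face transformers classes are assumptions made for illustration rather than details stated in this article.

```python
import torch
from transformers import Wav2Vec2ForSequenceClassification, AutoFeatureExtractor

# Assumed checkpoint name for one of the released MMS language-ID classifiers.
model_id = "facebook/mms-lid-126"
extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id)

# `speech` is assumed to be a 16 kHz mono waveform; a silent placeholder is used here.
speech = torch.zeros(16000)
inputs = extractor(speech.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits   # one logit per supported language

lang_id = int(logits.argmax(dim=-1))
print(model.config.id2label[lang_id])  # predicted language code
```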

In addition, Meta has built a text-to-speech system that supports more than 1,100 languages. Current text-to-speech models are usually trained on a speech corpus from a single speaker. One limitation of the MMS data is that many languages have recordings from only a few speakers, often just one, but this becomes an advantage when building text-to-speech systems. The researchers say the quality of the speech generated by these systems is actually quite good, and several examples are given below.

Demo of the MMS text-to-speech model for the Yoruba, Iloko and Maithili languages.
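As an illustration of what one of these per-language TTS checkpoints looks like in practice, the sketch below synthesizes speech with an English MMS TTS model through Hugging Face transformers. The VITS-based model class and the checkpoint naming scheme are assumptions about the downstream packaging of the release, not details from this article.

```python
import torch
from transformers import VitsModel, AutoTokenizer

# Assumed checkpoint naming scheme: "facebook/mms-tts-<iso3>" (English shown here).
model_id = "facebook/mms-tts-eng"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = VitsModel.from_pretrained(model_id)

inputs = tokenizer("Speech technology for a thousand languages.", return_tensors="pt")
with torch.no_grad():
    waveform = model(**inputs).waveform   # shape: (1, num_samples)

# The output sampling rate is stored in the model config (typically 16 kHz).
print(waveform.shape, model.config.sampling_rate)
```

The returned waveform can then be written to a WAV file at the sampling rate stored in the model config.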

That said, the researchers caution that AI technology is still not perfect, and MMS is no exception. For example, MMS may mistranscribe certain words or phrases in speech-to-text, which could produce offensive or inaccurate language in the output. The researchers emphasized the importance of working with the AI community to develop such technology responsibly.

The value of a single model supporting a thousand languages

Many languages around the world are endangered, and the limitations of current speech recognition and speech generation technology will only accelerate this trend. The researchers imagine in the blog that technology might encourage people to keep their languages alive: with good enough technology, speakers can access information and use devices in the language they prefer.

They believe the MMS project is an important step in this direction. They also said that the project will continue to develop, supporting more languages in the future and even tackling dialects and accents.
