The technical strength of Huoshan Voice TTS has been certified by the AI National Inspection Center, with a MOS score as high as 4.64

Apr 12, 2023, 10:40 AM
volcano engine

Recently, the Volcano Engine speech synthesis product obtained the enhanced-level speech synthesis inspection and testing certificate issued by the National Speech and Image Recognition Product Quality Inspection and Testing Center (hereinafter the "AI National Inspection Center"). The product met the center's highest-level standard for both the basic and the extended speech synthesis requirements. The evaluation covered Mandarin Chinese, multiple dialects, multiple languages, mixed-language input, multiple timbres, and personalization. The product's technical support team, the Volcano Voice (Huoshan Voice) team, supplied a rich voice library, and the highest timbre MOS score in the evaluation reached 4.64, an industry-leading level.
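For readers unfamiliar with the metric: MOS stands for mean opinion score, the average of listener ratings on a scale from 1 (bad) to 5 (excellent). The article does not say how many listeners or utterances the AI National Inspection Center used, so the ratings in the sketch below are invented purely for illustration, and the function name is hypothetical.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average listener ratings given on the 1 (bad) to 5 (excellent) scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return mean(ratings)


# Hypothetical ratings from ten listeners for one synthesized voice
# (illustrative data, not taken from the certification).
ratings = [5, 5, 4, 5, 5, 4, 5, 5, 4, 5]
print(round(mean_opinion_score(ratings), 2))  # prints 4.7
```

Because the scale tops out at 5, a score such as 4.64 means that most listeners gave the top rating.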


As the first and only national-level quality inspection and testing agency for voice and image products in the field of artificial intelligence within China's quality inspection system, the AI National Inspection Center has long been committed to promoting the healthy development of the intelligent voice industry. Obtaining this authoritative certification demonstrates that Volcano Voice's speech synthesis technology has reached an industry-leading level.

Try Volcano Engine speech synthesis: https://www.php.cn/link/8e0ce414531179ae9b7f60e20351ee8b

More voice samples: https://www.php.cn/link/a1ada9947e0d683b4625f94c74104d73

For a long time, Volcano Voice has provided industry-leading AI voice technology and full-stack voice product solutions to ByteDance's major business lines and to Volcano Engine's ToB industry and innovation scenarios. The team's speech recognition and speech synthesis currently cover multiple languages and dialects and a range of application scenarios, including audio and video, audiobooks, voice interaction, games, and advertising, and supply voice capabilities to core businesses such as Douyin, Jianying, Feishu, Tomato Novels, and Pico.

It is understood that the Volcano Engine speech synthesis product submitted for this evaluation was independently developed by the Volcano Voice team and uses industry-leading generative neural network technology. It consists of three major modules: front-end text analysis, an acoustic model, and a vocoder, described below:

  • Front-end text analysis: mainly responsible for intelligibility. It covers text regularization (for example, converting digits into year readings or number readings), character-to-pronunciation conversion (for example, Chinese phonetic annotation, in particular resolving polyphonic characters), word segmentation, and prosody prediction. The Volcano Voice team currently relies on multi-task models and neural-network-based regularization to support 12 mainstream less widely spoken languages at the same time, with notable results.
  • Acoustic model: mainly responsible for mapping linguistic features to acoustic features. According to the reported data, the back-end accuracy of Volcano Voice TTS reaches 99.90%. The model also supports fine-grained control of multiple emotions and styles, style transfer between different timbres, and multilingual synthesis using training data in only a single language.
  • Vocoder: mainly responsible for mapping acoustic features to audio signals. The Volcano Voice team has developed its own vocoder based on adversarial neural network modeling, with an accuracy of up to 99.95%. Thanks to a lightweight model design and engineering optimization, synthesis in the cloud can run more than a hundred times faster than real time.
The Volcano Engine speech synthesis product sounds real and natural, with vivid delivery and diverse styles. It also restores the prosody of real speakers at a fine-grained level and reproduces paralinguistic phenomena such as laughter, giving listeners an immersive experience. One example is the ultra-natural conversational speech synthesis technology recently released by the Volcano Voice team. Compared with traditional TTS, it can faithfully reproduce details such as modal particles, breath sounds, hesitation pauses, and lengthened pronunciation, while requiring only a quarter of the data of a conventional voice library.
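To make the text regularization step of the front-end module concrete, here is a minimal rule-based sketch in Python for English text. It only illustrates the idea that the same digits are read differently as a year and as a quantity. It is not the Volcano Voice implementation, which the article says relies on multi-task models and neural networks, and every function name and rule here (including treating a four-digit number after "in" as a year) is an assumption made for illustration.

```python
import re

ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def two_digits(n):
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def cardinal(n):
    """Spell out 0..9999 as a quantity ("number reading")."""
    parts = []
    thousands, rest = divmod(n, 1000)
    hundreds, rest = divmod(rest, 100)
    if thousands:
        parts.append(ONES[thousands] + " thousand")
    if hundreds:
        parts.append(ONES[hundreds] + " hundred")
    if rest or not parts:
        parts.append(two_digits(rest))
    return " ".join(parts)


def year(n):
    """Spell out 1000..9999 as a year ("year reading")."""
    if n % 1000 < 10:            # 2000, 2005, 3000: read like a quantity
        return cardinal(n)
    high, low = divmod(n, 100)
    if low == 0:                 # 1900 -> nineteen hundred
        return two_digits(high) + " hundred"
    if low < 10:                 # 1905 -> nineteen oh five
        return two_digits(high) + " oh " + ONES[low]
    return two_digits(high) + " " + two_digits(low)


# Toy context rule (an assumption): a four-digit number right after "in" is a
# year; any other integer of up to four digits is a quantity. Decimals such as
# 4.64 and longer digit runs are left untouched.
NUMBER = re.compile(r"\b(in )?(?<![\d.])(\d{1,4})\b(?!\.\d)", re.IGNORECASE)


def normalize(text):
    """Replace digits with words, choosing year or number reading by context."""
    def spell(match):
        prefix, digits = match.group(1) or "", match.group(2)
        n = int(digits)
        if prefix and len(digits) == 4 and n >= 1000:
            return prefix + year(n)
        return prefix + cardinal(n)
    return NUMBER.sub(spell, text)


print(normalize("In 2023 the team supported 12 languages"))
# prints: In twenty twenty-three the team supported twelve languages
```

A hand-written rule like the "in" check breaks quickly on real text, which is one plausible reason production systems learn this disambiguation from data instead.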
In addition, the "timbre reproduction technology" that became popular on the Internet was also developed by the Volcano Voice team. Unlike traditional speech synthesis, which sets a high bar for training data, Volcano Voice's timbre reproduction technology requires only 0.3% of the data used by traditional methods. An ordinary person who records for more than 2 minutes in a relatively quiet, open environment can meet the requirements of timbre space modeling and generate an AI model of their own exclusive timbre, which is convenient and efficient.

Volcano Voice now offers the speech technology it has refined over many years to external companies through Volcano Engine. These capabilities already cover application scenarios such as automobiles, finance, audiobooks, and video dubbing, and have helped many leading companies in their industries, such as Volkswagen Automotive and Zhuishu Artifact, apply and expand AI voice capabilities. In the future, Volcano Voice will continue to explore efficient combinations of cutting-edge technology and business scenarios, injecting innovative energy into user experience and business growth to create greater value.

