Ximalaya breaks through the speech overlap problem and wins first place in international conference challenge to accelerate AI innovation

Jul 07, 2023 pm 03:42 PM

The Multi-Channel Multi-Party Meeting Transcription Challenge 2.0 (M2MeT2.0), held as part of ASRU 2023 (the IEEE Automatic Speech Recognition and Understanding Workshop), a leading international speech conference, recently concluded. Ximalaya's Everest Laboratory won first place.

ASRU is the flagship technical event of the IEEE Speech and Language Processing Technical Committee (SLTC). Held every two years, it brings together leading experts and researchers from academia and industry to discuss a wide range of problems in speech recognition and understanding. The M2MeT2.0 Challenge is a key competition at ASRU 2023. Its goal is to solve the problem of transcribing overlapping speech in offline meeting-room recordings. Meetings are a typical "cocktail party" scenario in which many people talk freely, and they have long been both a difficulty and a focus in speech recognition. The challenge therefore matters for developing speech AI for meeting scenarios and for exploring industrial-grade solutions to the related problems.

Notably, this is not the first time Ximalaya has taken part in the M2MeT Challenge. In the first M2MeT Challenge, Ximalaya worked with the University of Science and Technology of China and placed third in the speaker diarization track, with a diarization error rate of only 4.05%. In that inaugural challenge, transcription was evaluated with character error rate (CER): audio was transcribed to text without considering speaker labels. Building on the first edition, the M2MeT2.0 Challenge focuses on speaker-attributed evaluation to push multi-speaker speech recognition systems toward practical use, and it has two sub-tracks: one with limited training data and one with open (unrestricted) training data.
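
As a rough illustration of the CER metric mentioned above, the sketch below computes it the usual way: the character-level edit distance between reference and hypothesis, divided by the reference length. The function names and the sample strings are hypothetical, and this is not the challenge's official scoring script.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: a reference character is missing
                curr[j - 1] + 1,     # insertion: an extra hypothesis character
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = (substitutions + deletions + insertions) / reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one wrong character in a six-character reference.
cer = character_error_rate("今天下午开会", "今天下午开灰")
print(f"CER = {cer:.2%}")
```

Because the score is normalized by the reference length, heavy insertion errors can push CER above 100%.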

To meet this challenge, Ximalaya's Everest Laboratory started from its basic speech recognition framework and explored overlapped speech detection and speaker diarization techniques. Ximalaya took first place in both the limited-data and open-data sub-tracks of the M2MeT2.0 Challenge.

This year's M2MeT2.0 Challenge dataset contains large-scale, real-world, multi-scenario, multi-modal data. It covers meeting rooms of different sizes and layouts with varied furnishings, regular meetings on different topics, and a range of indoor noise. Overlapping sounds such as human voices, television, fans and air conditioners, keyboards, doors opening and closing, and bubble noise make the competition harder. Each meeting is recorded simultaneously with a microphone array for far-field audio and headset microphones for near-field audio, which makes it possible to transcribe each speaker's speech accurately. The dataset is of considerable academic value for research on multi-speaker speech recognition and overlapping speech, and it provides real and diverse data for developing industrial-grade solutions.

All speakers in the M2MeT2.0 Challenge dataset are native Chinese speakers. Ximalaya took part through a collaboration across industry, academia, and research, and is committed to contributing to the development of China's own speech recognition technology. In the challenge, Ximalaya's Everest Laboratory team used self-developed speaker recognition, speech enhancement, and automatic speech recognition (ASR) modules. Through optimization and accumulated experience, the team made significant progress on overlapping speech and multi-speaker environments. By combining deep learning with neural network models, the Everest Laboratory is able to transcribe speech in real time while accurately identifying and separating multiple speakers.

Ximalaya's technology has not only been validated in the ASRU 2023 M2MeT2.0 Challenge but is also applied in Ximalaya's AIGC content production. Ximalaya's automatic speech recognition (ASR) technology is now widely used in the AI transcript feature of the Ximalaya app. It transcribes audio on the Ximalaya platform that has no script and outputs the corresponding text, making the content easier for listeners to follow. For audio that does have an original script, the AI transcript feature uses ultra-long audio-text alignment to timestamp the audio against the script, so that the matching text is highlighted in sync with playback and users can listen and read at the same time.
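
Once alignment has produced a timestamp for each piece of text, synchronized highlighting reduces to a lookup: given the current playback position, find the segment whose time span contains it. The sketch below is a minimal illustration under that assumption. The `Segment` type and `active_segment` function are hypothetical names, not Ximalaya's implementation, and the alignment step itself is not shown.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Segment:
    """One aligned piece of script text (hypothetical structure)."""
    start: float  # seconds from the start of the audio
    end: float    # seconds; the span is [start, end)
    text: str


def active_segment(segments: Sequence[Segment], position: float) -> Optional[int]:
    """Return the index of the segment to highlight at `position`, or None.

    Assumes `segments` is sorted by start time and does not overlap.
    """
    starts = [s.start for s in segments]
    # Index of the last segment that starts at or before the position.
    i = bisect_right(starts, position) - 1
    if i >= 0 and position < segments[i].end:
        return i
    return None  # before the first segment, or in a gap between segments
```

A player would call `active_segment` on each playback tick and highlight the returned index. Binary search keeps each lookup logarithmic in the number of segments, which matters for very long audio. In practice the list of start times would be built once rather than on every call.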

Beyond ASR, Ximalaya's TTS (speech synthesis) technology is also at the forefront of the industry and is widely used to produce storytelling, news, novels, and other content. Using the HiTTS framework, Ximalaya has faithfully reproduced the voice of the storyteller Shan Tianfang. According to reports, Ximalaya has released more than 100 albums voiced with the AI-synthesized Shan Tianfang voice, with cumulative plays exceeding 100 million.

For many years Ximalaya has carried out in-depth research in AI voice technology. Its Everest Laboratory has long focused on research and innovation in speech synthesis, sentiment analysis, speech recognition, and related fields. By winning the ASRU 2023 M2MeT2.0 Challenge, Ximalaya further consolidated its leading position in voice technology and demonstrated its ability to handle complex speech scenarios.

As an online audio platform popular with users, Ximalaya has always held to the idea of empowering culture with technology, bringing technology together with creators and users to improve content production efficiency and deliver a better content experience. Through technological empowerment and collaboration across industry, academia, and research, Ximalaya will continue to apply advanced, intelligent voice technology to audio and provide users with high-quality voice products and services.

