Web Speech API Developer's Guide: What it is and how it works
Translator: Li Rui
Reviewer: Sun Shujuan
The Web Speech API is a web technology that lets developers incorporate speech data into their applications. It can convert speech to text, and vice versa, directly in the browser.
The Web Speech API was introduced by a W3C community group in 2012. Ten years later, the API is still under development, because browser compatibility remains limited.
The API supports both short input fragments, such as a spoken command, and long continuous input. Extensive dictation capabilities make it ideal for dictation apps, while short inputs work well for things like commands and language translation.
Speech recognition has had a huge impact on accessibility: users with disabilities can navigate the web more easily by voice. This API could therefore be key to making the web friendlier and more efficient.
Text-to-speech and speech-to-text functionality is handled by two interfaces: SpeechSynthesis and SpeechRecognition.
1. Speech recognition
With the speech recognition interface, the user speaks into the microphone, and the speech recognition service checks what was said against its own grammar.
The API protects the user's privacy by first requesting permission to access the microphone. If the page using the API is served over HTTPS, permission is only requested once; otherwise, the API asks on every use.
The user's device may already include a speech recognition system, such as Siri on iOS or voice input on Android. The speech recognition interface uses the device's default system. Once the speech is recognized, it is converted and returned as a text string.
In "one-shot" speech recognition, the recognition ends as soon as the user stops speaking. This is useful for short commands, such as searching the web for an application testing site or making a phone call. In "continuous" recognition, the user must manually end the recognition using the "Stop" button.
Currently, the Web Speech API's speech recognition is only officially supported in two browsers: Chrome for desktop and Chrome for Android, and Chrome requires the webkit-prefixed interface.
However, the Web Speech API is still experimental, and the specification is subject to change. You can check whether the current browser supports the API by looking for the webkitSpeechRecognition object.
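As an illustrative sketch (not from the original article), the feature check can be written so it also degrades gracefully outside the browser; the helper name `getSpeechRecognitionCtor` is an assumption for illustration:

```javascript
// Hypothetical helper: return the SpeechRecognition constructor if present, else null.
function getSpeechRecognitionCtor(globalObj) {
  return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

// Use the real global in a browser; fall back to an empty object elsewhere (e.g. Node).
const g = typeof window !== "undefined" ? window : {};
const Ctor = getSpeechRecognitionCtor(g);
if (Ctor) {
  console.log("Speech recognition is supported in this browser");
} else {
  console.log("Speech recognition is not supported in this environment");
}
```

Checking both the unprefixed and webkit-prefixed names keeps the check working if Chrome eventually drops the prefix.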
2. Speech recognition attributes
Let's start by creating a new speech recognition object (Chrome requires the webkit-prefixed constructor):
var speechRecognition = new webkitSpeechRecognition();
Now let's look at the callbacks for certain events:
(1) onstart: triggered when the speech recognizer starts listening and recognizing speech. A message can be displayed to notify the user that the device is listening.
(2) onend: triggered every time the user ends speech recognition.
(3) onerror: triggered whenever a speech recognition error occurs, using the SpeechRecognitionErrorEvent interface.
(4) onresult: triggered when the speech recognition object obtains a result. It returns both interim results and final results, using the SpeechRecognitionEvent interface.
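The four event callbacks above can be wired up in one place. The sketch below is an illustration, not code from the original article; the `wireCallbacks` helper and the `ui` object are assumptions:

```javascript
// Hypothetical helper: attach the four event callbacks to a recognition object.
// `recognition` is a SpeechRecognition instance (or a mock); `ui` is any object
// with status(msg) and show(text) methods for user feedback.
function wireCallbacks(recognition, ui) {
  recognition.onstart = () => ui.status("Listening...");
  recognition.onend = () => ui.status("Stopped listening.");
  recognition.onerror = (event) => ui.status("Error: " + event.error);
  recognition.onresult = (event) => {
    // Show the most likely alternative of the latest result.
    const last = event.results[event.results.length - 1];
    ui.show(last[0].transcript);
  };
}
```

Because the handlers only touch the `recognition` and `ui` objects they are given, the same wiring works with a real SpeechRecognition instance in the browser or with mocks in a test.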
The SpeechRecognitionEvent object contains the following data:
(1) results[i]: an array of speech recognition result objects; each element represents a recognized word or phrase.
(2) resultIndex: the current recognition index.
(3) results[i][j]: the j-th alternative for a recognized word; the first element is the most likely recognition.
(4) results[i].isFinal: a Boolean indicating whether the result is interim or final.
(5) results[i][j].transcript: the text representation of the word.
(6) results[i][j].confidence: the probability that the result is correct (ranging from 0 to 1).
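An onresult handler can read these fields as sketched below. The `summarizeResults` helper name is an assumption; it is written as a pure function so it works on any object with the same shape as a SpeechRecognitionEvent:

```javascript
// Hypothetical helper: flatten a SpeechRecognitionEvent-like object into
// plain records, starting at resultIndex and taking the best alternative.
function summarizeResults(event) {
  const out = [];
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const best = event.results[i][0]; // first alternative = most likely
    out.push({
      transcript: best.transcript,
      confidence: best.confidence,
      isFinal: event.results[i].isFinal,
    });
  }
  return out;
}
```

In the browser you would call it as `speechRecognition.onresult = (e) => console.log(summarizeResults(e));`.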
So, what properties should be configured on the speech recognition object? Take a look below.
(1) Continuous vs One-Shot
The user decides whether the speech recognition object should keep listening until it is turned off, or only needs to recognize a short phrase. The default setting is "false" (one-shot).
Suppose the technology is being used to take notes. The user needs to be able to talk for long periods, with enough room to pause, without sending the app back to sleep. In that case, continuous can be set to true as follows:
speechRecognition.continuous = true;
(2) Language
What language do you want the object to recognize? If the browser is set to English by default, English is selected automatically. However, locale codes (such as "en-US") can also be used.
Additionally, the user can be allowed to select the language from a menu:
speechRecognition.lang = document.querySelector("#select_dialect").value;
(3) Interim Results
Interim results are results that are not yet complete or final. Setting this property to true makes the object show interim results as real-time feedback to the user:
speechRecognition.interimResults = true;
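With interimResults enabled, an onresult handler typically separates provisional text from final text. The `splitResults` helper below is an illustrative sketch (not from the original article), written to work on any event-shaped object:

```javascript
// Hypothetical helper: split an onresult event into interim and final text.
function splitResults(event) {
  let interimText = "";
  let finalText = "";
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const transcript = event.results[i][0].transcript;
    if (event.results[i].isFinal) {
      finalText += transcript; // stable text that will not change
    } else {
      interimText += transcript; // provisional text, may still be revised
    }
  }
  return { interim: interimText, final: finalText };
}
```

A UI can render the final text normally and the interim text in a lighter style, updating it on every onresult event.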
(4) Start and Stop
If the speech recognition object is configured as "continuous", the onclick properties of the start and stop buttons need to be set as follows:
document.querySelector("#start").onclick = () => {
  speechRecognition.start();
};
document.querySelector("#stop").onclick = () => {
  speechRecognition.stop();
};
This allows the user to control when the browser starts "listening" and when it stops.
Having covered the speech recognition interface, its methods, and its properties, it's time to explore the other side of the Web Speech API.
3. Speech synthesis
Speech synthesis is also known as text-to-speech (TTS). It takes text from an application, converts it into speech, and plays it through the device's speakers.
Speech synthesis can be used for almost anything, from driving directions to reading class notes aloud for online courses to screen reading for visually impaired users.
In terms of browser support, the Web Speech API's speech synthesis has been available in Firefox desktop and mobile since Gecko 42+, although permissions must be enabled first. Firefox OS 2.5+ supports speech synthesis by default, with no permissions required. Chrome and Android 33+ also support it.
So how do you make the browser speak? The main controller interface for speech synthesis is SpeechSynthesis, but several related interfaces are needed, for example for the voices used for output. Most operating systems have a default speech synthesis system.
In simple terms, first create an instance of the SpeechSynthesisUtterance interface. This instance contains the text the service will read, along with information such as the language, volume, pitch, and rate. Once these are specified, the instance is placed into a queue that tells the browser what to say and when to say it.
Assign the text to be spoken to its "text" property, as follows:
newUtterance.text = "The text you want the browser to speak";
Unless otherwise specified with the .lang property, the language defaults to that of the application or browser.
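Putting the utterance settings together, here is a sketch (not from the original article). SpeechSynthesisUtterance only exists in browsers, so the configuration is split into a plain, hypothetical `utteranceConfig` helper plus a guarded `speak` function:

```javascript
// Hypothetical helper: collect utterance settings in a plain object.
function utteranceConfig(text, { lang = "en-US", rate = 1, pitch = 1, volume = 1 } = {}) {
  return { text, lang, rate, pitch, volume };
}

// Speak the config in a browser; return false elsewhere (e.g. Node),
// where SpeechSynthesisUtterance does not exist.
function speak(config) {
  if (typeof window === "undefined" || !("speechSynthesis" in window)) return false;
  const utterance = new SpeechSynthesisUtterance(config.text);
  utterance.lang = config.lang;
  utterance.rate = config.rate;     // speaking speed
  utterance.pitch = config.pitch;   // voice pitch
  utterance.volume = config.volume; // 0 to 1
  window.speechSynthesis.speak(utterance); // enqueue the utterance
  return true;
}
```

Usage in the browser would be `speak(utteranceConfig("Hello there", { rate: 1.2 }));`.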
The voiceschanged event can fire after the website loads, once the voice list becomes available. To change the browser's default voice, use the getVoices() method of speechSynthesis, which lists all available voices.
The available voices depend on the operating system; Google and macOS each have their own default voice sets. Finally, the user can pick a preferred voice using the Array.find() method.
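Voice selection with Array.find() can be sketched as a pure helper (the `findVoice` name and the voice name in the commented browser usage are assumptions for illustration):

```javascript
// Hypothetical helper: pick a voice by name from a list of voices.
// In the browser the list comes from window.speechSynthesis.getVoices().
function findVoice(voices, name) {
  return voices.find((voice) => voice.name === name) || null;
}

// Browser usage (voice name is an assumption; available names vary by OS):
// const voice = findVoice(window.speechSynthesis.getVoices(), "Google US English");
// if (voice) utterance.voice = voice;
```

Because getVoices() may return an empty list until voiceschanged fires, a robust page repeats the lookup inside a voiceschanged handler.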
Customize the SpeechSynthesisUtterance as needed. You can start, stop, and pause the queue, or change the speaking speed ("rate").
4. Pros and cons of the Web Speech API
When should you use the Web Speech API? The technology is fun to work with but still evolving. Even so, there are plenty of potential use cases. Integrating the API can help modernize an IT infrastructure, and it helps to understand which aspects of the Web Speech API are mature and which still need improvement.
1. Improved productivity
Speaking into a microphone is faster and more efficient than typing. In today's fast-paced working life, people may need to be able to access web pages on the go.
It can also reduce administrative workloads. Improvements in speech-to-text technology have the potential to significantly cut the time spent on data-entry tasks, and the technology can be integrated into audio and video conferencing to speed up meeting transcription.
2. Accessibility
As mentioned above, both speech-to-text (STT) and text-to-speech (TTS) are great tools for users with disabilities or support needs. In addition, users who struggle with writing or spelling for any reason can express themselves better through speech recognition.
In this way, speech recognition technology can be a great equalizer on the Internet. Encouraging the use of these tools in the office can also promote workplace accessibility.
3. Translation
The Web Speech API can be a powerful language translation tool because it supports both speech-to-text (STT) and text-to-speech (TTS). At present, not every language is available, and this is one area where the Web Speech API has yet to reach its full potential.
4. Offline functionality
One drawback is that the API requires an Internet connection to work properly: the browser sends the input to its servers, which then return the results. This limits the environments in which the Web Speech API can be used.
5. Accuracy
Incredible progress has been made in improving the accuracy of speech recognizers. Users may still run into occasional difficulties, for example with technical terms, other specialized vocabulary, or dialects. By 2022, however, the accuracy of speech recognition software had reached human levels.
5. Conclusion
Although the Web Speech API is still at the experimental stage, it can be an amazing addition to a website or application. All kinds of workplaces, from tech companies to marketers, can use this API to improve efficiency. With just a few simple lines of JavaScript, a whole new world of accessibility opens up.
Speech recognition can make browsing the web easier and more efficient for users, and it will be exciting to see this technology grow and develop!
Original article: https://dzone.com/articles/the-developers-guide-to-web-speech-api-what-is-it