Advances in artificial intelligence and machine learning have significantly expanded what is possible in the browser. Running text-to-speech (TTS) models directly in the browser opens new opportunities for privacy, speed, and convenience. In this blog post, we will explore how to run the Kokoro-82M ONNX TTS model in the browser with a JavaScript implementation. If you're curious, you can try it out in my demo: Kitt AI Text-to-Speech.
Traditionally, TTS models are executed on a server, requiring an internet connection to send input text and receive synthesized speech. However, with improvements to WebGPU and ONNX Runtime Web (used under the hood by Transformers.js), you can now run capable models like Kokoro-82M ONNX directly in the browser. This brings several advantages:

- Privacy: the text never leaves the user's device.
- Speed: no network round trip to a server for each request.
- Convenience: once the model files are cached, synthesis can work offline.
The Kokoro-82M ONNX model is a lightweight yet effective TTS model optimized for on-device inference. It provides high-quality speech synthesis while maintaining a small footprint, making it suitable for browser environments.
To run Kokoro-82M ONNX in your browser you need:

- A modern browser with WebAssembly support (WebGPU is optional but faster where available).
- Node.js and a bundler to set up the JavaScript project.
- The @huggingface/transformers package and the Kokoro.js script described below.
You can set up your project by including the necessary dependencies in package.json:
<code>{
  "dependencies": {
    "@huggingface/transformers": "^3.3.1"
  }
}</code>
Next, make sure you have the Kokoro.js script, which is available from this repository.
To load and use the Kokoro-82M ONNX model in your browser, follow these steps:
<code class="language-javascript">// Load the model weights and tokenizer; both return promises and must be awaited.
this.model_instance = await StyleTextToSpeech2Model.from_pretrained(this.modelId, {
  device: "wasm",
  progress_callback,
});
this.tokenizer = await AutoTokenizer.from_pretrained(this.modelId, {
  progress_callback,
});</code>
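The snippet above pins the device to "wasm". In practice you may prefer WebGPU when the browser exposes it and fall back to WASM otherwise. A minimal feature-detection sketch follows; the `pickDevice` helper is hypothetical, not part of Kokoro.js:

```javascript
// Hypothetical helper: choose the Transformers.js device string by
// feature-detecting WebGPU, falling back to WASM everywhere else.
function pickDevice(nav = globalThis.navigator) {
  // `navigator.gpu` is only defined in WebGPU-capable browsers.
  return nav && "gpu" in nav ? "webgpu" : "wasm";
}
```

You could then pass `{ device: pickDevice(), progress_callback }` to `from_pretrained`. Since WebGPU support still varies across browsers, it is worth testing the WASM fallback path as well.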
After loading the model and processing the text, you can run inference to generate speech:
<code class="language-javascript">const language = speakerId.at(0); // "a" or "b"
const phonemes = await phonemize(text, language);
const { input_ids } = await tokenizer(phonemes, { truncation: true });

// Exclude the start/end tokens (no padding); never go below 0.
const num_tokens = Math.max(input_ids.dims.at(-1) - 2, 0);

// Each token count selects a STYLE_DIM-length style vector
// from the packed voice embedding data.
const offset = num_tokens * STYLE_DIM;
const data = await getVoiceData(speakerId as keyof typeof VOICES);
const voiceData = data.slice(offset, offset + STYLE_DIM);

const inputs = {
  input_ids,
  style: new Tensor("float32", voiceData, [1, STYLE_DIM]),
  speed: new Tensor("float32", [speed], [1]),
};
const { waveform } = await model(inputs);
const audio = new RawAudio(waveform.data, SAMPLE_RATE).toBlob();</code>
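The final `audio` value is a Blob. To actually hear it in the browser, you can hand the Blob to an audio player through an object URL. Here is a small sketch; the `playAudioBlob` helper is an assumption for illustration, not part of the demo code:

```javascript
// Hypothetical helper: play a synthesized audio Blob in the browser.
// The Audio constructor is injectable so the logic can be exercised
// outside a browser environment.
function playAudioBlob(blob, AudioCtor = globalThis.Audio) {
  const url = URL.createObjectURL(blob);
  const player = new AudioCtor(url);
  // Revoke the object URL after playback to avoid leaking memory.
  player.addEventListener("ended", () => URL.revokeObjectURL(url));
  player.play();
  return player;
}
```

In the browser you would simply call `playAudioBlob(audio)` after inference completes.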
You can see this in my live demo: Kitt AI Text-to-Speech. The demo showcases real-time text-to-speech synthesis powered by Kokoro-82M ONNX.
Running TTS models like Kokoro-82M ONNX in the browser represents a leap forward for privacy-preserving, low-latency applications. With just a few lines of JavaScript and the power of ONNX Runtime Web, you can create high-quality, responsive TTS applications that delight your users. Whether you're building accessibility tools, voice assistants, or interactive applications, in-browser TTS could be a game-changer.
Try the Kitt AI text-to-speech demo now and see for yourself!