# Unlocking the Power of a Local Voice Assistant: A Step-by-Step Guide
The rise of multimodal large language models (LLMs) has transformed how we interact with AI, making voice-based interaction practical. While OpenAI's voice-enabled ChatGPT offers a convenient solution, building a local voice assistant provides enhanced data privacy, unlimited API calls, and the ability to fine-tune models for specific needs. This guide details how to build such an assistant on a standard CPU-based machine.
Three key advantages drive the appeal of a local voice assistant: enhanced data privacy, no limits on API calls, and the ability to fine-tune the model for specific needs.

The project comprises four core components:

1. **Audio recording** with the `sounddevice` library
2. **Speech-to-text** with Whisper.cpp
3. **Language model inference** with Ollama
4. **Text-to-speech** with NVIDIA NeMo
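Before diving into each component, it helps to see how the four pieces chain together. The sketch below wires them into a single assistant turn using placeholder functions; the function names and canned return values are illustrative only, not part of any library, and the real implementations are covered in the sections that follow.

```python
# A minimal sketch of one voice-assistant turn: audio in, audio out.
# Every function here is a placeholder standing in for a real component.

def transcribe(audio_path: str) -> str:
    """Placeholder for the Whisper.cpp speech-to-text call."""
    return "what is the capital of France"

def ask_llm(question: str) -> str:
    """Placeholder for the Ollama LLM call."""
    return "Paris."

def synthesize(text: str) -> bytes:
    """Placeholder for the NeMo text-to-speech call."""
    return text.encode("utf-8")

def assistant_turn(audio_path: str) -> bytes:
    """Chain the components: transcribe, answer, then speak."""
    question = transcribe(audio_path)
    answer = ask_llm(question)
    return synthesize(answer)

# One full turn on a recorded file
speech = assistant_turn("recorded_audio.wav")
```

Each placeholder will be replaced by a subprocess call or model invocation below; keeping the interfaces this simple (strings and bytes) makes the components easy to swap.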
## Step 1: Recording Audio

The `sounddevice` library captures microphone input as 16-bit mono audio at 16 kHz, the sample rate the Whisper.cpp model expects, and saves it as a WAV file:

```python
import wave

import numpy as np
import sounddevice as sd

sampling_rate = 16000  # matches the Whisper.cpp model's expected rate
duration = 5  # recording length in seconds

# Record mono 16-bit audio from the default input device
recorded_audio = sd.rec(int(duration * sampling_rate),
                        samplerate=sampling_rate,
                        channels=1, dtype=np.int16)
sd.wait()  # block until the recording finishes

# Save as WAV so Whisper.cpp can read it
audio_file = "<path>/recorded_audio.wav"
with wave.open(audio_file, "w") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)  # 16-bit samples = 2 bytes
    wf.setframerate(sampling_rate)
    wf.writeframes(recorded_audio.tobytes())
```
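Whisper.cpp is picky about its input format: 16 kHz, mono, 16-bit PCM. The following sketch, using only the standard library (no microphone required), writes one second of a 440 Hz tone with exactly those parameters and reads the header back to confirm them, which is a handy sanity check if transcription silently fails on your WAV files:

```python
import io
import math
import struct
import wave

SAMPLING_RATE = 16000  # the rate Whisper.cpp models expect

# One second of a 440 Hz sine tone as 16-bit little-endian samples
frames = b"".join(
    struct.pack("<h", int(32767 * 0.3 * math.sin(2 * math.pi * 440 * n / SAMPLING_RATE)))
    for n in range(SAMPLING_RATE)
)

# Write the tone to an in-memory WAV with Whisper.cpp-compatible parameters
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wf:
    wf.setnchannels(1)   # mono
    wf.setsampwidth(2)   # 16-bit
    wf.setframerate(SAMPLING_RATE)
    wf.writeframes(frames)

# Read the header back to verify rate, channels, width, and frame count
buffer.seek(0)
with wave.open(buffer, "rb") as wf:
    print(wf.getframerate(), wf.getnchannels(), wf.getsampwidth(), wf.getnframes())
    # → 16000 1 2 16000
```

If any of these values differ for your recording, resample or convert it before passing it to Whisper.cpp.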
## Step 2: Speech-to-Text with Whisper.cpp

Whisper.cpp transcribes the recording using the `ggml-base.en.bin` model, invoked as a subprocess:

```python
import subprocess

WHISPER_BINARY_PATH = "/<path>/whisper.cpp/main"
MODEL_PATH = "/<path>/whisper.cpp/models/ggml-base.en.bin"

try:
    # Run Whisper.cpp on the recorded WAV; -otxt also writes a .txt transcript
    result = subprocess.run(
        [WHISPER_BINARY_PATH, "-m", MODEL_PATH,
         "-f", audio_file, "-l", "en", "-otxt"],
        capture_output=True, text=True,
    )
    transcription = result.stdout.strip()
except FileNotFoundError:
    print("Whisper.cpp binary not found. Check the path.")
```
## Step 3: Querying the LLM with Ollama

The transcription is passed to a lightweight local model, `qwen:0.5b`, via Ollama. A helper function, `run_ollama_command`, wraps the subprocess call, and a regex strips the timestamp prefixes from Whisper.cpp's output before building the prompt:

```python
import re
import subprocess

def run_ollama_command(model, prompt):
    """Send a prompt to a local Ollama model and return its reply."""
    try:
        result = subprocess.run(
            ["ollama", "run", model],
            input=prompt, text=True, capture_output=True, check=True,
        )
        return result.stdout
    except subprocess.CalledProcessError as e:
        print(f"Ollama error: {e.stderr}")
        return None

# Strip Whisper.cpp's "[timestamp]" prefixes, keeping only the spoken text
matches = re.findall(r"] *(.*)", transcription)
concatenated_text = " ".join(matches)

prompt = f"""Please ignore [BLANK_AUDIO]. Given: "{concatenated_text}", answer in under 15 words."""
answer = run_ollama_command(model="qwen:0.5b", prompt=prompt)
```
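The regex `r"] *(.*)"` deserves a closer look: Whisper.cpp prints each transcribed segment with a timestamp prefix, and this pattern captures everything after the closing bracket on each line. A small self-contained demonstration on a made-up transcript (the sample text is illustrative, not real Whisper.cpp output):

```python
import re

# Whisper.cpp emits segments like "[00:00:00.000 --> 00:00:02.000]  text".
# The pattern below finds each "]" and captures the rest of that line.
sample_output = (
    "[00:00:00.000 --> 00:00:02.000]  What is the capital\n"
    "[00:00:02.000 --> 00:00:03.500]  of France?\n"
)

matches = re.findall(r"] *(.*)", sample_output)
concatenated = " ".join(matches)
print(concatenated)  # → What is the capital of France?
```

Because `.` does not match newlines, each capture stops at the end of its own line, so joining the matches with spaces reconstructs the spoken sentence in order.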
## Step 4: Text-to-Speech with NeMo

NVIDIA NeMo turns the answer back into speech in two stages: FastPitch generates a mel spectrogram from text, and HiFi-GAN converts the spectrogram into a waveform, which is written to an in-memory WAV buffer:

```python
from io import BytesIO

import torchaudio
from nemo.collections.tts.models import FastPitchModel, HifiGanModel

try:
    # FastPitch: text -> mel spectrogram; HiFi-GAN: spectrogram -> audio
    fastpitch_model = FastPitchModel.from_pretrained("tts_en_fastpitch")
    hifigan_model = HifiGanModel.from_pretrained("tts_en_lj_hifigan_ft_mixerttsx")
    fastpitch_model.eval()
    hifigan_model.eval()

    parsed_text = fastpitch_model.parse(answer)
    spectrogram = fastpitch_model.generate_spectrogram(tokens=parsed_text)
    audio = hifigan_model.convert_spectrogram_to_audio(spec=spectrogram)

    # Write the waveform to an in-memory WAV buffer for playback
    audio_buffer = BytesIO()
    torchaudio.save(audio_buffer, audio.cpu(), sample_rate=22050, format="wav")
    audio_buffer.seek(0)
except Exception as e:
    print(f"TTS error: {e}")
```