Using Raspberry Pi to implement a conversational robot
Recently, I used a Raspberry Pi to build a robot that can talk to people. Here is a brief introduction.
Raspberry Pi is one of the world's most popular single-board computers and a leading project in open hardware. It was designed to teach students computer programming, is only the size of a credit card, and is affordable. It runs operating systems such as Linux (Debian). Most importantly, documentation is plentiful and the community is active.
I am using a Raspberry Pi 2 Model B. Its basic configuration is a Broadcom BCM2836 processor with four cores clocked at 900 MHz, and 1 GB of RAM.
My goal is to make a robot that can talk to people, which requires input and output devices. The input device is a microphone; the output can be HDMI, headphones or speakers, and I used speakers. Below is a photo of my Raspberry Pi. Its four USB ports are connected to a wireless network adapter, a wireless keyboard, the microphone, and the speakers' power supply.
We can divide the robot’s conversation into three parts: listening, thinking, and speaking.
"Listening" means recording what people say and converting it into words.
"Thinking" means producing different outputs for different inputs. For example, if the other person asks "What time is it now?", the robot can reply "It's xx:xx Beijing time."
"Speaking" means converting text into speech and playing it back.
Each of these parts involves speech recognition, speech synthesis or artificial intelligence, which would take a great deal of time and effort to develop from scratch. Fortunately, some companies offer public APIs for these services. I chose Baidu's APIs. The implementation of each part is explained below.
"Listening"
The first step is to record what the person says. I used the arecord tool, with the following command:
- arecord -D "plughw:1" -f S16_LE -r 16000 test.wav
The -D option selects the recording device. With the USB microphone connected, the Raspberry Pi has two sound devices: the internal (onboard) one and the external USB one. plughw:1 selects the external device. -f sets the sample format (S16_LE, signed 16-bit little-endian) and -r sets the sampling rate in Hz. Baidu's speech recognition, described below, only accepts certain audio formats, so we record in a format that meets its requirements. I did not specify a recording duration, so arecord keeps recording until the user presses Ctrl-C. The recording is saved as test.wav.
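Because the recognizer only accepts certain formats, it can save a failed round trip to check the recording locally before uploading it. Below is a minimal sketch using Python's standard wave module. The helper name check_wav_format is my own, and the expected values simply mirror the arecord options above: 16 kHz, mono (arecord's default channel count), and 16-bit little-endian PCM.

```python
import wave

def check_wav_format(path, rate=16000, channels=1, sample_width=2):
    """Return a list of mismatches between a WAV file and the expected format.

    An empty list means the file is 16 kHz, mono, 16-bit PCM (S16_LE)
    and contains at least one frame of audio.
    """
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != rate:
            problems.append("sample rate is %d Hz, expected %d" % (w.getframerate(), rate))
        if w.getnchannels() != channels:
            problems.append("%d channel(s), expected %d" % (w.getnchannels(), channels))
        if w.getsampwidth() != sample_width:
            problems.append("%d-byte samples, expected %d" % (w.getsampwidth(), sample_width))
        if w.getnframes() == 0:
            problems.append("file contains no audio frames")
    return problems
```

Calling `print(check_wav_format("test.wav"))` after recording should print an empty list when the file matches.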
Next, we need to convert the audio into text. This step is speech recognition (ASR). Baidu's voice open platform offers it as a free service through a REST API.
For documentation, see: http://yuyin.baidu.com/docs/asr/57
The process works like this. First, obtain an access token. Then send the audio parameters (format, sample rate, channel count), the audio data and the token to Baidu's speech recognition server, which returns the recognized text. Because the server exposes a REST API, the client can be written in any language. Here I use Python 3:
```python
# coding: utf-8
# asr.py: speech recognition through Baidu's REST API

import base64
import json
import urllib.parse
import urllib.request

def get_access_token():
    """Request an access token using the API key (client_id) and secret key."""
    url = "https://openapi.baidu.com/oauth/2.0/token"
    params = {
        "grant_type": "client_credentials",
        "client_id": "xxxxxxxxxxxxxxxxxx",
        "client_secret": "xxxxxxxxxxxxxxxxxxxxxx",
    }
    url = url + "?" + urllib.parse.urlencode(params)

    resp = urllib.request.urlopen(url).read()
    data = json.loads(resp.decode("utf-8"))
    return data["access_token"]


def baidu_asr(data, cuid, token):
    """Send raw WAV bytes to the recognizer; return a list of candidate texts or None."""
    speech_data = base64.b64encode(data).decode("utf-8")
    speech_length = len(data)  # length of the raw audio, not of the base64 text

    post_data = {
        "format": "wav",
        "rate": 16000,   # must match arecord -r 16000
        "channel": 1,    # mono
        "cuid": cuid,    # unique id of this client device
        "token": token,
        "speech": speech_data,
        "len": speech_length,
    }

    url = "http://vop.baidu.com/server_api"
    json_data = json.dumps(post_data).encode("utf-8")

    req = urllib.request.Request(url, data=json_data)
    req.add_header("Content-Type", "application/json")
    req.add_header("Content-Length", str(len(json_data)))

    print("asr start request")
    resp = urllib.request.urlopen(req)
    print("asr finish request")
    resp_data = json.loads(resp.read().decode("utf-8"))
    if resp_data["err_no"] == 0:
        return resp_data["result"]
    print(resp_data)  # non-zero err_no: show the server's error message
    return None

def asr_main(filename):
    with open(filename, "rb") as f:
        audio_data = f.read()

    # token = get_access_token()
    token = "xxxxxxxxxxxxxxxxxx"
    uuid = "xxxx"
    resp = baidu_asr(audio_data, uuid, token)
    if not resp:
        return None  # recognition failed
    print(resp[0])
    return resp[0]  # first (best) candidate
```
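asr_main() above uses a token pasted in by hand, and the call to get_access_token() is commented out. Access tokens expire, so another option is to fetch a token automatically and cache it on disk together with its expiry time. Below is a minimal sketch. It assumes the token response includes the standard OAuth 2.0 expires_in field (in seconds). cached_token, the cache file name and the refresh margin are hypothetical. fetch stands for a variant of get_access_token() that returns the whole parsed response instead of only data["access_token"].

```python
import json
import os
import time

def cached_token(fetch, cache_path="token_cache.json", margin=3600, now=time.time):
    """Return an access token, calling fetch() only when the cached one is stale.

    fetch() must return the parsed token response as a dict containing
    "access_token" and "expires_in" (lifetime in seconds). The token is
    refreshed `margin` seconds before it actually expires.
    """
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as f:
            cache = json.load(f)
        if cache.get("expires_at", 0) - margin > now():
            return cache["access_token"]

    data = fetch()
    cache = {
        "access_token": data["access_token"],
        # Store an absolute expiry time so later runs can compare against it.
        "expires_at": now() + int(data["expires_in"]),
    }
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(cache, f)
    return cache["access_token"]
```

In asr_main() the hard-coded token line could then become `token = cached_token(fetch_token_response)`, where fetch_token_response is that hypothetical variant.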
"Thinking"
Here I use the Turing robot from the Baidu API Store. Its documentation can be found at: http://apistore.baidu.com/apiworks/servicedetail/736.html
It is simple to use, so I won't describe it in detail here. The code is as follows:
```python
# robot.py: get a reply from the Turing robot through the Baidu API Store
import json
import urllib.parse
import urllib.request

def robot_main(words):
    url = "http://apis.baidu.com/turing/turing/turing"

    params = {
        "key": "879a6cb3afb84dbf4fc84a1df2ab7319",
        "info": words,      # the recognized text; urlencode() escapes it
        "userid": "1000",
    }
    url = url + "?" + urllib.parse.urlencode(params)

    req = urllib.request.Request(url)
    req.add_header("apikey", "xxxxxxxxxxxxxxxxxxxxxxxxxx")  # API Store key

    print("robot start request")
    resp = urllib.request.urlopen(req)
    print("robot stop request")
    content = resp.read()
    if not content:
        return None
    data = json.loads(content.decode("utf-8"))
    print(data["text"])
    return data["text"]
```
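The introduction's example, answering "What time is it now?" with the current Beijing time, is simple enough to handle without a network request. One option is a small rule-based layer that runs before the cloud robot. Below is a minimal sketch. local_reply and its keyword matching are hypothetical and not part of the Turing API.

```python
from datetime import datetime, timedelta, timezone

# Beijing time is UTC+8 all year round (no daylight saving time).
BEIJING = timezone(timedelta(hours=8))

def local_reply(words, now=None):
    """Answer a few fixed questions locally; return None for anything else."""
    if not words:
        return None
    now = now or datetime.now(BEIJING)
    # Match the Chinese question word "几点" ("what time") as well as English text.
    if "几点" in words or "time" in words.lower():
        return "It's %02d:%02d Beijing time." % (now.hour, now.minute)
    return None
```

In main.py this could be combined with the cloud robot as `new_words = local_reply(words) or robot.robot_main(words)`. The Turing robot is then only asked when no local rule matches.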
"Speaking"
Speaking first requires converting text into speech, that is, speech synthesis (TTS). The sound is then played back.
Baidu's voice open platform provides a TTS interface that lets you choose a male or female voice and set the pitch, speaking speed and volume. The server returns audio data in MP3 format, which we write to a file in binary mode.
For details, see http://yuyin.baidu.com/docs/tts/136
The code is as follows:
```python
# coding: utf-8
# tts.py: text-to-speech through Baidu's REST API

import urllib.parse
import urllib.request

def baidu_tts_by_post(text, cuid, token):
    """POST the text to text2audio and return the response body (MP3 on success)."""
    post_data = {
        "tex": text,
        "lan": "zh",   # language: Chinese
        "ctp": 1,      # client type; fixed value 1
        "cuid": cuid,
        "tok": token,
    }

    url = "http://tsn.baidu.com/text2audio"
    body = urllib.parse.urlencode(post_data).encode("utf-8")
    req = urllib.request.Request(url, data=body)

    print("tts start request")
    resp = urllib.request.urlopen(req)
    print("tts finish request")
    return resp.read()

def tts_main(filename, words):
    token = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    # The text is percent-encoded here and again by urlencode() in baidu_tts_by_post().
    text = urllib.parse.quote(words)
    uuid = "xxxx"
    resp = baidu_tts_by_post(text, uuid, token)

    # Save the returned audio to the requested file.
    with open(filename, "wb") as f:
        f.write(resp)
```
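The description above says the voice, pitch, speed and volume can be configured, but the code only sends the required fields. Below is a minimal sketch of a request body that includes those settings. It assumes Baidu's parameter names per (speaker: 0 female, 1 male), spd (speed), pit (pitch) and vol (volume), with defaults of 5 for the last three. Check the names and allowed ranges against the current TTS documentation. tts_form is a hypothetical helper.

```python
import urllib.parse

def tts_form(text, cuid, token, per=0, spd=5, pit=5, vol=5):
    """Build the url-encoded POST body for text2audio with explicit voice settings.

    per = speaker, spd = speed, pit = pitch, vol = volume.
    These parameter names and defaults are assumptions; verify them in the docs.
    """
    params = {
        "tex": text,
        "lan": "zh",
        "ctp": 1,
        "cuid": cuid,
        "tok": token,
        "per": per,
        "spd": spd,
        "pit": pit,
        "vol": vol,
    }
    return urllib.parse.urlencode(params).encode("utf-8")
```

The result can replace the body built inside baidu_tts_by_post(). tts_main() percent-encodes the text before it reaches urlencode(), so pass urllib.parse.quote(words) as text to keep that behavior.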
After getting the audio file, you can use the mpg123 player to play it.
- mpg123 test.mp3
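tts_main() saves whatever the server sends back. If the request fails, the file probably contains an error message instead of audio (I assume a JSON error body here), and mpg123 cannot play it. Below is a minimal sketch that checks for MP3 data before playing. It uses a heuristic: the data must start with an ID3 tag or an MPEG audio frame sync. looks_like_mp3 and play_mp3 are hypothetical helpers.

```python
import subprocess

def looks_like_mp3(data):
    """Heuristic: MP3 data starts with an ID3 tag or an MPEG frame sync."""
    if data[:3] == b"ID3":
        return True
    # Frame sync: the first 11 bits of an MPEG audio frame header are all set.
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0

def play_mp3(path, player="mpg123"):
    """Play the file with mpg123 only if it appears to contain MP3 audio."""
    with open(path, "rb") as f:
        data = f.read()
    if not looks_like_mp3(data):
        # Probably an error reply from the server; show the start of it.
        print("not MP3 audio, server said:", data[:200])
        return False
    subprocess.run([player, path], check=True)
    return True
```

`play_mp3("test.mp3")` then plays the reply or prints the start of whatever was saved instead.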
Integration
Finally, combine these three parts.
Save the three Python snippets as asr.py, robot.py and tts.py, then tie them together in main.py:
```python
# main.py: one round of listen -> think -> speak
import asr
import robot
import tts

words = asr.asr_main("test.wav")             # speech -> text
if words:
    new_words = robot.robot_main(words)      # text -> reply
    if new_words:
        tts.tts_main("test.mp3", new_words)  # reply -> test.mp3
```
Then use a shell script to run the recorder, the Python program and the player in order:
- #! /bin/bash
- arecord -D "plughw:1" -f S16_LE -r 16000 test.wav
- python3 main.py
- mpg123 test.mp3
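The script handles a single exchange and relies on Ctrl-C to end the recording. arecord also accepts -d to stop after a fixed number of seconds, which makes a hands-free loop possible. Below is a minimal sketch under that assumption. record_command, conversation_loop and the 5-second default are hypothetical. The three functions from asr.py, robot.py and tts.py are passed in, so the sketch does not import them.

```python
import subprocess

def record_command(path="test.wav", seconds=5, device="plughw:1"):
    """Same arecord call as in the script, but -d stops it after `seconds`."""
    return ["arecord", "-D", device, "-f", "S16_LE", "-r", "16000",
            "-d", str(seconds), path]

def conversation_loop(asr_main, robot_main, tts_main, seconds=5, rounds=None):
    """Repeat record -> recognize -> reply -> play; loop forever if rounds is None."""
    done = 0
    while rounds is None or done < rounds:
        subprocess.run(record_command(seconds=seconds), check=True)
        words = asr_main("test.wav")
        reply = robot_main(words) if words else None
        if reply:
            tts_main("test.mp3", reply)
            subprocess.run(["mpg123", "test.mp3"], check=True)
        done += 1
```

Usage would be `conversation_loop(asr.asr_main, robot.robot_main, tts.tts_main)`. Stop it with Ctrl-C between rounds.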
Okay, now you can talk to the robot. Run the script, say something into the microphone, then press Ctrl-C to stop recording, and the robot will reply to you.