Artificial intelligence is expected to play an increasingly important role in the market, and Python is one of the most widely used languages for working with it. Let's see what it can do.
The sample programs provided by Baidu, in both the C and Java versions, come in two variants: method1 and method2. The former is called implicit (the POST body is a JSON string with the audio data encoded inside it), and the latter is called explicit (the POST body is the audio data itself). This article walks through an example of calling the Baidu speech recognition API from Python. It may serve as a reference for anyone who needs it.
At first, I assumed that Python's wave package handled audio as "strings", and I was worried that this would not match the arrays used in the C sample. So I chose the less efficient but seemingly safer method1: Base64-encode the audio data, put it into a dict together with the sampling rate, number of channels, and other information, and finally encode the dict as a JSON string.
However, the server always responded with:
3300 The input parameters are incorrect
I tried the urllib2 and pycurl packages in turn, and both produced the same error.
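For reference, the method1 request body described above might be assembled roughly as follows. This is only a sketch in Python 3: the field names (`format`, `rate`, `channel`, `token`, `cuid`, `len`, `speech`) and the helper name `build_method1_body` are assumptions, not something the article states, and the sketch does not claim to explain or fix the 3300 error.

```python
import base64
import json


def build_method1_body(pcm_bytes, rate, channels, token, cuid):
    """Build a JSON body for the implicit (method1) style of request.

    All field names below are assumed for illustration.
    """
    payload = {
        "format": "pcm",
        "rate": rate,
        "channel": channels,
        "token": token,
        "cuid": cuid,
        # Length of the raw audio in bytes, before Base64 encoding (assumed).
        "len": len(pcm_bytes),
        # b64encode returns bytes; JSON needs text, so decode to ASCII.
        "speech": base64.b64encode(pcm_bytes).decode("ascii"),
    }
    return json.dumps(payload).encode("utf-8")


# Placeholder audio, token, and device id, just to show the shape of the body.
body = build_method1_body(b"\x00\x01" * 4, 8000, 1, "TOKEN", "CUID")
print(body.decode("utf-8"))
```

Decoding the Base64 result to text before calling `json.dumps` matters in Python 3, where `b64encode` returns `bytes` that the `json` module will not serialize directly.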
I had to switch to method2, which succeeded (so it seems the data returned by the wave package is not a text "string" after all, but raw audio bytes).
#encoding=utf-8

import wave
import urllib, urllib2, pycurl
import base64
import json

## get access token by api key & secret key
def get_token():
    apiKey = "xxxxxxxx"
    secretKey = "xxxxxxxxx"

    auth_url = "https://openapi.baidu.com/oauth/2.0/token?grant_type=client_credentials&client_id=" + apiKey + "&client_secret=" + secretKey

    res = urllib2.urlopen(auth_url)
    json_data = res.read()
    return json.loads(json_data)['access_token']

def dump_res(buf):
    print buf

## post audio to server
def use_cloud(token):
    fp = wave.open('vad_0.wav', 'rb')
    nf = fp.getnframes()
    f_len = nf * 2
    audio_data = fp.readframes(nf)

    cuid = "xxxxxxxxxx" #my xiaomi phone MAC
    srv_url = 'http://vop.baidu.com/server_api' + '?cuid=' + cuid + '&token=' + token
    http_header = [
        'Content-Type: audio/pcm; rate=8000',
        'Content-Length: %d' % f_len
    ]

    c = pycurl.Curl()
    c.setopt(pycurl.URL, str(srv_url)) #curl doesn't support unicode
    #c.setopt(c.RETURNTRANSFER, 1)
    c.setopt(c.HTTPHEADER, http_header) #must be list, not dict
    c.setopt(c.POST, 1)
    c.setopt(c.CONNECTTIMEOUT, 30)
    c.setopt(c.TIMEOUT, 30)
    c.setopt(c.WRITEFUNCTION, dump_res)
    c.setopt(c.POSTFIELDS, audio_data)
    c.setopt(c.POSTFIELDSIZE, f_len)
    c.perform() #pycurl.perform() has no return val

if __name__ == "__main__":
    token = get_token()
    use_cloud(token)
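The listing above is Python 2 and hard-codes both the sampling rate and a two-bytes-per-frame length. As a hedged alternative, the same kind of raw-audio POST could be prepared in Python 3 with only the standard library, reading the rate from the WAV header instead. The helper name `build_method2_request` is hypothetical, and the request is only built here, not sent.

```python
import io
import urllib.parse
import urllib.request
import wave


def build_method2_request(wav_file, token, cuid,
                          url="http://vop.baidu.com/server_api"):
    """Build (but do not send) a raw-audio POST like the listing above."""
    with wave.open(wav_file, "rb") as fp:
        rate = fp.getframerate()
        audio = fp.readframes(fp.getnframes())  # a bytes object in Python 3

    query = urllib.parse.urlencode({"cuid": cuid, "token": token})
    headers = {"Content-Type": "audio/pcm; rate=%d" % rate}
    # urllib fills in Content-Length from len(data) when the request is sent.
    return urllib.request.Request(url + "?" + query, data=audio,
                                  headers=headers, method="POST")


# Self-check on an in-memory WAV: 8 kHz, mono, 16-bit, 100 frames of silence.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 100)
buf.seek(0)

req = build_method2_request(buf, token="TOKEN", cuid="CUID")
print(req.full_url)
print(req.get_header("Content-type"), len(req.data))
```

Passing `req` to `urllib.request.urlopen` with a real token would send it. Whether the service accepts this exact request is not verified here.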
Running the script prints:
{"corpus_no":"6150045491002357923","err_msg":"success.","err_no":0,"result":["播放小苹果,"],"sn":"243903724071431919050"}
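The reply is a JSON object, so the recognized text can be pulled out with the `json` module. The sketch below (Python 3) treats `err_no == 0` as success, as in the output above, and assumes that failed requests carry the same `err_no` and `err_msg` fields. The function name `extract_result` is hypothetical. The recognized phrase in the sample translates roughly to "play Little Apple".

```python
import json


def extract_result(response_text):
    """Return the recognized candidates, or raise if err_no is non-zero."""
    reply = json.loads(response_text)
    if reply.get("err_no") != 0:
        raise RuntimeError("recognition failed: %s %s"
                           % (reply.get("err_no"), reply.get("err_msg")))
    return reply.get("result", [])


# The reply shown above, copied verbatim.
sample = ('{"corpus_no":"6150045491002357923","err_msg":"success.",'
          '"err_no":0,"result":["播放小苹果,"],'
          '"sn":"243903724071431919050"}')
print(extract_result(sample)[0])
```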