How to convert audio content to text format in Python

WBOY

May 10, 2023 am 11:07 AM

Build a development environment

Change into the directory where you keep your Python virtual environments. I keep mine in a venvs subdirectory under my home directory. Use the following command to create a new virtualenv for this project.

python3 -m venv ~/venvs/pytranscribe

Activate the virtualenv with the following shell command:

source ~/venvs/pytranscribe/bin/activate

After executing the above command, the command prompt changes so that the name of the virtualenv is prepended to the original prompt. If your prompt is just $, it will look like this:

(pytranscribe) $

Remember that you must activate the virtualenv in every new terminal window where you want to use its dependencies.
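If you are unsure whether the virtualenv is active in the current terminal, a small check can be run from Python itself. This is a minimal sketch that assumes the environment was created with python3 -m venv as above; the helper name in_virtualenv is made up for illustration.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtualenv active:", in_virtualenv())
```

Running this with the python command from an activated pytranscribe environment should report that a virtualenv is active.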

Now we can install the requests package into the activated but empty virtualenv.

pip install requests==2.24.0

Look for output similar to the following to confirm that the corresponding package was installed correctly from PyPI.

(pytranscribe) $ pip install requests==2.24.0
Collecting requests==2.24.0
  Using cached https://files.pythonhosted.org/packages/45/1e/0c169c6a5381e241ba7404532c16a21d86ab872c9bed8bdcd4c423954103/requests-2.24.0-py2.py3-none-any.whl
Collecting certifi>=2017.4.17 (from requests==2.24.0)
  Using cached https://files.pythonhosted.org/packages/5e/c4/6c4fe722df5343c33226f0b4e0bb042e4dc13483228b4718baf286f86d87/certifi-2020.6.20-py2.py3-none-any.whl
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 (from requests==2.24.0)
  Using cached https://files.pythonhosted.org/packages/9f/f0/a391d1463ebb1b233795cabfc0ef38d3db4442339de68f847026199e69d7/urllib3-1.25.10-py2.py3-none-any.whl
Collecting chardet<4,>=3.0.2 (from requests==2.24.0)
  Using cached https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl
Collecting idna<3,>=2.5 (from requests==2.24.0)
  Using cached https://files.pythonhosted.org/packages/a2/38/928ddce2273eaa564f6f50de919327bf3a00f091b5baba8dfa9460f3a8a8/idna-2.10-py2.py3-none-any.whl
Installing collected packages: certifi, urllib3, chardet, idna, requests
Successfully installed certifi-2020.6.20 chardet-3.0.4 idna-2.10 requests-2.24.0 urllib3-1.25.10

We have installed all the required dependencies so we can start coding the application.
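As an optional sanity check, the installed version can also be read from Python. The sketch below uses the standard library's importlib.metadata (Python 3.8 or newer); the helper name installed_version is an assumption for illustration and is not part of the tutorial's scripts.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        # The package is not installed in the current environment.
        return None


print("requests version:", installed_version("requests"))
```

If this prints None, the virtualenv is probably not activated in the terminal you are using.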

Upload, launch and transcribe audio

We have everything we need to start building an app that converts audio to text. We will build this application in three files:

1. upload_audio_file.py: Uploads your audio file to a secure location on the AssemblyAI service so it can be processed. If your audio file is already accessible via a public URL, you can skip this step and follow the AssemblyAI quickstart instead

2. initiate_transcription.py: Tells the API which file you want to transcribe and starts the transcription immediately

3. get_transcription.py: Displays the transcription status if the transcription is still being processed, or displays the transcription results after processing is complete

Create a new directory called pytranscribe to store the files as we write them. Then change into the new project directory.

mkdir pytranscribe
cd pytranscribe

We also need to export the AssemblyAI API key as an environment variable. Sign up for an AssemblyAI account and log into the AssemblyAI dashboard, then copy "your API token".

export ASSEMBLYAI_KEY=your-api-key-here

Please note that the export command must be run in every command line window where you want this key to be accessible. If you don't export the token as ASSEMBLYAI_KEY in the environment you're running the scripts in, the scripts we're writing won't be able to access the API.
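Because a missing key only shows up later as a failed API call, it can help to fail early with a clear message. The following is a minimal sketch, not part of the tutorial's scripts; the helper name get_api_key and its optional environ parameter are assumptions added for illustration and testing.

```python
import os


def get_api_key(environ=None):
    """Return the AssemblyAI key from the environment or raise a clear error."""
    # Allow a plain dict to be passed in so the helper is easy to test.
    environ = os.environ if environ is None else environ
    key = environ.get("ASSEMBLYAI_KEY")
    if not key:
        raise RuntimeError(
            "ASSEMBLYAI_KEY is not set; export it in this terminal first"
        )
    return key


# Example with an explicit mapping instead of the real environment.
print(get_api_key({"ASSEMBLYAI_KEY": "your-api-key-here"}))
```

The scripts in this tutorial call os.getenv directly, which returns None when the variable is missing, so a helper like this is one way to surface the problem sooner.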

Now that we have created the project directory and set the API key as an environment variable, let's move on to writing the code for the first file that will upload the audio file to the AssemblyAI service.

Upload the audio file for transcription

Create a new file called upload_audio_file.py and put the following code into it:

import argparse
import os
import requests

API_URL = "https://api.assemblyai.com/v2/"


def upload_file_to_api(filename):
    """Checks for a valid file and then uploads it to AssemblyAI
    so it can be saved to a secure URL that only that service can access.
    When the upload is complete we can then initiate the transcription
    API call.
    Returns the API JSON if successful, or None if file does not exist.
    """
    if not os.path.exists(filename):
        return None

    def read_file(filename, chunk_size=5242880):
        # yield the file in 5 MB chunks so it is never fully loaded into memory
        with open(filename, 'rb') as _file:
            while True:
                data = _file.read(chunk_size)
                if not data:
                    break
                yield data

    headers = {'authorization': os.getenv("ASSEMBLYAI_KEY")}
    # passing a generator as data makes Requests stream the body in chunks
    response = requests.post("".join([API_URL, "upload"]), headers=headers,
                             data=read_file(filename))
    return response.json()

The above code imports the argparse, os and requests packages so that we can use them in this script. API_URL is a constant that holds the base URL of the AssemblyAI service. We define the upload_file_to_api function with a single parameter, filename, which should be a string containing the absolute path to the file and its filename.

In the function, we check that the file exists and then use Requests' chunked transfer encoding to stream large files to the AssemblyAI API.
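To see what the chunked reader does without touching the network, the generator can be run against a small temporary file. This sketch reuses read_file from the upload script; the chunk_sizes helper, the 12-byte test file and the chunk size of 5 are assumptions chosen only to make the behavior visible.

```python
import os
import tempfile


def read_file(filename, chunk_size=5242880):
    """Yield the file's bytes in pieces of at most chunk_size bytes."""
    with open(filename, "rb") as _file:
        while True:
            data = _file.read(chunk_size)
            if not data:
                break
            yield data


def chunk_sizes(filename, chunk_size):
    """Return the length of every chunk the generator produces."""
    return [len(chunk) for chunk in read_file(filename, chunk_size)]


# Write 12 bytes to a temporary file and read them back 5 bytes at a time.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 12)
    demo_path = tmp.name
try:
    demo_sizes = chunk_sizes(demo_path, chunk_size=5)
finally:
    os.remove(demo_path)

print(demo_sizes)  # [5, 5, 2]
```

Because only one chunk is held in memory at a time, the same pattern works for audio files far larger than the 5 MB default chunk size.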

The os module's getenv function reads the API key that was set on the command line with the export command. Make sure to use export in the terminal where you run this script, otherwise the ASSEMBLYAI_KEY value will be blank. If in doubt, use echo $ASSEMBLYAI_KEY to see if the value matches your API key.

To use the upload_file_to_api function, add the following lines of code to the upload_audio_file.py file so that we can correctly execute this code as a script called using the python command:

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("filename")
    args = parser.parse_args()
    upload_filename = args.filename
    response_json = upload_file_to_api(upload_filename)
    if not response_json:
        print("file does not exist")
    else:
        print("File uploaded to URL: {}".format(response_json['upload_url']))

The above code creates an ArgumentParser object, which allows the application to take a single argument from the command line: the file we want to access, read and upload to the AssemblyAI service.
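The argument handling can be tried in isolation by passing a list to parse_args instead of reading the real command line. This is a small sketch; the wrapper function parse_upload_args and the help text are assumptions added for illustration.

```python
import argparse


def parse_upload_args(argv=None):
    """Parse the single positional filename argument."""
    parser = argparse.ArgumentParser(
        description="Upload an audio file for transcription."
    )
    parser.add_argument("filename", help="absolute path to the audio file")
    # With argv=None argparse reads sys.argv; a list is handy for testing.
    return parser.parse_args(argv)


args = parse_upload_args(["/Users/matt/devel/audio.mp3"])
print(args.filename)
```

When the script is run without a filename, argparse prints a usage message and exits, so no extra validation of the argument count is needed.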

If the file does not exist, the script prints a message stating that the file cannot be found. Otherwise, when the file is found at the given path, it is uploaded using the code in the upload_file_to_api function.

Run the completed upload_audio_file.py script on the command line using the python command. Replace FULL_PATH_TO_FILE with the absolute path to the file you want to upload, for example /Users/matt/devel/audio.mp3.

python upload_audio_file.py FULL_PATH_TO_FILE

Assuming the file is found in the location you specified, when the script finishes uploading the file, it will print a message with a unique URL:

File uploaded to URL: https://cdn.assemblyai.com/upload/463ce27f-0922-4ea9-9ce4-3353d84b5638

This URL is not public and can only be used by the AssemblyAI service, so your file and its contents are not accessible to anyone except you and the transcription API.

The important part is the last part of the URL, which in this example is 463ce27f-0922-4ea9-9ce4-3353d84b5638. Save this unique identifier as we need to pass it to the next script that starts the transcription service.
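Rather than copying the identifier by hand, it can be pulled out of the printed URL. The sketch below assumes the upload URL always ends with the identifier, as in the example above; the helper name file_id_from_upload_url is made up for illustration.

```python
from urllib.parse import urlparse


def file_id_from_upload_url(upload_url):
    """Return the last path segment of the upload URL."""
    path = urlparse(upload_url).path  # e.g. "/upload/463ce27f-..."
    return path.rstrip("/").rsplit("/", 1)[-1]


example_url = (
    "https://cdn.assemblyai.com/upload/463ce27f-0922-4ea9-9ce4-3353d84b5638"
)
print(file_id_from_upload_url(example_url))
```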

Start Transcription

Next, we will write some code to start the transcription. Create a new file called initiate_transcription.py and add the following code to it.

import argparse
import os
import requests

API_URL = "https://api.assemblyai.com/v2/"
CDN_URL = "https://cdn.assemblyai.com/"


def initiate_transcription(file_id):
    """Sends a request to the API to transcribe a specific
    file that was previously uploaded to the API. This will
    not immediately return the transcription because it takes
    a moment for the service to analyze and perform the
    transcription, so there is a different function to retrieve
    the results.
    """
    endpoint = "".join([API_URL, "transcript"])
    # the audio URL is the CDN base URL plus the uploaded file's identifier
    json = {"audio_url": "".join([CDN_URL, "upload/{}".format(file_id)])}
    headers = {
        "authorization": os.getenv("ASSEMBLYAI_KEY"),
        "content-type": "application/json"
    }
    response = requests.post(endpoint, json=json, headers=headers)
    return response.json()

We have the same imports as the previous script and add a new constant CDN_URL that matches the separate URL where AssemblyAI stores uploaded audio files.

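The initiate_transcription function essentially just sets up a single HTTP request to the AssemblyAI API to start the transcription process for the audio file at the specific URL passed in. This is why passing in the file_id is important: it completes the URL of the audio file that we are telling AssemblyAI to retrieve.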
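The way the file identifier completes the audio URL can be checked without sending a request. This is a minimal sketch that mirrors the payload built in initiate_transcription; the helper name build_transcript_payload is an assumption for illustration.

```python
CDN_URL = "https://cdn.assemblyai.com/"


def build_transcript_payload(file_id):
    """Build the JSON body that points the API at the uploaded file."""
    return {"audio_url": "".join([CDN_URL, "upload/{}".format(file_id)])}


payload = build_transcript_payload("463ce27f-0922-4ea9-9ce4-3353d84b5638")
print(payload["audio_url"])
```

The printed URL should match the upload URL reported by the first script.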

Finish the file by appending this code so that it can be easily invoked from the command line with arguments.

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("file_id")
    args = parser.parse_args()
    file_id = args.file_id
    response_json = initiate_transcription(file_id)
    print(response_json)

Start the script by running the python command on the initiate_transcription.py file, passing in the unique file identifier you saved in the previous step.

# the FILE_IDENTIFIER is returned in the previous step and will
# look something like this: 463ce27f-0922-4ea9-9ce4-3353d84b5638
python initiate_transcription.py FILE_IDENTIFIER

The API will send back a JSON response that this script prints to the command line.

{'audio_end_at': None, 'acoustic_model': 'assemblyai_default', 'text': None,
 'audio_url': 'https://cdn.assemblyai.com/upload/463ce27f-0922-4ea9-9ce4-3353d84b5638',
 'speed_boost': False, 'language_model': 'assemblyai_default', 'redact_pii': False,
 'confidence': None, 'webhook_status_code': None,
 'id': 'gkuu2krb1-8c7f-4fe3-bb69-6b14a2cac067', 'status': 'queued', 'boost_param': None,
 'words': None, 'format_text': True, 'webhook_url': None, 'punctuate': True,
 'utterances': None, 'audio_duration': None, 'auto_highlights': False,
 'word_boost': [], 'dual_channel': None, 'audio_start_from': None}

Take note of the value of the id key in the JSON response. This is the transcription identifier we need to use to retrieve the transcription result. In this example, it is gkuu2krb1-8c7f-4fe3-bb69-6b14a2cac067. Copy the transcription identifier from your own response, because we will need it in the next step to check when the transcription process has completed.
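Since the full response is long, the two fields that matter at this point can be picked out directly. The sketch below works on a shortened copy of the example response; the helper name summarize_transcript_response is made up for illustration.

```python
def summarize_transcript_response(response_json):
    """Return the (id, status) pair from a transcript response."""
    return response_json.get("id"), response_json.get("status")


# A shortened copy of the example response shown in this section.
sample_response = {
    "id": "gkuu2krb1-8c7f-4fe3-bb69-6b14a2cac067",
    "status": "queued",
    "text": None,
    "words": None,
}

transcription_id, status = summarize_transcript_response(sample_response)
print(transcription_id, status)
```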

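Retrieve the transcription result

We have uploaded the file and started the transcription process, so let's get the result as soon as it is ready.

The time it takes to return the results depends on the size of the file, so the next script sends an HTTP request to the API and either reports the status of the transcription or prints the output once it is complete.

Create a third Python file named get_transcription.py and put the following code into it.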

import argparse
import os
import requests

API_URL = "https://api.assemblyai.com/v2/"


def get_transcription(transcription_id):
    """Requests the transcription from the API and returns the JSON
    response."""
    endpoint = "".join([API_URL, "transcript/{}".format(transcription_id)])
    headers = {"authorization": os.getenv('ASSEMBLYAI_KEY')}
    response = requests.get(endpoint, headers=headers)
    return response.json()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("transcription_id")
    args = parser.parse_args()
    transcription_id = args.transcription_id
    response_json = get_transcription(transcription_id)
    if response_json['status'] == "completed":
        # print each transcribed word separated by a space
        for word in response_json['words']:
            print(word['text'], end=" ")
    else:
        print("current status of transcription request: {}".format(
              response_json['status']))

The above code has the same imports as the other scripts. In this new get_transcription function, we simply call the AssemblyAI API with our API key and the transcription identifier from the previous step (not the file identifier). We retrieve the JSON response and return it.

In the main function, we handle the transcription identifier that is passed in as a command-line argument and pass it to the get_transcription function. If the response JSON from get_transcription contains a completed status, we print the transcription results. Otherwise, we print the current status, which will be either queued or processing until it reaches completed.
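The word-by-word printing can also be expressed as a single join, which avoids the trailing space left by print(..., end=" "). This is a small sketch that assumes the words list has the shape used by the script above, with a text key on each entry; the helper name words_to_text is hypothetical.

```python
def words_to_text(response_json):
    """Join the text of every word entry into one string."""
    # "words" is None until the transcription has completed.
    words = response_json.get("words") or []
    return " ".join(word["text"] for word in words)


completed_sample = {
    "status": "completed",
    "words": [{"text": "An"}, {"text": "object"}, {"text": "relational"}],
}
print(words_to_text(completed_sample))
```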

Invoke the script from the command line with the transcription identifier from the previous section:

python get_transcription.py TRANSCRIPTION_ID

If the service has not yet started working on the transcription, it will return queued like this:

current status of transcription request: queued

When the service is currently working on the audio file, it will return processing:

current status of transcription request: processing

When the process is complete, our script will return the text of the transcription, as you see here:

An object relational mapper is a code library that automates the transfer of
data stored in relational, databases into objects that are more commonly used
in application code or EMS are useful because they provide a high level
...(output abbreviated)
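Instead of rerunning the script by hand until the status changes, the check can be wrapped in a simple polling loop. The following is a rough sketch under stated assumptions: wait_for_transcription, its interval and max_attempts parameters, and the fake fetcher are all hypothetical, and treating an "error" status as final is an assumption that this article does not demonstrate. In real use, the get_transcription function from get_transcription.py would be passed in as fetch_status.

```python
import time


def wait_for_transcription(fetch_status, transcription_id,
                           interval=5.0, max_attempts=60):
    """Poll fetch_status until the transcription reaches a final status."""
    for _ in range(max_attempts):
        response_json = fetch_status(transcription_id)
        status = response_json.get("status")
        # "error" as a final status is an assumption, see the note above.
        if status in ("completed", "error"):
            return response_json
        time.sleep(interval)
    raise TimeoutError(
        "transcription {} did not finish in time".format(transcription_id)
    )


# Demonstration with a fake fetcher so no network access is needed.
fake_responses = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed", "words": []},
])
result = wait_for_transcription(
    lambda _id: next(fake_responses), "TRANSCRIPTION_ID", interval=0
)
print(result["status"])
```

A fixed interval keeps the sketch short; a longer delay between requests is kinder to the API for large audio files.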
