


So fast! Convert video speech to text in just a few minutes with fewer than 10 lines of code
Hello everyone, I am Kite
Two years ago, converting audio and video files into text was hard to do well. Now it can be done in just a few minutes.
It is said that, to obtain training data, some companies have crawled videos in bulk from short-video platforms such as Douyin and Kuaishou, extracted the audio, and converted it to text for use as a training corpus for large models.
If you need to convert video or audio files to text, you can try the open source solution described here. One use, for example, is finding the exact time points at which a line of dialogue appears in a film or TV program.
Without further ado, let’s get to the point.
Whisper
This solution is Whisper, open-sourced by OpenAI. It is, of course, written in Python. You only need to install a few packages and write a few lines of code, then wait a while (how long depends on your machine's performance and the length of the audio or video) and the text comes out. It's that simple.
GitHub repository: https://github.com/openai/whisper
Faster-Whisper
Although Whisper is already quite simple, it is still not streamlined enough for programmers, who tend to prefer simplicity and efficiency. Whisper is relatively easy to install and call, but you still need to install PyTorch, ffmpeg, and even Rust separately.
That is where Faster-Whisper comes in: it is faster and more concise than Whisper. Faster-Whisper is not just a thin wrapper around Whisper but a reimplementation of OpenAI's Whisper model using CTranslate2, an efficient inference engine for Transformer models.
In short, it is faster than Whisper. The official statement is that it is 4-8 times faster than Whisper. It supports not only GPU but also CPU, and it even runs on my beat-up Mac.
GitHub repository: https://github.com/SYSTRAN/faster-whisper
It only takes two steps to use.
- Install the dependency
pip install faster-whisper
- Write the code
from faster_whisper import WhisperModel

model_size = "large-v3"

# Run on GPU with FP16
model = WhisperModel(model_size, device="cuda", compute_type="float16")
# or run on GPU with INT8
# model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3", beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
Yes, it's that simple.
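Once you have the segments, finding when a given line is spoken is a simple filter over their text. The following is only a minimal sketch: `Segment` here is a stand-in dataclass with the same `start`, `end`, and `text` attributes that the loop above reads, and `find_keyword` is a hypothetical helper, not part of faster-whisper.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class Segment:
    """Stand-in for a transcribed segment (assumed attributes: start, end, text)."""
    start: float
    end: float
    text: str


def find_keyword(segments: Iterable[Segment], keyword: str) -> Iterator[Tuple[float, float, str]]:
    """Yield (start, end, text) for every segment whose text contains keyword."""
    needle = keyword.casefold()  # case-insensitive match
    for seg in segments:
        if needle in seg.text.casefold():
            yield seg.start, seg.end, seg.text.strip()


# Hand-written demo data; in practice pass the segments returned by model.transcribe().
demo = [
    Segment(0.0, 2.5, " Hello everyone"),
    Segment(2.5, 6.0, " Welcome to the show"),
    Segment(6.0, 9.0, " Hello again"),
]

hits = list(find_keyword(demo, "hello"))
for start, end, text in hits:
    print("[%.2fs -> %.2fs] %s" % (start, end, text))
```

Because the helper only reads three attributes, it should work with any object that exposes them, including the segments yielded in the loop above.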
What can it do?
As it happens, a friend of mine wants to make short videos and post inspirational "chicken soup for the soul" content drawn from interviews with famous people. He didn't want to watch each interview all the way through, though. He just wanted the fastest way to get the text and then read it, because reading is much faster than watching a video and text can also be searched.
I would just say: if you don't even have the dedication to watch a complete video, how can you run an account well?
So I made a tool for him, using Faster-Whisper.
Client
The client is written in Swift and only supports Mac.
- Select a video;
- Click "Extract Text"; this calls the Python interface, and you need to wait a while;
- Load the parsed text along with the start and end time of each segment;
- Select a start time and an end time;
- Click the "Export" button to export the video clip.
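The article does not show how the client performs the export. One possible approach, sketched here in Python rather than Swift, is to hand the selected start and end times to the ffmpeg command-line tool. The helper name `build_clip_command` and the file names are hypothetical, and the sketch only builds the argument list without running it.

```python
from typing import List


def build_clip_command(src: str, start: float, end: float, dst: str) -> List[str]:
    """Build an ffmpeg argument list that cuts [start, end] seconds out of src."""
    if end <= start:
        raise ValueError("end time must be greater than start time")
    duration = end - start
    return [
        "ffmpeg", "-y",              # overwrite dst if it already exists
        "-ss", "%.2f" % start,       # seek to the start time in the input
        "-i", src,
        "-t", "%.2f" % duration,     # keep this many seconds
        "-c", "copy",                # copy streams without re-encoding
        dst,
    ]


cmd = build_clip_command("interview.mp4", 12.5, 22.5, "clip.mp4")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed. With stream copy, cuts fall on keyframes, so the clip boundaries may be slightly off; dropping `-c copy` re-encodes and gives more precise cuts at the cost of speed.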
Server
The server is, of course, written in Python and wrapped with Flask, which exposes the transcription interface to the client.
from flask import Flask, request, jsonify
from faster_whisper import WhisperModel

app = Flask(__name__)

model_size = "large-v2"
model = WhisperModel(model_size, device="cpu", compute_type="int8")

@app.route('/transcribe', methods=['POST'])
def transcribe():
    # Get the file path from the request
    file_path = request.json.get('filePath')

    # Transcribe the file
    segments, info = model.transcribe(file_path, beam_size=5, initial_prompt="简体")

    segments_copy = []
    with open('segments.txt', 'w') as file:
        for segment in segments:
            line = "%.2fs|%.2fs|[%.2fs -> %.2fs]|%s" % (segment.start, segment.end, segment.start, segment.end, segment.text)
            segments_copy.append(line)
            file.write(line + '\n')

    # Prepare the response
    response_data = {
        "language": info.language,
        "language_probability": info.language_probability,
        "segments": []
    }

    for segment in segments_copy:
        response_data["segments"].append(segment)

    return jsonify(response_data)

if __name__ == '__main__':
    app.run(debug=False)
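Each entry in the `segments` array of the response is a pipe-delimited string in the format built by the server above. To show how a caller might turn such a line back into numbers, here is a minimal Python sketch (the real client is written in Swift). `ParsedSegment` and `parse_segment_line` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class ParsedSegment(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


def parse_segment_line(line: str) -> ParsedSegment:
    """Parse one 'START s|END s|[label]|text' line produced by the server."""
    # maxsplit=3 keeps any "|" inside the transcribed text intact.
    start_field, end_field, _label, text = line.rstrip("\n").split("|", 3)
    return ParsedSegment(
        start=float(start_field.rstrip("s")),
        end=float(end_field.rstrip("s")),
        text=text.strip(),
    )


# Build a sample line with the same format string the server uses.
sample = "%.2fs|%.2fs|[%.2fs -> %.2fs]|%s" % (1.5, 4.25, 1.5, 4.25, " Hello | world")
parsed = parse_segment_line(sample)
print(parsed)
```

Since the times are formatted with two decimals, the parsed values are rounded to hundredths of a second, which is usually enough for picking clip boundaries.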