Hello everyone, I am Kite
Two years ago, converting audio and video files into text was hard to do. Now it can be done in just a few minutes.
It is said that, to obtain training data, some companies have crawled videos wholesale from short-video platforms such as Douyin and Kuaishou, extracted the audio tracks, and converted them into text to serve as training corpora for large models.
If you need to convert video or audio files to text, you can try the open-source solution introduced today. For example, you can search for the exact time points at which lines of dialogue appear in films and TV shows.
Without further ado, let’s get to the point.
The solution is OpenAI's open-source Whisper. It is written in Python, of course. You only need to install a few packages and write a few lines of code, then wait a while (depending on your machine's performance and the length of the audio or video), and the text comes out. It's that simple.
GitHub repository: https://github.com/openai/whisper
Although it is already quite simple, it is still not streamlined enough for programmers, who tend to prefer simplicity and efficiency. Installing and calling Whisper is relatively easy, but you still need to install PyTorch, ffmpeg, and even Rust separately.
Hence faster-whisper, which is faster and leaner than Whisper. faster-whisper is not just a thin wrapper around Whisper; it is a reimplementation of OpenAI's Whisper model on CTranslate2, an efficient inference engine for Transformer models.
To summarize: it is faster than Whisper. The project claims a 4-8x speedup over Whisper, and it supports not only GPU but also CPU inference; even my aging Mac can run it.
GitHub repository: https://github.com/SYSTRAN/faster-whisper
It only takes two steps to use.
```shell
pip install faster-whisper
```
```python
from faster_whisper import WhisperModel

model_size = "large-v3"

# Run on GPU with FP16
model = WhisperModel(model_size, device="cuda", compute_type="float16")
# or run on GPU with INT8
# model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3", beam_size=5)

print("Detected language '%s' with probability %f"
      % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
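Since each segment carries start and end timestamps, it is straightforward to turn the output into an SRT subtitle file, which makes jumping to a given line of dialogue easy. A minimal sketch; the sample segments below are hypothetical stand-ins for real `model.transcribe` output:

```python
def to_srt_time(seconds):
    # Format seconds as the SRT timestamp "HH:MM:SS,mmm"
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return "%02d:%02d:%02d,%03d" % (h, m, s, ms)

def segments_to_srt(segments):
    # segments: iterable of (start, end, text) tuples,
    # e.g. (segment.start, segment.end, segment.text) from faster-whisper
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append("%d\n%s --> %s\n%s"
                      % (i, to_srt_time(start), to_srt_time(end), text.strip()))
    return "\n\n".join(blocks) + "\n"

# Hypothetical sample data standing in for real transcription output
sample = [(0.0, 2.5, "Hello everyone."), (2.5, 5.0, "Welcome back.")]
print(segments_to_srt(sample))
```

The resulting text can be saved as `audio.srt` and loaded by most video players, so searching the text gives you the exact playback position.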
Yes, it's that simple.
It so happens that a friend of mine wants to make short videos posting "chicken soup for the soul" quotes taken from interviews with famous people. He didn't want to watch each entire video again, though; he just wanted the fastest way to get the text content and read that instead, because reading text is much faster than watching a video, and text can also be searched.
Let me just say: if you don't even have the patience to watch a complete video, how can you expect to run an account well?
So I built one for him, using faster-whisper.
The client is written in Swift and supports only macOS.
The server side is, of course, Python, wrapped with Flask to expose an HTTP interface.
```python
from flask import Flask, request, jsonify
from faster_whisper import WhisperModel

app = Flask(__name__)

model_size = "large-v2"
model = WhisperModel(model_size, device="cpu", compute_type="int8")

@app.route('/transcribe', methods=['POST'])
def transcribe():
    # Get the file path from the request
    file_path = request.json.get('filePath')

    # Transcribe the file
    segments, info = model.transcribe(file_path, beam_size=5,
                                      initial_prompt="简体")

    segments_copy = []
    with open('segments.txt', 'w') as file:
        for segment in segments:
            line = "%.2fs|%.2fs|[%.2fs -> %.2fs]|%s" % (
                segment.start, segment.end,
                segment.start, segment.end, segment.text)
            segments_copy.append(line)
            file.write(line + '\n')

    # Prepare the response
    response_data = {
        "language": info.language,
        "language_probability": info.language_probability,
        "segments": segments_copy,
    }
    return jsonify(response_data)

if __name__ == '__main__':
    app.run(debug=False)
```
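Any client can call this service by POSTing a JSON body with a `filePath` key. A sketch using only Python's standard library; the host, port, and file path are assumptions matching Flask's defaults:

```python
import json
import urllib.request

def build_transcribe_request(file_path, base_url="http://127.0.0.1:5000"):
    # Build a POST request matching the Flask /transcribe endpoint
    payload = json.dumps({"filePath": file_path}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/transcribe",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def transcribe_file(file_path):
    # Send the request and decode the JSON response;
    # requires the Flask server above to be running.
    req = build_transcribe_request(file_path)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With the server running, `transcribe_file("/path/to/audio.mp3")` returns the detected language, its probability, and the timestamped segment lines.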
The above is the detailed content of "So fast! Recognize video speech as text in just a few minutes with less than 10 lines of code". For more information, please follow other related articles on the PHP Chinese website!