Video understanding or video insights are crucial across various industries and applications due to their multifaceted benefits. They enhance content analysis and management by automatically generating metadata, categorizing content, and making videos more searchable. Moreover, video insights provide critical data that drive decision-making, enhance user experiences, and improve operational efficiencies across diverse sectors.
Google’s Gemini 1.5 model brings significant advancements to this field. Beyond its impressive improvements in language processing, this model can handle an enormous input context of up to 1 million tokens. To further its capabilities, Gemini 1.5 is trained as a multimodal model, natively processing text, images, audio, and video. This powerful combination of varied input types and extensive context size opens up new possibilities for processing long videos effectively.
In this article, we will dive into how Gemini 1.5 can be leveraged for generating valuable video insights, transforming the way we understand and utilize video content across different domains.
Google’s Gemini 1.5 represents a significant leap forward in AI performance and efficiency. Building upon extensive research and engineering innovations, this model features a new Mixture-of-Experts (MoE) architecture, enhancing both training and serving efficiency. Available in public preview, Gemini 1.5 Pro and 1.5 Flash offer an impressive 1 million token context window through Google AI Studio and Vertex AI.
Google Gemini updates: Flash 1.5, Gemma 2 and Project Astra (blog.google)
The 1.5 Flash model, the newest addition to the Gemini family, is the fastest and most optimized for high-volume, high-frequency tasks. It is designed for cost-efficiency and excels in applications such as summarization, chat, image and video captioning, and extracting data from extensive documents and tables. With these advancements, Gemini 1.5 sets a new standard for performance and versatility in AI models.
python -m venv venv source venv/bin/activate #for ubuntu venv/Scripts/activate #for windows
pip install google-generativeai streamlit python-dotenv
To access the Gemini API and begin working with its functionalities, you can acquire a free Google API Key by registering with Google AI Studio. Google AI Studio, offered by Google, provides a user-friendly, visual-based interface for interacting with the Gemini API. Within Google AI Studio, you can seamlessly engage with Generative Models through its intuitive UI, and if desired, generate an API Token for enhanced control and customization.
Follow the steps to generate a Gemini API key:
Begin by creating a new folder for your project. Choose a name that reflects the purpose of your project.
Inside your new project folder, create a file named .env. This file will store your environment variables, including your Gemini API key.
Open the .env file and add the following code to specify your Gemini API key:
GOOGLE_API_KEY=AIzaSy......
To get started with your project and ensure you have all the necessary tools, you need to import several key libraries as follows.
import os import time import google.generativeai as genai import streamlit as st from dotenv import load_dotenv
To set up your project, you need to configure the API key and create a directory for temporary file storage for uploaded files.
Define the media folder and configure the Gemini API key by initializing the necessary settings. Add the following code to your script:
python -m venv venv source venv/bin/activate #for ubuntu venv/Scripts/activate #for windows
To store uploaded files in the media folder and return their paths, define a method called save_uploaded_file and add the following code to it.
pip install google-generativeai streamlit python-dotenv
Generating insights from videos involves several crucial stages, including uploading, processing, and response generation.
The Gemini API directly accepts video file formats. The File API supports files up to 2GB in size and allows storage of up to 20GB per project. Uploaded files remain available for 2 days and cannot be downloaded from the API.
GOOGLE_API_KEY=AIzaSy......
After uploading a file, you can verify that the API has successfully received it by using the files.get method. This method allows you to view the files uploaded to the File API that are associated with the Cloud project linked to your API key. Only the file name and the URI are unique identifiers.
import os import time import google.generativeai as genai import streamlit as st from dotenv import load_dotenv
After the video has been uploaded, you can make GenerateContent requests that reference the File API URI.
MEDIA_FOLDER = 'medias' def __init__(): # Create the media directory if it doesn't exist if not os.path.exists(MEDIA_FOLDER): os.makedirs(MEDIA_FOLDER) # Load environment variables from the .env file load_dotenv() # Retrieve the API key from the environment variables api_key = os.getenv("GEMINI_API_KEY") # Configure the Gemini API with your API key genai.configure(api_key=api_key)
Files are automatically deleted after 2 days or you can manually delete them using files.delete().
def save_uploaded_file(uploaded_file): """Save the uploaded file to the media folder and return the file path.""" file_path = os.path.join(MEDIA_FOLDER, uploaded_file.name) with open(file_path, 'wb') as f: f.write(uploaded_file.read()) return file_path
Create a method called get_insights and add the following code to it. Instead print(), use streamlit write() method to see the messages on the website.
video_file = genai.upload_file(path=video_path)
To streamline the process of uploading videos and generating insights within a Streamlit app, you can create a method named app. This method will provide an upload button, display the uploaded video, and generate insights from it.
import time while video_file.state.name == "PROCESSING": print('Waiting for video to be processed.') time.sleep(10) video_file = genai.get_file(video_file.name) if video_file.state.name == "FAILED": raise ValueError(video_file.state.name)
To create a complete and functional Streamlit application that allows users to upload videos and generate insights using the Gemini 1.5 Flash model, combine all the components into a single file named app.py.
Here is the final code:
# Create the prompt. prompt = "Describe the video. Provides the insights from the video." # Set the model to Gemini 1.5 Flash. model = genai.GenerativeModel(model_name="models/gemini-1.5-flash") # Make the LLM request. print("Making LLM inference request...") response = model.generate_content([prompt, video_file], request_options={"timeout": 600}) print(response.text)
Execute the following code to run the application.
genai.delete_file(video_file.name)
You can open the link provided in the console to see the output.
Thanks for reading this article !!
If you enjoyed this article, please click on the heart button ♥ and share to help others find it!
The full source code for this tutorial can be found here,
GitHub - codemaker2015/video-insights-generator
The above is the detailed content of Building a video insights generator using Gemini Flash. For more information, please follow other related articles on the PHP Chinese website!