Building a Chrome extension that leverages AI technologies can significantly enhance the user experience by adding powerful features directly in the browser.
In this tutorial, we'll cover the entire process of building a Chrome extension from scratch with the AI/ML API, Deepgram Aura, and IndexedDB, from setup to deployment. We'll start by setting up our development environment, including installing the necessary tools and configuring our project. Then we'll dive into the core components of our Chrome extension: manifest.json, which contains basic metadata about the extension; scripts.js, which is responsible for how the extension behaves; and styles.css, which adds some styling. We'll explore how to integrate Deepgram Aura through the AI/ML API and use IndexedDB as temporary storage for the generated audio files. Along the way, we'll discuss best practices for building Chrome extensions, handling user queries, and saving data in the database. By the end of this tutorial, you'll have a solid foundation in building Chrome extensions and be well equipped to build any AI-powered Chrome extension.
Let's get a brief overview of the technologies we're going to use.
AI/ML API is a game-changing platform for developers and SaaS entrepreneurs looking to integrate cutting-edge AI capabilities into their products. AI/ML API offers a single point of access to over 200 state-of-the-art AI models, covering everything from NLP to computer vision.
For a closer look at its developer-facing features, take a deep dive into the AI/ML API documentation: https://docs.aimlapi.com/
A Chrome extension is a small software program that modifies or enhances the functionality of the Google Chrome web browser. Extensions are built using web technologies such as HTML, CSS, and JavaScript, and are designed to serve a single purpose, making them easy to understand and use.
Browse the Chrome Web Store: https://chromewebstore.google.com/
Deepgram Aura is the first text-to-speech (TTS) AI model designed for real-time, conversational AI agents and applications. It delivers human-like voice quality with unparalleled speed and efficiency, making it a game-changer for building responsive, high-throughput voice AI experiences.
Learn more about the technical details: https://aimlapi.com/models/aura
IndexedDB is a low-level API for client-side storage of significant amounts of structured data, including files/blobs. IndexedDB is a JavaScript-based object-oriented database.
Learn more about key concepts and usage: https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API
Building a Chrome extension involves understanding its structure, permissions, and how it interacts with web pages. We'll start by setting up our development environment and creating the foundational files required for our extension.
Before we begin coding, ensure you have the following: the Google Chrome browser installed, a code editor and a terminal, basic familiarity with HTML, CSS, and JavaScript, and an AI/ML API key (we'll cover wiring it up later in the tutorial).
A minimal Chrome extension like ours requires at least three files: manifest.json, scripts.js, and styles.css.
Let's create a directory for our project and set up these files.
Step 1: Create a New Directory
Open your terminal and run the following commands to create a new folder for your extension:
mkdir my-first-chrome-extension
cd my-first-chrome-extension
Step 2: Create Essential Files
Within the new directory, create the necessary files:
touch manifest.json
touch scripts.js
touch styles.css
The manifest.json file is the heart of your Chrome extension. It tells the browser about your extension, what it does, and what permissions it needs. Let's delve into configuring this file properly.
{ "manifest_version": 3, "name": "Read Aloud", "version": "1.0", "description": "Read Aloud anything in any tab", "host_permissions": [ "*://*.aimlapi.com/*" ], "permissions": [ "activeTab" ], "content_scripts": [ { "matches": ["<all_urls>"], "js": ["scripts.js"], "css": ["styles.css"] } ], "icons": { "16": "icons/icon.png", "48": "icons/icon.png", "128": "icons/icon.png" } }
At a minimum, manifest.json must include manifest_version, name, and version.
Beyond the essential fields, we'll add a description, host_permissions (so the extension can call api.aimlapi.com), permissions (activeTab), content_scripts (so scripts.js and styles.css are injected into every page), and icons.
Open your browser and go to chatgpt.com. Now let's generate an icon for our Chrome extension. We'll use the same image for all three sizes, which is perfectly fine.
Enter the following prompt:
Generate black and white icon for my "Read Aloud" Chrome extension. This extension allows users to highlight the specific text in the website and listen to it. It's AI-powered Chrome extension. The background should be in white and solid.
Wait a few seconds for ChatGPT to generate the image. Download it, rename it to icon.png, and put it inside an icons folder in your project directory.
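The manifest references the icon at icons/icon.png, so create that folder inside the project and move the downloaded file into it. The source path below is just an example; adjust it to wherever your browser saved the download:

mkdir icons
mv ~/Downloads/icon.png icons/icon.png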
With all fields properly defined, your manifest.json will enable the browser to understand and correctly load your extension.
The scripts.js file contains the logic that controls how your extension behaves. We'll outline the key functionalities your script needs to implement.
Start by setting up the necessary variables and creating the overlay and "Read Aloud" button elements:
// Set your AIML_API_KEY key
const AIML_API_KEY = ''; // Replace with your AIML_API_KEY key

// Create the overlay
const overlay = document.createElement('div');
overlay.id = 'read-aloud-overlay';

// Create the "Read Aloud" button
const askButton = document.createElement('button');
askButton.id = 'read-aloud-button';
askButton.innerText = 'Read Aloud';

// Append the button to the overlay
overlay.appendChild(askButton);

// Variables to store selected text and range
let selectedText = '';
let selectedRange = null;
Your extension should detect when a user selects text on a webpage. Add a mouseup listener that will hold the selection logic:

document.addEventListener('mouseup', (event) => {
  console.log('mouseup event: ', event);
  //...code
});
Inside the handler, read the current selection. If text is selected, position the overlay next to it and show it; otherwise, remove any existing overlay:
const selection = window.getSelection();
const text = selection.toString().trim();
if (text !== '') {
  const range = selection.getRangeAt(0);
  const rect = range.getBoundingClientRect();

  // Set the position of the overlay
  overlay.style.top = `${window.scrollY + rect.top - 50}px`; // Adjust as needed
  overlay.style.left = `${window.scrollX + rect.left + rect.width / 2 - 70}px`; // Adjust to center the overlay

  selectedText = text;
  selectedRange = range;

  // Remove existing overlay if any
  const existingOverlay = document.getElementById('read-aloud-overlay');
  if (existingOverlay) {
    existingOverlay.remove();
  }

  // Append the overlay to the document body
  document.body.appendChild(overlay);
} else {
  // Remove overlay if no text is selected
  const existingOverlay = document.getElementById('read-aloud-overlay');
  if (existingOverlay) {
    existingOverlay.remove();
  }
}
Here is the complete mouseup handler:

// Function to handle text selection
document.addEventListener('mouseup', (event) => {
  console.log('mouseup event: ', event);
  const selection = window.getSelection();
  const text = selection.toString().trim();
  if (text !== '') {
    const range = selection.getRangeAt(0);
    const rect = range.getBoundingClientRect();

    // Set the position of the overlay
    overlay.style.top = `${window.scrollY + rect.top - 50}px`; // Adjust as needed
    overlay.style.left = `${window.scrollX + rect.left + rect.width / 2 - 70}px`; // Adjust to center the overlay

    selectedText = text;
    selectedRange = range;

    // Remove existing overlay if any
    const existingOverlay = document.getElementById('read-aloud-overlay');
    if (existingOverlay) {
      existingOverlay.remove();
    }

    // Append the overlay to the document body
    document.body.appendChild(overlay);
  } else {
    // Remove overlay if no text is selected
    const existingOverlay = document.getElementById('read-aloud-overlay');
    if (existingOverlay) {
      existingOverlay.remove();
    }
  }
});
When the user clicks the "Read Aloud" button, first limit the length of the selected text so requests stay small and responses stay fast:

if (selectedText.length > 200) {
  // ...code
}
To keep the UI responsive and manage the generated audio files efficiently, the click handler disables the button while the request is in flight, sends the text to the TTS endpoint, plays the returned audio, and stores it temporarily in IndexedDB. Start by disabling the button and showing a loading state:
// Disable the button
askButton.disabled = true;
askButton.innerText = 'Loading...';
// Send the selected text to your AI/ML API for TTS
const response = await fetch('https://api.aimlapi.com/tts', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${AIML_API_KEY}`, // Replace with your actual API key
  },
  body: JSON.stringify({
    model: '#g1_aura-asteria-en', // Replace with your specific model if needed
    text: selectedText
  })
});
try {
  // ...code
  if (!response.ok) {
    throw new Error('API request failed');
  }
  // ...code
} catch (error) {
  console.error('Error:', error);
  askButton.disabled = false;
  askButton.innerText = 'Read Aloud';
  alert('An error occurred while fetching the audio.');
}
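The snippets below reference audioBlob and audio objects that haven't been created yet. Assuming the endpoint returns the generated speech as binary audio data (an assumption about the response format), you can create them inside the try block like this:

// Read the response body as a Blob and wrap it in a playable Audio element
const audioBlob = await response.blob();
const audioUrl = URL.createObjectURL(audioBlob);
const audio = new Audio(audioUrl);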
// Play the audio
audio.play();
// Open IndexedDB
const db = await openDatabase();
const audioId = 'audio_' + Date.now(); // Generate a unique ID for the audio
// Save audio blob to IndexedDB
await saveAudioToIndexedDB(db, audioId, audioBlob);
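Putting the pieces together, the click handler might look like the sketch below. This is one way to assemble the snippets above rather than a verbatim copy of the original code: truncating long selections, the blob handling, and restoring the button after playback are assumptions.

askButton.addEventListener('click', async () => {
  // Keep the request small; truncation is one possible policy (an assumption)
  let textToRead = selectedText;
  if (textToRead.length > 200) {
    textToRead = textToRead.slice(0, 200);
  }

  // Disable the button while the request is in flight
  askButton.disabled = true;
  askButton.innerText = 'Loading...';

  try {
    // Send the selected text to the AI/ML API TTS endpoint
    const response = await fetch('https://api.aimlapi.com/tts', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${AIML_API_KEY}`,
      },
      body: JSON.stringify({ model: '#g1_aura-asteria-en', text: textToRead }),
    });

    if (!response.ok) {
      throw new Error('API request failed');
    }

    // Assumes the endpoint returns binary audio data
    const audioBlob = await response.blob();

    // Cache the audio temporarily in IndexedDB
    const db = await openDatabase();
    const audioId = 'audio_' + Date.now();
    await saveAudioToIndexedDB(db, audioId, audioBlob);

    // Play the audio and restore the button when playback ends
    const audio = new Audio(URL.createObjectURL(audioBlob));
    audio.onended = () => {
      askButton.disabled = false;
      askButton.innerText = 'Read Aloud';
    };
    audio.play();
  } catch (error) {
    console.error('Error:', error);
    askButton.disabled = false;
    askButton.innerText = 'Read Aloud';
    alert('An error occurred while fetching the audio.');
  }
});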
IndexedDB is a powerful client-side storage system that allows us to store large amounts of data, including files and blobs.
You'll need to create four primary functions to interact with IndexedDB: one to open (and, on first run, create) the database, one to save an audio blob, one to retrieve a stored blob, and one to delete it once it's no longer needed. A sketch of these helpers follows.
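Here is a minimal sketch of those helpers. openDatabase and saveAudioToIndexedDB are the names used in the snippets above; getAudioFromIndexedDB and deleteAudioFromIndexedDB, along with the database and object store names, are assumptions for illustration.

// Open (and create, if needed) the database that holds generated audio
function openDatabase() {
  return new Promise((resolve, reject) => {
    const request = indexedDB.open('ReadAloudDB', 1); // name/version are assumptions
    request.onupgradeneeded = () => {
      const db = request.result;
      if (!db.objectStoreNames.contains('audios')) {
        db.createObjectStore('audios'); // keys are provided explicitly, e.g. 'audio_<timestamp>'
      }
    };
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Save an audio blob under the given id
function saveAudioToIndexedDB(db, id, blob) {
  return new Promise((resolve, reject) => {
    const tx = db.transaction('audios', 'readwrite');
    tx.objectStore('audios').put(blob, id);
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}

// Retrieve a previously saved audio blob (assumed helper)
function getAudioFromIndexedDB(db, id) {
  return new Promise((resolve, reject) => {
    const request = db.transaction('audios', 'readonly').objectStore('audios').get(id);
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Delete an audio blob once it's no longer needed (assumed helper)
function deleteAudioFromIndexedDB(db, id) {
  return new Promise((resolve, reject) => {
    const tx = db.transaction('audios', 'readwrite');
    tx.objectStore('audios').delete(id);
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}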
To provide a seamless user experience, your extension should have a clean and intuitive interface.
Define styles for the overlay (#read-aloud-overlay) and the "Read Aloud" button (#read-aloud-button) in styles.css; a minimal sketch follows.
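A minimal styles.css might look like this. The selectors match the IDs created in scripts.js; the specific colors, sizes, and shadows are assumptions you can adjust freely.

/* Floating overlay positioned near the selected text */
#read-aloud-overlay {
  position: absolute;
  z-index: 9999; /* keep it above page content */
  background: #ffffff;
  border: 1px solid #ddd;
  border-radius: 6px;
  padding: 4px;
  box-shadow: 0 2px 6px rgba(0, 0, 0, 0.2);
}

/* The "Read Aloud" button inside the overlay */
#read-aloud-button {
  background: #000000;
  color: #ffffff;
  border: none;
  border-radius: 4px;
  padding: 6px 12px;
  font-size: 14px;
  cursor: pointer;
}

#read-aloud-button:disabled {
  opacity: 0.6;
  cursor: default;
}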
To interact with the AI/ML API and Deepgram Aura model, you'll need an API key.
Now put your API key into scripts.js by assigning it to the AIML_API_KEY constant defined at the top of the file.
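For example (the string below is just a placeholder; paste the real key from your AI/ML API account):

// scripts.js
const AIML_API_KEY = 'your_api_key_here'; // Replace with your actual AI/ML API key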
Keeping the key in a .env file won't work out of the box, though: using .env files in Chrome extensions requires extra configuration, which we'll cover in an upcoming tutorial.
{ "manifest_version": 3, "name": "Read Aloud", "version": "1.0", "description": "Read Aloud anything in any tab", "host_permissions": [ "*://*.aimlapi.com/*" ], "permissions": [ "activeTab" ], "content_scripts": [ { "matches": ["<all_urls>"], "js": ["scripts.js"], "css": ["styles.css"] } ], "icons": { "16": "icons/icon.png", "48": "icons/icon.png", "128": "icons/icon.png" } }
With all components in place, it's time to load your extension into the Chrome browser and see it in action.
Open chrome://extensions in Chrome and enable Developer mode by toggling the switch in the top-right corner. Then click "Load unpacked" and select your my-first-chrome-extension folder. The extension should now appear in the list; open any webpage, highlight some text, and click "Read Aloud" to hear it.
In this tutorial, we've set up a development environment, configured manifest.json, implemented the extension's logic in scripts.js, integrated the Deepgram Aura text-to-speech model through the AI/ML API, used IndexedDB as temporary storage for the generated audio, styled the overlay and button, and loaded the extension into Chrome.
With a solid foundation, you can enhance your extension further: for example, you might add playback controls, experiment with other voices and models available through the AI/ML API, or reuse audio already cached in IndexedDB instead of requesting it again.
Congratulations on building a Chrome extension that integrates advanced AI capabilities! This project showcases how combining web technologies with powerful APIs can create engaging and accessible user experiences. You're now equipped with the knowledge to develop and expand upon this extension or create entirely new ones that leverage AI/ML APIs.
The full implementation is available on GitHub: https://github.com/TechWithAbee/Building-a-Chrome-Extension-from-Scratch-with-AI-ML-API-Deepgram-Aura-and-IndexDB-Integration
Should you have any questions or need further assistance, don't hesitate to reach out via email at abdibrokhim@gmail.com.