Building a Chrome Extension from Scratch with AI/ML API, Deepgram Aura, and IndexedDB Integration

Linda Hamilton
Release: 2024-10-26 19:52:03

Introduction

Building a Chrome extension that leverages AI technologies can significantly enhance user experience by adding powerful features directly into the browser.

In this tutorial, we'll cover the entire process of building a Chrome extension from scratch with AI/ML API, Deepgram Aura, and IndexedDB, from setup to deployment. We'll start by setting up our development environment, including installing the necessary tools and configuring our project. Then we'll dive into the core components of our Chrome extension: manifest.json, which contains basic metadata about the extension; scripts.js, which defines how the extension behaves; and styles.css, which adds some styling. We'll explore how to integrate Deepgram Aura through AI/ML API and use IndexedDB as temporary storage for the generated audio file. Along the way, we'll discuss best practices for building Chrome extensions, handling user queries, and saving data in the database. By the end of this tutorial, you'll have a solid foundation in building Chrome extensions and be well equipped to build your own AI-powered Chrome extension.

Let's start with a brief overview of the technologies we are going to use.

AI/ML API

AI/ML API is a game-changing platform for developers and SaaS entrepreneurs looking to integrate cutting-edge AI capabilities into their products. AI/ML API offers a single point of access to over 200 state-of-the-art AI models, covering everything from NLP to computer vision.

Key Features for Developers:

  • Extensive Model Library: Over 200 pre-trained models for rapid prototyping and deployment
  • Customization Options: Fine-tune models to fit your specific use case
  • Developer-Friendly Integration: RESTful APIs and SDKs for seamless incorporation into your stack
  • Serverless Architecture: Focus on coding, not infrastructure management

Deep dive into the AI/ML API documentation: https://docs.aimlapi.com/

Chrome Extension

A Chrome extension is a small software program that modifies or enhances the functionality of the Google Chrome web browser. Extensions are built using web technologies such as HTML, CSS, and JavaScript, and are designed to serve a single purpose, making them easy to understand and use.

Browse the Chrome Web Store: https://chromewebstore.google.com/

Deepgram Aura

Deepgram Aura is the first text-to-speech (TTS) AI model designed for real-time, conversational AI agents and applications. It delivers human-like voice quality with unparalleled speed and efficiency, making it a game-changer for building responsive, high-throughput voice AI experiences.

Learn more about the technical details: https://aimlapi.com/models/aura

IndexedDB

IndexedDB is a low-level API for client-side storage of significant amounts of structured data, including files/blobs. IndexedDB is a JavaScript-based object-oriented database.

Learn more about key concepts and usage: https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API

Getting Started with Chrome Extension

Building a Chrome extension involves understanding its structure, permissions, and how it interacts with web pages. We'll start by setting up our development environment and creating the foundational files required for our extension.

Setting Up Your Development Environment

Before we begin coding, ensure you have the following:

  • Chrome Browser: The browser where we'll load and test our extension.
  • Text Editor or IDE: Tools like Visual Studio Code, Sublime Text, or Atom are suitable for editing code. We'll use Visual Studio Code in this tutorial.
  • Basic Knowledge of HTML, CSS, and JavaScript: Familiarity with these technologies is essential for building Chrome extensions.

Creating the Project Structure

Only manifest.json is strictly required by Chrome, but our extension needs three files:

  • manifest.json: Contains metadata and configuration for the extension.
  • scripts.js: Holds the JavaScript code that defines the extension's behavior.
  • styles.css: Includes any styling for the extension's UI elements.

Let's create a directory for our project and set up these files.

Step 1: Create a New Directory

Open your terminal and run the following commands to create a new folder for your extension:

mkdir my-first-chrome-extension
cd my-first-chrome-extension

Step 2: Create Essential Files

Within the new directory, create the necessary files:

touch manifest.json
touch scripts.js
touch styles.css

Understanding manifest.json

The manifest.json file is the heart of your Chrome extension. It tells the browser about your extension, what it does, and what permissions it needs. Let's delve into configuring this file properly.

{
  "manifest_version": 3,
  "name": "Read Aloud",
  "version": "1.0",
  "description": "Read Aloud anything in any tab",
  "host_permissions": [
    "*://*.aimlapi.com/*"
  ],
  "permissions": [
    "activeTab"
  ],
  "content_scripts": [
    {
      "matches": ["<all_urls>"],
      "js": ["scripts.js"],
      "css": ["styles.css"]
    }
  ],
  "icons": {
    "16": "icons/icon.png",
    "48": "icons/icon.png",
    "128": "icons/icon.png"
  }
}

Essential Fields in manifest.json

At a minimum, manifest.json must include:

  • manifest_version: Specifies the version of the manifest file format. Chrome currently uses version 3.
  • name: The name of your extension, as it will appear to users.
  • version: The version number of your extension, written as one to four dot-separated integers (for example, "1.0").
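
As a quick sanity check of these three required fields, the following sketch (runnable with Node.js) reports which of them a manifest object lacks. The helper name findMissingManifestFields is hypothetical and is not part of any Chrome API; it only illustrates the rule stated above.

```javascript
// Hypothetical helper for illustration; not part of the extension itself.
const REQUIRED_MANIFEST_FIELDS = ['manifest_version', 'name', 'version'];

// Return the required fields that are absent from a parsed manifest object.
function findMissingManifestFields(manifest) {
  return REQUIRED_MANIFEST_FIELDS.filter((field) => !(field in manifest));
}

const completeManifest = { manifest_version: 3, name: 'Read Aloud', version: '1.0' };
console.log(findMissingManifestFields(completeManifest)); // → []
console.log(findMissingManifestFields({ name: 'Read Aloud' })); // → ['manifest_version', 'version']
```

Chrome performs its own validation when the extension is loaded, so a check like this is only a convenience while editing the file.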

Adding Metadata and Permissions

Beyond the essential fields, we'll add:

  • description: A brief summary of what your extension does.
  • host_permissions: Specifies which domains the extension can access. For our integration with the AI/ML API, we'll need access to *.aimlapi.com.
  • permissions: Declares special permissions needed, such as accessing the active tab.
  • content_scripts: Defines scripts and styles to inject into web pages.
  • icons: Provides icons for the extension at various sizes.

Explanation of Key Fields

  • manifest_version: Set to 3 to use the latest Chrome extension features.
  • name: We'll name our extension "Read Aloud", reflecting its functionality.
  • version: Starting with "1.0" indicates the initial release.
  • description: "Read Aloud anything in any tab" informs users about the extension's purpose.
  • host_permissions: The wildcard *://*.aimlapi.com/* allows the extension to communicate with any subdomain of aimlapi.com, which is necessary for API calls.
  • permissions: "activeTab" allows the extension to interact with the content of the current tab.
  • content_scripts: Specifies that scripts.js and styles.css should be injected into all web pages ("<all_urls>").
  • icons: References icon files for the extension (ensure you have appropriate icon files in an icons directory).

Generating an Icon

Open your browser and go to chatgpt.com. Now let's generate an icon for our Chrome extension. We'll use the same icon for all sizes, which is perfectly fine.

Enter the following prompt:

Generate black and white icon for my "Read Aloud" Chrome extension. This extension allows users to highlight the specific text in the website and listen to it. It's AI-powered Chrome extension. The background should be in white and solid.

Wait a couple of seconds until ChatGPT generates the icon (image). Click download and rename the file to icon.png. Then put it inside an icons folder in your project directory.

Finalizing manifest.json

With all fields properly defined, your manifest.json lets the browser understand and correctly load your extension.


Developing scripts.js

The scripts.js file contains the logic that controls how your extension behaves. We'll outline the key functionalities your script needs to implement.

Variables and Initialization

Start by setting up necessary variables:

  • API Key: You'll need an API key from the AI/ML API platform to authenticate your requests.
  • Overlay Elements: Create DOM elements for the overlay and the "Read Aloud" button.
  • Selection Variables: Store information about the user's selected text and its position.

// Set your AIML_API_KEY key
const AIML_API_KEY = ''; // Replace with your AIML_API_KEY key

// Create the overlay
const overlay = document.createElement('div');
overlay.id = 'read-aloud-overlay';

// Create the "Read Aloud" button
const askButton = document.createElement('button');
askButton.id = 'read-aloud-button';
askButton.innerText = 'Read Aloud';

// Append the button to the overlay
overlay.appendChild(askButton);

// Variables to store selected text and range
let selectedText = '';
let selectedRange = null;

Handling Text Selection

Your extension should detect when a user selects text on a webpage:

  • Event Listener: Attach a mouseup event listener to the document to detect when the user finishes selecting text.

document.addEventListener('mouseup', (event) => {
  console.log('mouseup event: ', event);
  //...code
});

  • Selection Detection: Check if the selected text is not empty and store it.

const selection = window.getSelection();
const text = selection.toString().trim();
if (text !== '') {
  const range = selection.getRangeAt(0);
  const rect = range.getBoundingClientRect();

  • Overlay Positioning: Calculate where to place the overlay so it's near the selected text.

// Set the position of the overlay
overlay.style.top = `${window.scrollY + rect.top - 50}px`; // Adjust as needed
overlay.style.left = `${window.scrollX + rect.left + rect.width / 2 - 70}px`; // Adjust to center the overlay

selectedText = text;
selectedRange = range;

  • Overlay Management: Ensure that any existing overlay is removed before adding a new one.

// Remove existing overlay if any
const existingOverlay = document.getElementById('read-aloud-overlay');
if (existingOverlay) {
  existingOverlay.remove();
}

// Append the overlay to the document body
document.body.appendChild(overlay);
} else {
  // Remove overlay if no text is selected
  const existingOverlay = document.getElementById('read-aloud-overlay');
  if (existingOverlay) {
    existingOverlay.remove();
  }
}

Full Code:

// Function to handle text selection
document.addEventListener('mouseup', (event) => {
  console.log('mouseup event: ', event);
  const selection = window.getSelection();
  const text = selection.toString().trim();
  if (text !== '') {
    const range = selection.getRangeAt(0);
    const rect = range.getBoundingClientRect();

    // Set the position of the overlay
    overlay.style.top = `${window.scrollY + rect.top - 50}px`; // Adjust as needed
    overlay.style.left = `${window.scrollX + rect.left + rect.width / 2 - 70}px`; // Adjust to center the overlay

    selectedText = text;
    selectedRange = range;

    // Remove existing overlay if any
    const existingOverlay = document.getElementById('read-aloud-overlay');
    if (existingOverlay) {
      existingOverlay.remove();
    }

    // Append the overlay to the document body
    document.body.appendChild(overlay);
  } else {
    // Remove overlay if no text is selected
    const existingOverlay = document.getElementById('read-aloud-overlay');
    if (existingOverlay) {
      existingOverlay.remove();
    }
  }
});

Interacting with the AI/ML API

When the user clicks the "Read Aloud" button:

  • Input Validation: Check if the selected text meets any length requirements.

if (selectedText.length > 200) {
  // ...code
}

  • Disable Button: Prevent multiple clicks by disabling the button during processing.

// Disable the button
askButton.disabled = true;
askButton.innerText = 'Loading...';

  • API Request: Send a POST request to the AI/ML API with the selected text for text-to-speech conversion.

// Send the selected text to your AI/ML API for TTS
const response = await fetch('https://api.aimlapi.com/tts', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${AIML_API_KEY}`, // Replace with your actual API key
  },
  body: JSON.stringify({
    model: '#g1_aura-asteria-en',  // Replace with your specific model if needed
    text: selectedText
  })
});

  • Error Handling: Handle any errors that occur during the API request gracefully.

try {

  // ...code

  if (!response.ok) {
    throw new Error('API request failed');
  }

  // ...code

} catch (error) {
  console.error('Error:', error);
  askButton.disabled = false;
  askButton.innerText = 'Read Aloud';
  alert('An error occurred while fetching the audio.');
}

  • Audio Playback: Once the audio is received, play it back to the user.

// Play the audio
audio.play();

Using IndexedDB for Storage

To manage audio files efficiently:

  • Open Database: Create or open an IndexedDB database to store audio blobs.

// Open IndexedDB
const db = await openDatabase();
const audioId = 'audio_' + Date.now(); // Generate a unique ID for the audio

  • Save Audio: Store the audio blob in IndexedDB after receiving it from the API.

// Save audio blob to IndexedDB
await saveAudioToIndexedDB(db, audioId, audioBlob);

  • Retrieve Audio: Fetch the audio blob from IndexedDB for playback.
  • Delete Audio: Remove the audio blob from the database after playback to free up space.

Cleanup and User Experience

  • Overlay Removal: Remove the overlay if the user clicks elsewhere or deselects the text.
  • Re-enable Button: Ensure the "Read Aloud" button is re-enabled after processing is complete.
  • User Feedback: Provide visual cues, like changing button text to "Loading…", to inform the user that processing is underway.
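
The overlay-removal behavior described in this section could be sketched as follows. This is a minimal illustration under stated assumptions, not the original implementation: the helper names isClickOutsideOverlay and registerOverlayDismissal are hypothetical, and using the mousedown event is an assumption. The overlay argument is the element created at the top of scripts.js.

```javascript
// Hypothetical helper: true when the click target is not inside the overlay.
function isClickOutsideOverlay(overlay, target) {
  return !overlay.contains(target);
}

// Hypothetical helper: remove the overlay when the user presses the mouse
// button anywhere outside it. Clicks on the "Read Aloud" button are inside
// the overlay, so they do not dismiss it.
function registerOverlayDismissal(doc, overlay) {
  doc.addEventListener('mousedown', (event) => {
    if (overlay.isConnected && isClickOutsideOverlay(overlay, event.target)) {
      overlay.remove();
    }
  });
}

// In the content script this would be wired up once, for example:
// registerOverlayDismissal(document, overlay);
```

Passing the document and the overlay in as arguments keeps the helper free of globals, which makes it easy to exercise with stand-in objects outside the browser.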

Implementing IndexedDB Functions

IndexedDB is a powerful client-side storage system that allows us to store large amounts of data, including files and blobs.

Functionality to Implement

You'll need to create four primary functions to interact with IndexedDB:

  • openDatabase(): Opens a connection to the database and creates an object store if it doesn't exist.
  • saveAudioToIndexedDB(): Saves the audio blob with a unique ID.
  • getAudioFromIndexedDB(): Retrieves the audio blob using its ID.
  • deleteAudioFromIndexedDB(): Deletes the audio blob after playback.

Key Concepts

  • Transactions: All interactions with IndexedDB occur within a transaction. Ensure you specify the correct transaction mode (readonly or readwrite).
  • Object Stores: Similar to tables in SQL databases, object stores hold the data. We'll use an object store named "audios".
  • Error Handling: Always handle errors for database operations to prevent unexpected behavior.
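
A minimal sketch of the four functions, assuming the signatures implied by the calls shown in this tutorial (openDatabase() and saveAudioToIndexedDB(db, audioId, audioBlob)). The database name, the version number, and the requestToPromise helper are assumptions introduced for illustration; only the "audios" store name comes from the text above. The author's full implementation is in the linked repository and may differ.

```javascript
// Assumed names: DB_NAME and DB_VERSION are placeholders for illustration.
const DB_NAME = 'read-aloud-db';
const DB_VERSION = 1;
const STORE_NAME = 'audios'; // object store name used in this tutorial

// Hypothetical helper: wrap an IDBRequest in a Promise so it can be awaited.
function requestToPromise(request) {
  return new Promise((resolve, reject) => {
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Open the database, creating the object store on first use.
function openDatabase() {
  return new Promise((resolve, reject) => {
    const request = indexedDB.open(DB_NAME, DB_VERSION);
    request.onupgradeneeded = () => {
      const db = request.result;
      if (!db.objectStoreNames.contains(STORE_NAME)) {
        db.createObjectStore(STORE_NAME); // keys are supplied explicitly
      }
    };
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Save the audio blob under the given ID (readwrite transaction).
function saveAudioToIndexedDB(db, audioId, audioBlob) {
  const store = db.transaction(STORE_NAME, 'readwrite').objectStore(STORE_NAME);
  return requestToPromise(store.put(audioBlob, audioId));
}

// Read the audio blob back by ID (readonly transaction).
function getAudioFromIndexedDB(db, audioId) {
  const store = db.transaction(STORE_NAME, 'readonly').objectStore(STORE_NAME);
  return requestToPromise(store.get(audioId));
}

// Delete the audio blob once playback has finished.
function deleteAudioFromIndexedDB(db, audioId) {
  const store = db.transaction(STORE_NAME, 'readwrite').objectStore(STORE_NAME);
  return requestToPromise(store.delete(audioId));
}
```

Each promise here settles when the individual request succeeds or fails. Because every function returns a promise, a caller can await them in sequence (open, save, get, delete) inside a try/catch block, which covers the error-handling point above.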

Styling with styles.css

To provide a seamless user experience, your extension should have a clean and intuitive interface.

Styling the Overlay and Button

Define styles for:

  • Overlay Positioning: Absolute positioning to place the overlay near the selected text.
  • Button Appearance: Styling the "Read Aloud" button to match the overlay and be easily clickable.
  • Hover Effects: Enhance user interaction with hover effects on the button.
  • Disabled State: Visually indicate when the button is disabled.

Obtaining and Setting Your API Key

To interact with the AI/ML API and Deepgram Aura model, you'll need an API key.

Steps to Obtain Your API Key

  • Visit the AI/ML API Platform: Navigate to aimlapi.com.

  • Sign In: Click "Get API Key" and sign in using your Google account.

  • Access the Dashboard: After signing in, you'll be redirected to your dashboard.

  • Create an API Key: Go to the "Key Management" tab and click "Create API Key."

  • Copy the API Key: Once generated, copy your API key.

Setting the API Key in Your Extension

  • Security Note: Never hardcode your API key into your scripts if you plan to distribute your extension. Consider using environment variables or prompting the user to enter their API key.

Keeping the key in a .env file won't work out of the box: using .env in Chrome extensions requires extra configuration, which we'll cover in upcoming tutorials.

  • For Testing: In your scripts.js, assign your API key to the AIML_API_KEY constant, which is used to authenticate API requests.
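
To make the role of the key concrete, here is a small sketch of how it ends up in the Authorization header of the text-to-speech request. The endpoint and model name are the ones used in the fetch call shown earlier in this tutorial; buildTtsRequest is a hypothetical helper introduced only for illustration, and the empty-key check is an added assumption rather than part of the original code.

```javascript
// Endpoint and model name as used in this tutorial's fetch call.
const TTS_ENDPOINT = 'https://api.aimlapi.com/tts';
const TTS_MODEL = '#g1_aura-asteria-en';

// Hypothetical helper: build the arguments for fetch() from the key and text.
function buildTtsRequest(apiKey, text) {
  if (!apiKey) {
    // Fail early instead of sending an unauthenticated request.
    throw new Error('AIML_API_KEY is empty; set it before calling the API.');
  }
  return {
    url: TTS_ENDPOINT,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model: TTS_MODEL, text }),
    },
  };
}

// Usage inside the click handler (sketch):
// const { url, options } = buildTtsRequest(AIML_API_KEY, selectedText);
// const response = await fetch(url, options);
```

Failing early on an empty key gives a clearer error than a rejected API request, which helps with the "No Audio Playback" case in the troubleshooting tips.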

Running and Testing the Extension

With all components in place, it's time to load your extension into Chrome and see it in action.

Loading the Extension

  • Open Extensions Page: In Chrome, navigate to chrome://extensions/.

  • Enable Developer Mode: Toggle the "Developer mode" switch in the top right corner.

  • Load Unpacked Extension: Click "Load unpacked" and select your my-first-chrome-extension folder (in my case, it's aimlapi-tutorial-one).

  • Verify Installation: The extension should now appear in the list with its name and icon.

Testing Functionality

  • Navigate to a Webpage: Open a webpage with textual content, such as an article or blog post.
  • Select Text: Highlight a paragraph or sentence.
  • Interact with the Overlay: The "Read Aloud" button should appear above the selected text. Click it; the button text changes to "Loading..." while the text-to-speech request is processed.
  • Listen: After a brief processing period, you should hear the text read aloud by the AI voice.

Troubleshooting Tips

  • Overlay Doesn't Appear: Check if content_scripts are correctly specified in manifest.json.
  • No Audio Playback: Verify your API key is correctly set and that API requests are successful.
  • Console Errors: Use the browser's developer tools to inspect any JavaScript errors or network issues.

Project Summary

In this tutorial, we've:

  • Set Up the Development Environment: Created the necessary project structure and files for a Chrome extension.
  • Configured manifest.json: Defined essential metadata and permissions, understanding the importance of each field.
  • Developed scripts.js: Outlined the logic for handling text selection, interacting with the AI/ML API, and managing audio playback.
  • Implemented IndexedDB Integration: Learned how to use IndexedDB for storing and retrieving audio files locally.
  • Styled the Extension with styles.css: Applied CSS to enhance the user interface and improve user experience.
  • Obtained and Set Up an API Key: Acquired an API key from the AI/ML API platform and integrated it securely into our extension.
  • Loaded and Tested the Extension: Deployed the extension in Chrome and validated its functionality on live web pages.
  • Discussed Best Practices: Emphasized the importance of security, user experience, and error handling in extension development.

Next Steps

With a solid foundation, you can enhance your extension further:

  • Add Customization Options: Allow users to choose different voices or languages.
  • Improve Error Handling: Provide user-friendly messages and fallback options if the API is unavailable.
  • Optimize Performance: Implement caching strategies or optimize API requests to reduce latency.
  • Publish Your Extension: Share your creation with others by publishing it on the Chrome Web Store.


Conclusion

Congratulations on building a Chrome extension that integrates advanced AI capabilities! This project showcases how combining web technologies with powerful APIs can create engaging and accessible user experiences. You're now equipped with the knowledge to develop and expand upon this extension or create entirely new ones that leverage AI/ML APIs.

The full implementation is available on GitHub: https://github.com/TechWithAbee/Building-a-Chrome-Extension-from-Scratch-with-AI-ML-API-Deepgram-Aura-and-IndexDB-Integration


Should you have any questions or need further assistance, don't hesitate to reach out via email at abdibrokhim@gmail.com.

source:dev.to