Getting Started with Vector Search (Part 2)

Linda Hamilton
Release: 2024-11-10 02:07:02

In Part 1, we set up PostgreSQL with pgvector. Now, let's see how vector search actually works.

Contents

  • What are Embeddings?
  • Loading Sample Data
  • Exploring Vector Search
  • Understanding PostgreSQL Operators
  • Next Steps

What are Embeddings?

An embedding is a list of numbers that summarizes the meaning of a piece of content. The distance between two embeddings indicates how similar the underlying content is: a small distance means the two items are closely related, and a large distance means they are less related.

Book A: Web Development  (Distance: 0.2) ⬅️ Very Similar!
Book B: JavaScript 101   (Distance: 0.3) ⬅️ Similar!
Book C: Cooking Recipes  (Distance: 0.9) ❌ Not Similar
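
To make the idea concrete, here is a minimal sketch in plain Python. The three-dimensional vectors are invented for illustration (the real embeddings in this tutorial have 1536 dimensions), and the sketch assumes Euclidean distance as the measure.

```python
import math


def euclidean_distance(a, b):
    """Return the straight-line (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical toy embeddings; the values are made up for illustration only.
query = [0.9, 0.1, 0.0]  # stands in for a "web development" query
books = {
    "Web Development": [0.8, 0.2, 0.0],
    "JavaScript 101": [0.7, 0.3, 0.1],
    "Cooking Recipes": [0.0, 0.1, 0.9],
}

# Sort titles from smallest distance (most similar) to largest.
ranked = sorted(books, key=lambda title: euclidean_distance(query, books[title]))

for title in ranked:
    print(f"{title}: {euclidean_distance(query, books[title]):.2f}")
```

The cooking book ends up last because its vector points in a very different direction from the query.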

Loading Sample Data

Now, let's populate our database with some data. We'll use:

  • Open Library API for book data
  • OpenAI API to create embeddings
  • pgvector to store and search them

Project Structure

pgvector-setup/             # From Part 1
  ├── compose.yml
  ├── postgres/
  │   └── schema.sql
  ├── .env                  # New: for API keys
  └── scripts/              # New: for data loading
      ├── requirements.txt
      ├── Dockerfile
      └── load_data.py

Create a Script

Let's start with a script to load data from external APIs. The full script is here.
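
Since the full script is not reproduced in this post, the following is only a sketch of its first stage: fetching book metadata. It assumes Open Library's public search endpoint (`https://openlibrary.org/search.json`) and the `title`, `author_name`, and `first_publish_year` fields of its `docs` array. The function names are hypothetical, and the actual script may be organized differently.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://openlibrary.org/search.json"  # assumed endpoint


def fetch_search_results(query, limit=10):
    """Call the Open Library search API and return the decoded JSON payload."""
    params = urllib.parse.urlencode({"q": query, "limit": limit})
    with urllib.request.urlopen(f"{SEARCH_URL}?{params}", timeout=30) as response:
        return json.load(response)


def parse_books(payload):
    """Reduce a search payload to the fields we want to store as metadata."""
    books = []
    for doc in payload.get("docs", []):
        title = doc.get("title")
        if not title:
            continue  # skip entries without a usable title
        books.append(
            {
                "title": title,
                "authors": doc.get("author_name", []),
                "first_publish_year": doc.get("first_publish_year"),
            }
        )
    return books


def embedding_text(book):
    """Build the text that will later be sent to the embedding model."""
    authors = ", ".join(book["authors"]) or "unknown author"
    return f"{book['title']} by {authors}"
```

Calling `parse_books(fetch_search_results("programming"))` would return up to 10 dictionaries. Keeping the network call separate from the parsing makes the parsing easy to test offline.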

Setting Up Data Loading

  1. Create .env:
OPENAI_API_KEY=your_openai_api_key
  2. Update compose.yml to add the data loader:
services:
  # ... existing db service from Part 1

  data_loader:
    build:
      context: ./scripts
    environment:
      - DATABASE_URL=postgresql://postgres:password@db:5432/example_db
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    depends_on:
      - db
  3. Load the data:
docker compose up data_loader

You should see 10 programming books with their metadata.
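
Inside the container, the loader has to read the two environment variables set in compose.yml, request an embedding for each book, and insert the rows. The sketch below shows one way this could look. It assumes the `openai` (v1 or later) and `psycopg2` packages are listed in requirements.txt, that the table from Part 1 is `items` with `name`, `metadata`, and `embedding` columns (the names used in the queries later in this post), and that the model is `text-embedding-3-small`, which returns 1536-dimensional vectors. The post itself does not name the model, and all function names here are hypothetical.

```python
import json
import os


def load_settings(env=os.environ):
    """Read the variables that compose.yml passes to the data_loader service."""
    missing = [key for key in ("DATABASE_URL", "OPENAI_API_KEY") if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {
        "database_url": env["DATABASE_URL"],
        "openai_api_key": env["OPENAI_API_KEY"],
    }


def to_vector_literal(values):
    """Format floats in pgvector's text form, e.g. [0.1,0.2,0.3]."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


def create_embedding(text, api_key, model="text-embedding-3-small"):
    """Request one embedding from OpenAI (the model name is an assumption)."""
    from openai import OpenAI  # third-party package, imported only when needed

    client = OpenAI(api_key=api_key)
    response = client.embeddings.create(model=model, input=text)
    return response.data[0].embedding


def insert_book(cursor, book, embedding):
    """Insert one row; the column names are assumed from the queries in this post."""
    cursor.execute(
        "INSERT INTO items (name, metadata, embedding) VALUES (%s, %s, %s::vector)",
        (book["title"], json.dumps(book), to_vector_literal(embedding)),
    )
```

With psycopg2, the cursor would come from `psycopg2.connect(settings["database_url"])`, and the connection needs a `commit()` after the inserts.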

Exploring Vector Search

Connect to your database:

docker exec -it pgvector-db psql -U postgres -d example_db

Understanding Vector Data

Let's peek at what embeddings actually look like:

-- View first 5 dimensions of an embedding
SELECT
    name,
    (embedding::real[])[1:5] AS first_5_dimensions
FROM items
LIMIT 1;
  • Each embedding has 1536 dimensions (using OpenAI's model)
  • Values typically range from -1 to 1
  • These numbers represent semantic meaning
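
If you read an embedding back into Python without a pgvector-aware adapter, it typically arrives as text such as `[0.1,0.2,...]`. The sketch below parses that text and reports a few statistics, which is one way to check the points above. The helper names are hypothetical and the sample value is made up. OpenAI documents its embeddings as normalized to length 1, which is consistent with individual values staying between -1 and 1.

```python
import json
import math


def parse_vector(text):
    """Parse pgvector's text form ('[0.1,0.2]') into a list of floats."""
    return [float(v) for v in json.loads(text)]


def describe(vector):
    """Return a few summary statistics for a single embedding."""
    return {
        "dimensions": len(vector),
        "min": min(vector),
        "max": max(vector),
        "norm": math.sqrt(sum(v * v for v in vector)),
    }


# Made-up two-dimensional value; a real row from items would have 1536 entries.
stats = describe(parse_vector("[0.6,-0.8]"))
print(stats)
```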

Finding Similar Books

Try a simple similarity search:

-- Find the 3 books closest to a book with "Web" in its title
SELECT name, metadata
FROM items
ORDER BY embedding <-> (
    SELECT embedding
    FROM items
    WHERE metadata->>'title' LIKE '%Web%'
    LIMIT 1
)
LIMIT 3;
  1. Find a book with "Web" in its title
  2. Get that book's embedding (its mathematical representation)
  3. Compare this embedding with every book's embedding, including its own
  4. Get the 3 most similar books (smallest distances); the reference book itself comes first, with a distance of 0
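
The same four steps can be sketched in plain Python over in-memory rows. This is only an illustration of the logic, not how PostgreSQL executes the query. The rows and their two-dimensional embeddings are made up, and `similar_to_title` is a hypothetical helper.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similar_to_title(rows, needle, k=3):
    """Pick a reference row by title, then rank every row by distance to it."""
    # Steps 1-2: first row whose title contains the needle, and its embedding.
    reference = next((r for r in rows if needle in r["metadata"]["title"]), None)
    if reference is None:
        # The SQL would order by NULL here; this sketch returns nothing instead.
        return []
    # Step 3: distance from the reference to every row (including itself).
    scored = [(l2(reference["embedding"], r["embedding"]), r["name"]) for r in rows]
    # Step 4: keep the k smallest distances.
    return sorted(scored)[:k]


# Hypothetical rows shaped like the items table.
rows = [
    {"name": "Web Basics", "metadata": {"title": "Web Basics"}, "embedding": [0.9, 0.1]},
    {"name": "JavaScript 101", "metadata": {"title": "JavaScript 101"}, "embedding": [0.8, 0.3]},
    {"name": "Python Tricks", "metadata": {"title": "Python Tricks"}, "embedding": [0.5, 0.5]},
    {"name": "Cooking Recipes", "metadata": {"title": "Cooking Recipes"}, "embedding": [0.0, 1.0]},
]

for distance, name in similar_to_title(rows, "Web"):
    print(f"{name}: {distance:.3f}")
```

To leave the reference book out of its own results, the SQL version would need an extra `WHERE` condition that excludes it.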

Understanding PostgreSQL Operators

Let's break down the operators used in vector search queries:

JSON Text Operator: ->>

Extracts the value of a JSON field as text.

Example:

-- If metadata = {"title": "ABC"}, it returns "ABC"
SELECT metadata->>'title' FROM items;
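
As an analogy in Python terms, `->>` behaves roughly like looking up a key and converting the result to a string. The hypothetical helper below imitates that for a top-level key; it is an illustration of the behavior, not PostgreSQL's implementation.

```python
import json


def json_text(document, key):
    """Roughly imitate PostgreSQL's ->> for a top-level key of a JSON object."""
    value = json.loads(document).get(key)
    if value is None:
        return None  # a missing key or a JSON null both come back as SQL NULL
    if isinstance(value, str):
        return value  # strings lose their JSON quotes
    return json.dumps(value)  # numbers, booleans, arrays, objects become text


metadata = '{"title": "ABC", "year": 2020}'
print(json_text(metadata, "title"))
print(json_text(metadata, "year"))
```

Because the result is always text, comparisons such as `LIKE '%Web%'` work directly on it.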

Vector Distance Operator: <->

Computes the Euclidean (L2) distance between two vectors, which serves as a measure of their similarity.

  • Smaller distance = More similar
  • Larger distance = Less similar

Example:

-- Find similar books (query_embedding is a placeholder for an actual vector value)
SELECT name, embedding <-> query_embedding as distance
FROM items
ORDER BY distance
LIMIT 3;
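
From application code, one way to supply the `query_embedding` placeholder is to pass the vector as a query parameter. The sketch below assumes a DB-API cursor that accepts `%(name)s` placeholders (psycopg2 does) and the `items` table used throughout this post. `find_similar` and `to_vector_literal` are hypothetical helpers.

```python
def to_vector_literal(values):
    """Format floats in pgvector's text form, e.g. [0.1,0.2,0.3]."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


SIMILARITY_SQL = """
    SELECT name, embedding <-> %(query)s::vector AS distance
    FROM items
    ORDER BY distance
    LIMIT %(limit)s
"""


def find_similar(cursor, query_embedding, limit=3):
    """Run the distance query with the embedding bound as a parameter."""
    cursor.execute(
        SIMILARITY_SQL,
        {"query": to_vector_literal(query_embedding), "limit": limit},
    )
    return cursor.fetchall()
```

Binding the vector as a parameter, rather than formatting it into the SQL string, keeps the query text constant and avoids quoting problems.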

Next Steps

Up next, we'll:

  • Build a FastAPI application
  • Create search endpoints
  • Make our vector search accessible via API

Stay tuned for Part 3: "Building a Vector Search API"!

Feel free to drop a comment below!

source:dev.to