Evaluating Medical Retrieval-Augmented Generation (RAG) with NVIDIA AI Endpoints and Ragas

Patricia Arquette

Nov 09, 2024 am 03:03 AM

In medicine, adopting advanced technologies is essential to improving patient care and research methods. Retrieval-augmented generation (RAG) is one such technology: it combines large language models (LLMs) with external knowledge retrieval. By pulling relevant information from databases, scientific literature, and patient records, RAG systems ground their responses in more accurate, context-rich information, addressing limitations such as outdated knowledge and hallucinations that are common in standalone LLMs.

In this overview, we'll explore RAG's growing role in healthcare, focusing on its potential to transform applications like drug discovery and clinical trials. We'll also look at the methods and tools needed to evaluate medical RAG systems, namely the NVIDIA AI endpoints for LangChain and the Ragas framework, along with the MACCROBAT dataset, a collection of patient reports from PubMed Central.

Key Challenges of Medical RAG

Scalability: With medical data expanding at over 35% CAGR, RAG systems need to manage and retrieve information efficiently without compromising speed, especially in scenarios where timely insights can impact patient care.
Specialized Language and Knowledge Requirements: Medical RAG systems require domain-specific tuning since the medical lexicon and content differ substantially from other domains like finance or law.
Absence of Tailored Evaluation Metrics: Unlike general-purpose RAG applications, medical RAG lacks well-suited benchmarks. Conventional metrics (like BLEU or ROUGE) emphasize text similarity rather than the factual accuracy critical in medical contexts.
Component-wise Evaluation: Effective evaluation requires independent scrutiny of both the retrieval and generation components. Retrieval must pull relevant, current data, and the generation component must ensure faithfulness to retrieved content.
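
To make the limitation of similarity-based metrics concrete, here is a minimal sketch in plain Python. The helper `unigram_f1` is hypothetical: a simplified token-overlap score in the spirit of ROUGE-1, not the official implementation, and the three sentences are invented purely for illustration.

```python
from collections import Counter


def unigram_f1(reference: str, candidate: str) -> float:
    """Token-overlap F1 between two strings (a simplified ROUGE-1-style score)."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    # Multiset intersection: a shared token counts only as often as it occurs in both.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


reference = "the patient should not receive the drug"
wrong = "the patient should receive the drug"        # opposite meaning, one word dropped
paraphrase = "this medication is contraindicated"    # same meaning, no shared words

score_wrong = unigram_f1(reference, wrong)
score_paraphrase = unigram_f1(reference, paraphrase)
print(f"wrong answer: {score_wrong:.2f}, paraphrase: {score_paraphrase:.2f}")
```

The factually opposite sentence scores far higher than the correct paraphrase, which is why overlap scores alone are a poor proxy for factual accuracy in clinical text.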

Introducing Ragas for RAG Evaluation

Ragas, an open-source evaluation framework, offers an automated approach for assessing RAG pipelines. Its toolkit focuses on context relevancy, recall, faithfulness, and answer relevancy. Utilizing an LLM-as-a-judge model, Ragas minimizes the need for manually annotated data, making the process efficient and cost-effective.
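
As a rough illustration of the LLM-as-a-judge idea, the sketch below computes a faithfulness-style score as the share of answer claims that a judge considers supported by the retrieved context. Everything here is an assumption for illustration: `faithfulness_score` and `keyword_judge` are hypothetical names, the judge is a trivial substring check standing in for an LLM call, and the example text is invented. It is not how Ragas is implemented internally.

```python
from typing import Callable, List


def faithfulness_score(
    claims: List[str],
    context: str,
    judge: Callable[[str, str], bool],
) -> float:
    """Fraction of claims that the judge marks as supported by the context."""
    if not claims:
        return 0.0
    supported = sum(1 for claim in claims if judge(claim, context))
    return supported / len(claims)


def keyword_judge(claim: str, context: str) -> bool:
    # Stand-in for an LLM verdict: a claim counts as supported only if it
    # appears verbatim (case-insensitively) in the context.
    return claim.lower() in context.lower()


context = "Blood pressure was 150/90 mmHg. The patient reported chest pain."
claims = [
    "blood pressure was 150/90 mmhg",
    "the patient reported chest pain",
    "the patient had a fever",  # not present in the context
]

score = faithfulness_score(claims, context, keyword_judge)
print(f"faithfulness: {score:.2f}")
```

Swapping `keyword_judge` for a function that calls an LLM keeps the scoring logic unchanged, which is what makes the judge approach cheap to iterate on.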

Evaluation Strategies for RAG Systems

For robust RAG evaluation, consider these steps:

Synthetic Data Generation: Generate triplet data (question, answer, context) based on the vector store documents to create synthetic test data.
Metric-Based Evaluation: Evaluate the RAG system on metrics like precision and recall, comparing its responses to the generated synthetic data as ground truth.
Independent Component Evaluation: For each question, assess retrieval context relevance and the generation’s answer accuracy.

Here's an example pipeline: given a question like "What are typical BP measurements in congestive heart failure?", the system first retrieves relevant context and generates a response; the evaluation then checks whether the retrieved context is relevant and whether the response answers the question accurately.
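
The retrieval half of this evaluation can be sketched with plain set arithmetic. This is a minimal illustration, assuming each document has an identifier and that the set of truly relevant documents for a question is known; the function name and the `doc_*` identifiers are hypothetical.

```python
from typing import List, Tuple


def retrieval_precision_recall(
    retrieved: List[str], relevant: List[str]
) -> Tuple[float, float]:
    """Precision and recall of retrieved document IDs against the relevant set."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical IDs: four documents retrieved, three known to be relevant.
precision, recall = retrieval_precision_recall(
    retrieved=["doc_3", "doc_7", "doc_9", "doc_12"],
    relevant=["doc_3", "doc_9", "doc_21"],
)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Precision penalizes irrelevant passages that crowd the context window, while recall penalizes missing evidence; reporting both separately from generation quality keeps the two failure modes distinguishable.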

Setting Up RAG with NVIDIA API and LangChain

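To follow along, create an NVIDIA account, obtain an API key, and make it available as the NVIDIA_API_KEY environment variable, which the LangChain NVIDIA integration reads by default. Install the necessary packages with: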

pip install langchain
pip install langchain_community
pip install langchain_nvidia_ai_endpoints
pip install ragas
pip install datasets

Download the MACCROBAT dataset, which offers comprehensive medical records that can be loaded and processed via LangChain.

# Load the MACCROBAT dataset from the Hugging Face Hub as LangChain documents
from langchain_community.document_loaders import HuggingFaceDatasetLoader

dataset_name = "singh-aditya/MACCROBAT_biomedical_ner"
page_content_column = "full_text"  # column used as each document's page content

loader = HuggingFaceDatasetLoader(dataset_name, page_content_column)
dataset = loader.load()  # list of LangChain Document objects

Using NVIDIA endpoints and LangChain, we can now build a robust test set generator and create synthetic data based on the dataset:

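# Note: this import path and API match the Ragas 0.1.x series
from ragas.testset.generator import TestsetGenerator
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings

# Model IDs are reproduced as given; verify them against the NVIDIA API catalog
critic_llm = ChatNVIDIA(model="meta/llama3.1-8b-instruct")  # filters and critiques generated questions
generator_llm = ChatNVIDIA(model="mistralai/mixtral-8x7b-instruct-v0.1")  # writes questions and answers
embeddings = NVIDIAEmbeddings(model="nv-embedqa-e5-v5", truncate="END")

generator = TestsetGenerator.from_langchain(
    generator_llm, critic_llm, embeddings, chunk_size=512
)
# Generate 10 synthetic test samples from the loaded documents
testset = generator.generate_with_langchain_docs(dataset, test_size=10)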

Deploying and Evaluating the Pipeline

Deploy your RAG system on top of a vector store. The generated test set contains sample questions derived from actual medical reports, such as:

# Sample questions
["What are typical BP measurements in the case of congestive heart failure?",
 "What can scans reveal in patients with severe acute pain?",
 "Is surgical intervention necessary for liver metastasis?"]

Each question is paired with a retrieved context and a generated ground-truth answer, which can then be used to evaluate the performance of both the retrieval and generation components.
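
One way to organize these pairings before scoring is a column-oriented table. The sketch below is an assumption-laden illustration: the column names (`question`, `contexts`, `answer`, `ground_truth`) follow the convention we understand the Ragas 0.1.x `evaluate` function to expect, `build_eval_rows` is a hypothetical helper, and the angle-bracket strings are placeholders rather than real passages or answers.

```python
from typing import Dict, List


def build_eval_rows(samples: List[dict]) -> Dict[str, list]:
    """Collect per-question samples into column-oriented evaluation data."""
    rows: Dict[str, list] = {
        "question": [],
        "contexts": [],
        "answer": [],
        "ground_truth": [],
    }
    for sample in samples:
        rows["question"].append(sample["question"])
        rows["contexts"].append(list(sample["contexts"]))  # one list of passages per question
        rows["answer"].append(sample["answer"])
        rows["ground_truth"].append(sample["ground_truth"])
    return rows


samples = [
    {
        "question": "What are typical BP measurements in the case of congestive heart failure?",
        "contexts": ["<retrieved passage 1>", "<retrieved passage 2>"],
        "answer": "<answer produced by the RAG pipeline>",
        "ground_truth": "<ground-truth answer from the synthetic test set>",
    }
]

rows = build_eval_rows(samples)
print({column: len(values) for column, values in rows.items()})
```

Keeping `contexts` as a list per question preserves the individual passages, so retrieval metrics can score each one while generation metrics see the full set.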

Custom Metrics with Ragas

Medical RAG systems may need custom metrics to assess retrieval precision. For instance, a metric could determine if a retrieved document is relevant enough for a search query:

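from dataclasses import dataclass, field

# Import paths below assume the Ragas 0.1.x package layout; adjust for other versions
from ragas import evaluate
from ragas.llms.prompt import Prompt
from ragas.metrics.base import EvaluationMode, MetricWithLLM

# Prompt asking the judge LLM for a binary relevance verdict
RETRIEVAL_PRECISION = Prompt(
    name="retrieval_precision",
    instruction="Is this result relevant enough for the first page of search results? Answer '1' for yes and '0' for no.",
    input_keys=["question", "context"]
)

@dataclass
class RetrievalPrecision(MetricWithLLM):
    name: str = "retrieval_precision"
    evaluation_mode = EvaluationMode.qc  # metric consumes the question and its contexts
    context_relevancy_prompt: Prompt = field(default_factory=lambda: RETRIEVAL_PRECISION)
    # The scoring logic is omitted in this excerpt; a complete metric also has to
    # implement the base class's abstract scoring method before it can be instantiated

# Use this custom metric in evaluation
score = evaluate(dataset["eval"], metrics=[RetrievalPrecision()])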
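
The prompt above asks the judge to answer '1' or '0' for each retrieved result. How those verdicts become a single score is not shown in the excerpt; the following is a minimal sketch of one plausible aggregation, averaging the binary verdicts. The helper names are hypothetical and the sample verdict strings are invented.

```python
from typing import List


def parse_verdict(raw: str) -> int:
    """Convert a judge reply such as "1", " 0\n" or "'1'" into 0 or 1."""
    text = raw.strip().strip("'\"")
    if text in {"0", "1"}:
        return int(text)
    # Anything else is treated as a malformed reply rather than silently scored.
    raise ValueError(f"unexpected judge output: {raw!r}")


def retrieval_precision(verdicts: List[str]) -> float:
    """Share of retrieved results that the judge marked as relevant."""
    scores = [parse_verdict(verdict) for verdict in verdicts]
    return sum(scores) / len(scores) if scores else 0.0


precision = retrieval_precision(["1", " 0\n", "'1'", "1"])
print(f"retrieval precision: {precision:.2f}")
```

Raising on malformed replies, instead of counting them as 0, keeps judge formatting failures from being mistaken for poor retrieval.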

Structured Output for Precision and Reliability

Structured output makes evaluation more efficient and reliable because the results are simpler to process. With the NVIDIA LangChain integration, you can constrain the LLM response to predefined categories (e.g., yes/no).

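import enum

from langchain_nvidia_ai_endpoints import ChatNVIDIA

# Restrict the judge's answer to one of two labels
class Choices(enum.Enum):
    Y = "Y"
    N = "N"

# Assumption: nvidia_llm is not defined in the original; the critic model ID from earlier is reused
nvidia_llm = ChatNVIDIA(model="meta/llama3.1-8b-instruct")
structured_llm = nvidia_llm.with_structured_output(Choices)
structured_llm.invoke("Is this search result relevant to the query?")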
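
To see why constraining the output helps, consider what downstream code has to do when replies arrive as free text. The sketch below is plain Python and makes no calls to the NVIDIA endpoints; `to_choice` and `relevance_rate` are hypothetical helpers and the raw replies are invented.

```python
import enum
from typing import List, Optional, Tuple


class Choices(enum.Enum):
    Y = "Y"
    N = "N"


def to_choice(raw: str) -> Optional[Choices]:
    """Map a raw reply to a Choices member, or None if it is not a valid label."""
    try:
        return Choices(raw.strip().upper())
    except ValueError:
        return None


def relevance_rate(raw_outputs: List[str]) -> Tuple[float, int]:
    """Return the share of valid replies that are Y, plus the number rejected."""
    parsed = [to_choice(raw) for raw in raw_outputs]
    valid = [choice for choice in parsed if choice is not None]
    rate = sum(choice is Choices.Y for choice in valid) / len(valid) if valid else 0.0
    return rate, len(parsed) - len(valid)


rate, rejected = relevance_rate(["Y", "n", "Probably relevant", " y "])
print(f"relevance rate: {rate:.2f}, rejected replies: {rejected}")
```

When the model is constrained to the enum up front, the rejected count should stay at zero and this normalization step becomes a safety net rather than a necessity.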

Conclusion

RAG bridges LLMs and dense vector retrieval for highly efficient, scalable applications across medical, multilingual, and code generation domains. In healthcare, its potential to bring accurate, contextually aware responses is evident, but evaluation must prioritize accuracy, domain specificity, and cost-efficiency.

The outlined evaluation pipeline, employing synthetic test data, NVIDIA endpoints, and Ragas, offers a robust method to meet these demands. For a deeper dive, you can explore Ragas and NVIDIA Generative AI examples on GitHub.
