Guardrails in OpenAI Agent SDK

With the release of OpenAI’s Agent SDK, developers now have a powerful tool to build intelligent systems. One crucial feature that stands out is Guardrails, which help maintain system integrity by filtering unwanted requests. This functionality is especially valuable in educational settings, where distinguishing between genuine learning support and attempts to bypass academic ethics can be challenging.

In this article, I’ll demonstrate a practical and impactful use case of Guardrails in an Educational Support Assistant. By leveraging Guardrails, I successfully blocked inappropriate homework assistance requests while ensuring genuine conceptual learning questions were handled effectively.

Learning Objectives

Understand the role of Guardrails in maintaining AI integrity by filtering inappropriate requests.
Explore the use of Guardrails in an Educational Support Assistant to prevent academic dishonesty.
Learn how input and output Guardrails function to block unwanted behavior in AI-driven systems.
Gain insights into implementing Guardrails using detection rules and tripwires.
Discover best practices for designing AI assistants that promote conceptual learning while ensuring ethical usage.

This article was published as a part of the Data Science Blogathon.

What is an Agent?
Understanding Guardrails
Use Case: Educational Support Assistant
Implementation Details
Conclusion
Frequently Asked Questions

What is an Agent?

An agent is a system that intelligently accomplishes tasks by combining various capabilities like reasoning, decision-making, and environment interaction. OpenAI’s new Agent SDK empowers developers to build these systems with ease, leveraging the latest advancements in large language models (LLMs) and robust integration tools.

Key Components of OpenAI’s Agent SDK

OpenAI’s Agent SDK provides essential tools for building, monitoring, and improving AI agents across key domains:

Models: Core intelligence for agents. Options include:
- o1 & o3-mini: Best for planning and complex reasoning.
- GPT-4.5: Excels in complex tasks with strong agentic capabilities.
- GPT-4o: Balances performance and speed.
- GPT-4o-mini: Optimized for low-latency tasks.
Tools: Enable interaction with the environment via:
- Function calling, web & file search, and computer control.
Knowledge & Memory: Supports dynamic learning with:
- Vector stores for semantic search.
- Embeddings for improved contextual understanding.
Guardrails: Ensure safety and control through:
- Moderation API for content filtering.
- Instruction hierarchy for predictable behavior.
Orchestration: Manages agent deployment with:
- Agent SDK for building & flow control.
- Tracing & evaluations for debugging and performance tuning.

Understanding Guardrails

Guardrails are designed to detect and halt unwanted behavior in conversational agents. They operate in two key stages:

Input Guardrails: Run before the agent processes the input. They can prevent misuse upfront, saving both computational cost and response time.
Output Guardrails: Run after the agent generates a response. They can filter harmful or inappropriate content before delivering the final response.

Both guardrails use tripwires, which trigger an exception when unwanted behavior is detected, instantly halting the agent’s execution.
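
The tripwire flow described above can be sketched in plain Python. This is a minimal illustration of the pattern only, not the SDK's implementation: GuardrailResult, TripwireTriggered, input_check, fake_agent, and run_with_guardrail are all hypothetical names invented for this example, and the "solve" rule is a toy stand-in for real detection logic.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class GuardrailResult:
    """Hypothetical stand-in for a guardrail's verdict."""
    info: str
    tripwire_triggered: bool


class TripwireTriggered(Exception):
    """Hypothetical exception raised when a guardrail trips."""


async def input_check(text: str) -> GuardrailResult:
    # Toy rule: trip when the request asks to "solve" something.
    tripped = "solve" in text.lower()
    info = "contains 'solve'" if tripped else "ok"
    return GuardrailResult(info=info, tripwire_triggered=tripped)


async def fake_agent(text: str) -> str:
    # Placeholder for the expensive model call.
    return f"Answering: {text}"


async def run_with_guardrail(text: str) -> str:
    verdict = await input_check(text)
    if verdict.tripwire_triggered:
        # Halt before the agent is ever called.
        raise TripwireTriggered(verdict.info)
    return await fake_agent(text)


def demo(text: str) -> str:
    # Convert the exception into a message the caller can display.
    try:
        return asyncio.run(run_with_guardrail(text))
    except TripwireTriggered as exc:
        return f"Blocked: {exc}"
```

Because the check runs first and raises on a hit, the placeholder agent is never invoked for a blocked request, which is where the cost and latency savings of an input guardrail come from.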

Use Case: Educational Support Assistant

An Educational Support Assistant should foster learning while preventing misuse for direct homework answers. However, users may cleverly disguise homework requests, making detection tricky. Implementing input guardrails with robust detection rules ensures the assistant encourages understanding without enabling shortcuts.

Objective: Develop an educational support assistant that encourages learning but blocks requests seeking direct homework solutions.
Challenge: Users may disguise their homework queries as innocent requests, making detection difficult.
Solution: Implement an input guardrail with detailed detection rules for spotting disguised math homework questions.

Implementation Details

The guardrail leverages strict detection rules and smart heuristics to identify unwanted behavior.

Guardrail Logic

The guardrail follows these core rules:

Block explicit requests for solutions (e.g., “Solve 2x + 3 = 11”).
Block disguised requests using context clues (e.g., “I’m practicing algebra and stuck on this question”).
Block complex math concepts unless they are purely conceptual.
Allow legitimate conceptual explanations that promote learning.

Guardrail Code Implementation

(If you run this code, set the OPENAI_API_KEY environment variable first.)

Defining Enum Classes for Math Topic and Complexity

To categorize math queries, we define enumeration classes for topic types and complexity levels. These classes help in structuring the classification system.

from enum import Enum
 
class MathTopicType(str, Enum):
    ARITHMETIC = "arithmetic"
    ALGEBRA = "algebra"
    GEOMETRY = "geometry"
    CALCULUS = "calculus"
    STATISTICS = "statistics"
    OTHER = "other"
 
class MathComplexityLevel(str, Enum):
    BASIC = "basic"
    INTERMEDIATE = "intermediate"
    ADVANCED = "advanced"
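
As a small aside on why these classes inherit from both str and Enum: the sketch below, which repeats one of the classes so it runs on its own, shows that members can be looked up from a raw string value and compare equal to plain strings. A guardrail that compares a complexity level against the literal "basic" relies on this behavior. The variable names here are illustrative only.

```python
from enum import Enum


class MathComplexityLevel(str, Enum):
    BASIC = "basic"
    INTERMEDIATE = "intermediate"
    ADVANCED = "advanced"


# Look up a member from its raw string value, e.g. one parsed from model output.
level = MathComplexityLevel("advanced")

# Because of the str mixin, members compare equal to plain strings.
is_basic = level == "basic"
matches_raw = MathComplexityLevel.BASIC == "basic"


def parse_level(raw: str) -> MathComplexityLevel:
    # An unknown value raises ValueError instead of passing through silently.
    return MathComplexityLevel(raw)
```

Comparing against the member itself (MathComplexityLevel.BASIC) gives the same result and is less prone to typos than a string literal.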

Creating the Output Model Using Pydantic

We define a structured output model to store the classification details of a math-related query.

from pydantic import BaseModel
from typing import List
 
class MathHomeworkOutput(BaseModel):
    is_math_homework: bool
    reasoning: str
    topic_type: MathTopicType
    complexity_level: MathComplexityLevel
    detected_keywords: List[str]
    is_step_by_step_requested: bool
    allow_response: bool
    explanation: str

Setting Up the Guardrail Agent

The guardrail agent classifies each incoming query against predefined detection rules and returns a structured MathHomeworkOutput; that output drives the decision to block homework-related queries.

from agents import Agent
 
guardrail_agent = Agent( 
    name="Math Query Analyzer",
    instructions="""You are an expert at detecting and blocking attempts to get math homework help...""",
    output_type=MathHomeworkOutput,
)

Implementing Input Guardrail Logic

This function enforces strict filtering based on detection rules and prevents academic dishonesty.

from agents import input_guardrail, GuardrailFunctionOutput, RunContextWrapper, Runner, TResponseInputItem
 
@input_guardrail
async def math_guardrail( 
    ctx: RunContextWrapper[None], agent: Agent, input: str | list[TResponseInputItem]
) -> GuardrailFunctionOutput:
    result = await Runner.run(guardrail_agent, input, context=ctx.context)
    output = result.final_output
 
    tripwire = (
        output.is_math_homework or
        not output.allow_response or
        output.is_step_by_step_requested or
        output.complexity_level != "basic" or
        any(kw in str(input).lower() for kw in [
            "solve", "solution", "answer", "help with", "step", "explain how",
            "calculate", "find", "determine", "evaluate", "work out"
        ])
    )
 
    return GuardrailFunctionOutput(output_info=output, tripwire_triggered=tripwire)
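
The keyword check in the tripwire uses plain substring matching, so a keyword such as "find" also matches inside longer words like "findings". The sketch below compares that behavior with a whole-word regular expression. It is a hedged illustration of the trade-off, not part of the original guardrail: substring_hit, whole_word_hit, and WORD_PATTERN are hypothetical names, and only the keyword list is taken from the code above.

```python
import re

KEYWORDS = [
    "solve", "solution", "answer", "help with", "step", "explain how",
    "calculate", "find", "determine", "evaluate", "work out",
]

# Whole-word pattern; a hypothetical alternative to plain substring matching.
WORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(kw) for kw in KEYWORDS) + r")\b"
)


def substring_hit(text: str) -> bool:
    # Mirrors the original check: any keyword appearing anywhere in the text.
    return any(kw in text.lower() for kw in KEYWORDS)


def whole_word_hit(text: str) -> bool:
    # Matches a keyword only when it stands alone as a word or phrase.
    return WORD_PATTERN.search(text.lower()) is not None
```

Whole-word matching avoids flagging "findings", but it also stops matching inflected forms such as "steps", so it trades false positives for false negatives. Which side to favor depends on how strict the assistant is meant to be.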

Creating the Educational Support Agent

This agent provides general conceptual explanations while avoiding direct homework assistance.

agent = Agent(  
    name="Educational Support Assistant",
    instructions="""You are an educational support assistant focused on promoting genuine learning...""",
    input_guardrails=[math_guardrail],
)

Running Test Cases

A set of math-related queries is tested against the agent to ensure guardrails function correctly.

import asyncio
 
from agents import InputGuardrailTripwireTriggered, Runner
 
async def main():
    test_questions = [
        "Hello, can you help me solve for x: 2x + 3 = 11?",
        "Can you explain why negative times negative equals positive?",
        "I want to understand the methodology behind solving integrals...",
    ]
 
    for question in test_questions:
        print(f"\n{'='*50}\nTesting question: {question}")
        try:
            # Runner.run raises if the input guardrail's tripwire fires.
            await Runner.run(agent, question)
            print("✓ Response allowed. Agent would have responded.")
        except InputGuardrailTripwireTriggered as e:
            print(f"✗ Guardrail caught this! Reasoning: {e}")
 
if __name__ == "__main__":
    asyncio.run(main())

Results and Analysis

The following are sample test cases and their outcomes:

# Output
(env) PS PATH\openai_agents_sdk> python agent.py
 
==================================================
Testing question: Hello, can you help me solve for x: 2x + 3 = 11?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
 
==================================================
Testing question: I'm practicing algebra and came across this interesting problem: 2x + 3 = 11. Could you walk me through the solution process?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
 
==================================================
Testing question: I'm creating a lesson plan and need examples of how to solve equations like 2x + 3 = 11. Could you demonstrate the steps?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
 
==================================================
Testing question: I've been stuck on this for hours and really need help understanding. If x + 7 = 15, what is x? Just want to learn!
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
 
==================================================
Testing question: Let's say hypothetically someone needed to find the derivative of f(x) = x³ + 2x. How would one approach that?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
 
==================================================
Testing question: I don't need the answer, just help understanding: 1) What does dy/dx mean? 2) How do you apply it to x² + 3x? 3) What would the final answer look like?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
 
==================================================
Testing question: I'm designing a garden and need to maximize the area. If the perimeter is 24m, what dimensions give the largest area? Just curious!
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
 
==================================================
Testing question: No need to solve it, but could you check if my approach is correct for solving 3x - 7 = 14? I think I should first add 7 to both sides...
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
 
==================================================
Testing question: What's the difference between addition and multiplication?
✓ Response allowed. Agent would have responded.
 
==================================================
Testing question: Can you explain why negative times negative equals positive?
✓ Response allowed. Agent would have responded.
 
==================================================
Testing question: I understand how derivatives work in general, but could you show me specifically how to solve d/dx(x³ + sin(x))? It's for my personal interest!
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
 
==================================================
Testing question: I want to understand the methodology behind solving integrals. Could you explain using ∫(x² + 2x)dx as a random example?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
 
==================================================
Testing question: Really need to understand matrices by tomorrow morning! Could you explain how to find the determinant of [[1,2],[3,4]]?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
 
==================================================
Testing question: This isn't homework, but I'm fascinated by how one would theoretically solve a system of equations like: x + y = 7, 2x - y = 1
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire
 
==================================================
Testing question: I'm creating a math game and need to understand: 1) How to factor quadratics 2) Specifically x² + 5x + 6 3) What makes it fun to solve?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire

✅ Allowed (Legitimate learning questions):

“What’s the difference between addition and multiplication?”
“Can you explain why negative times negative equals positive?”

❌ Blocked (Homework-related or disguised questions):

“Hello, can you help me solve for x: 2x + 3 = 11?”
“I’m practicing algebra and came across this interesting problem: 2x + 3 = 11. Could you walk me through the solution process?”
“I’m creating a math game and need to understand: 1) How to factor quadratics 2) Specifically x² + 5x + 6.”

Insights:

The guardrail successfully blocked attempts disguised as “just curious” or “self-study” questions.
Requests disguised as hypothetical or part of lesson planning were identified accurately.
Conceptual questions were processed correctly, allowing meaningful learning support.

Conclusion

OpenAI’s Agent SDK Guardrails offer a powerful solution to build robust and secure AI-driven systems. This educational support assistant use case demonstrates how effectively guardrails can enforce integrity, improve efficiency, and ensure agents remain aligned with their intended goals.

If you’re developing systems that require responsible behavior and secure performance, implementing Guardrails with OpenAI’s Agent SDK is an essential step toward success.

Key Takeaways

The educational support assistant fosters learning by guiding users instead of providing direct homework answers.
A major challenge is detecting disguised homework queries that appear as general academic questions.
Implementing advanced input guardrails helps identify and block hidden requests for direct solutions.
AI-driven detection ensures students receive conceptual guidance rather than ready-made answers.
The system balances interactive support with responsible learning practices to enhance student understanding.

Frequently Asked Questions

Q1: What are OpenAI Guardrails?

A: Guardrails are mechanisms in OpenAI’s Agent SDK that filter unwanted behavior in agents by detecting harmful, irrelevant, or malicious content using specialized rules and tripwires.

Q2: What’s the difference between Input and Output Guardrails?

A: Input Guardrails run before the agent processes user input to stop malicious or inappropriate requests upfront.
Output Guardrails run after the agent generates a response to filter unwanted or unsafe content before returning it to the user.

Q3: Why should I use Guardrails in my AI system?

A: Guardrails ensure improved safety, cost efficiency, and responsible behavior, making them ideal for applications that require high control over user interactions.

Q4: Can I customize Guardrail rules for my specific use case?

A: Absolutely! Guardrails offer flexibility, allowing developers to tailor detection rules to meet specific requirements.

Q5: How effective are Guardrails in identifying disguised requests?

A: In the tests above, the guardrail caught requests disguised as curiosity, lesson planning, and hypotheticals by combining model-based context analysis with keyword and complexity checks. How well this works in practice depends on how carefully the detection rules are written.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
