堅牢な LLM アプリケーションを構築するための基本的なプラクティス-Python チュートリアル-php.cn

ホームページ

バックエンド開発

Python チュートリアル

堅牢な LLM アプリケーションを構築するための基本的なプラクティス

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Jul 28, 2024 am 11:22 AM

Essential Practices for Building Robust LLM Applications

導入

私はクラウドで LLM アプリケーションを構築してきました。また、多くの開発者が LLM アプリを作成しているのを見てきましたが、これは MVP やプロトタイプとしては非常に優れており、本番環境に対応するには多少の作業が必要です。リストされている 1 つ以上のプラクティスを適用すると、アプリケーションを効果的に拡張するのに役立ちます。この記事では、アプリケーション開発のソフトウェアエンジニアリングの側面全体を説明するのではなく、LLM ラッパーアプリケーションについてのみ説明します。また、コードスニペットは Python で記述されており、同じロジックを他の言語にも適用できます。

1. 柔軟性のためにミドルウェアを活用する

LiteLLM や LangChain などのミドルウェアを使用して、ベンダーロックインを回避し、進化に合わせてモデルを簡単に切り替えます。

Python:

from litellm import completion

response = completion(
    model="gpt-3.5-turbo", 
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)

ログイン後にコピー

LiteLLM や LangChain などのミドルウェアソリューションは、アプリケーションとさまざまな LLM プロバイダーの間に抽象化レイヤーを提供します。この抽象化により、コアアプリケーションコードを変更せずに、異なるモデルまたはプロバイダーを簡単に切り替えることができます。 AI の状況が急速に進化するにつれて、機能が向上した新しいモデルが頻繁にリリースされます。ミドルウェアを使用すると、これらの新しいモデルを迅速に採用したり、パフォーマンス、コスト、機能要件に基づいてプロバイダーを切り替えたりできるため、アプリケーションを最新の状態に保ち、競争力を維持できます。

2. 再試行メカニズムを実装する

API 呼び出しに再試行ロジックを実装することで、レート制限の問題を回避します。

Python:

import time
from openai import OpenAI

client = OpenAI()

def retry_api_call(max_retries=3, delay=1):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": "Hello!"}]
            )
            return response
        except Exception as e:
            if attempt == max_retries - 1:
                raise e
            time.sleep(delay * (2 ** attempt))  # Exponential backoff

ログイン後にコピー

LLM プロバイダーは、悪用を防止し、公正な使用を保証するためにレート制限を課すことがよくあります。指数バックオフを使用した再試行メカニズムを実装すると、アプリケーションが一時的な障害やレート制限エラーを適切に処理できるようになります。このアプローチにより、失敗したリクエストが自動的に再試行され、一時的な問題によるサービス中断の可能性が減り、アプリケーションの信頼性が向上します。指数関数的バックオフ戦略 (再試行間の遅延を増やす) は、レート制限の問題を悪化させる可能性がある、即時の再リクエストによる API の過負荷を防ぐのに役立ちます。

3. LLM プロバイダーのフォールバックを設定する

単一の LLM プロバイダーに依存しないでください。クォータの問題やサービスの中断に対処するためにフォールバックを実装します。

from litellm import completion

def get_llm_response(prompt):
    providers = ['openai/gpt-3.5-turbo', 'anthropic/claude-2', 'cohere/command-nightly']
    for provider in providers:
        try:
            response = completion(model=provider, messages=[{"role": "user", "content": prompt}])
            return response
        except Exception as e:
            print(f"Error with {provider}: {str(e)}")
            continue
    raise Exception("All LLM providers failed")

ログイン後にコピー

単一の LLM プロバイダーに依存すると、そのプロバイダーでダウンタイムが発生したり、クォータ制限に達したりした場合にサービスの中断が発生する可能性があります。フォールバックオプションを実装すると、アプリケーションの継続的な動作が保証されます。このアプローチでは、さまざまなタスクに対してさまざまなプロバイダーやモデルの強みを活用することもできます。 LiteLLM は、複数のプロバイダーに統合インターフェイスを提供することでこのプロセスを簡素化し、プロバイダー間の切り替えやフォールバックロジックの実装を容易にします。

4. 可観測性の実装

LLM のトレースとデバッグには、Langfuse や Helicone などのツールを使用します。

from langfuse.openai import OpenAI

client = OpenAI(
    api_key="your-openai-api-key",
    langfuse_public_key="your-langfuse-public-key",
    langfuse_secret_key="your-langfuse-secret-key"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello, AI!"}]
)

ログイン後にコピー

可観測性を実装する利点:

強化されたデバッグ: 会話を簡単に追跡および再生して問題を特定します。
パフォーマンスの最適化: 応答時間とモデルのパフォーマンスについての洞察を取得します。
コスト管理: トークンの使用状況と関連コストを追跡して、予算管理を改善します。
品質保証: 回答の品質を監視し、改善の余地がある領域を特定します。
ユーザーエクスペリエンス分析: ユーザーインタラクションを理解し、それに応じてプロンプトを最適化します。
コンプライアンスと監査: 規制遵守と内部監査のためのログを維持します。
異常検出: 異常なパターンや動作を迅速に特定し、対応します。

可観測性ツールは、LLM アプリケーションのパフォーマンス、使用パターン、潜在的な問題についての重要な洞察を提供します。これにより、LLM とのやり取りをリアルタイムで監視および分析できるため、プロンプトの最適化、ボトルネックの特定、AI 生成の応答の品質の確保に役立ちます。このレベルの可視性は、アプリケーションを長期にわたって保守、デバッグ、改善するために不可欠です。

5. プロンプトを効果的に管理する

コードまたはテキストファイルにプロンプトをハードコーディングする代わりに、バージョン管理を備えたプロンプト管理ツールを使用します。

from promptflow import PromptFlow

pf = PromptFlow()

prompt_template = pf.get_prompt("greeting_prompt", version="1.2")
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt_template.format(name="Alice")}]
)

ログイン後にコピー

LLM アプリケーションを維持および改善するには、効果的なプロンプト管理が不可欠です。専用のプロンプト管理ツールを使用すると、プロンプトのバージョン管理、さまざまなバリエーションの A/B テスト、およびアプリケーション全体でのプロンプトの更新を簡単に行うことができます。このアプローチにより、プロンプトロジックがアプリケーションコードから分離され、コアアプリケーションを変更せずにプロンプトの反復処理が容易になります。また、技術者以外のチームメンバーも迅速な改善に貢献できるようになり、AI インタラクションを改良する際のより良いコラボレーションが可能になります。

6. Store Conversation History Persistently

Use a persistent cache like Redis for storing conversation history instead of in-memory cache which is not adapted for distributed systems.

from langchain.memory import RedisChatMessageHistory
from langchain.chains import ConversationChain
from langchain.llms import OpenAI

# Initialize Redis chat message history
message_history = RedisChatMessageHistory(url="redis://localhost:6379/0", ttl=600, session_id="user-123")

# Create a conversation chain with Redis memory
conversation = ConversationChain(
    llm=OpenAI(),
    memory=message_history,
    verbose=True
)

# Use the conversation
response = conversation.predict(input="Hi there!")
print(response)

# The conversation history is automatically stored in Redis

ログイン後にコピー

Storing conversation history is essential for maintaining context in ongoing interactions and providing personalized experiences. Using a persistent cache like Redis, especially in distributed systems, ensures that conversation history is reliably stored and quickly accessible. This approach allows your application to scale horizontally while maintaining consistent user experiences across different instances or servers. The use of Redis with LangChain simplifies the integration of persistent memory into your conversational AI system, making it easier to build stateful, context-aware applications.

7. Use JSON Mode whenever possible

Whenever possible like extracting structured information, provide a JSON schema instead of relying on raw text output.

import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-1106",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Extract the name and age from the user's input."},
        {"role": "user", "content": "My name is John and I'm 30 years old."}
    ]
)

print(response.choices[0].message.content)
# Output: {"name": "John", "age": 30}

ログイン後にコピー

Using JSON mode for information extraction provides a structured and consistent output format, making it easier to parse and process the LLM's responses in your application. This approach reduces the need for complex post-processing of free-form text and minimizes the risk of misinterpretation. It's particularly useful for tasks like form filling, data extraction from unstructured text, or any scenario where you need to integrate AI-generated content into existing data structures or databases.

8. Set Up Credit Alerts

Implement alerts for prepaid credits and per-user credit checks, even in MVP stages.

def check_user_credits(user_id, requested_tokens):
    user_credits = get_user_credits(user_id)
    if user_credits < requested_tokens:
        raise InsufficientCreditsError(f"User {user_id} has insufficient credits")

    remaining_credits = user_credits - requested_tokens
    if remaining_credits < CREDIT_ALERT_THRESHOLD:
        send_low_credit_alert(user_id, remaining_credits)

    return True

ログイン後にコピー

Implementing credit alerts and per-user credit checks is crucial for managing costs and ensuring fair usage in your LLM application. This system helps prevent unexpected expenses and allows you to proactively manage user access based on their credit limits. By setting up alerts at multiple thresholds, you can inform users or administrators before credits are depleted, ensuring uninterrupted service. This approach is valuable even in MVP stages, as it helps you understand usage patterns and plan for scaling your application effectively.

9. Implement Feedback Loops

Create mechanisms for users to provide feedback on AI responses, starting with simple thumbs up/down ratings.

def process_user_feedback(response_id, feedback):
    if feedback == 'thumbs_up':
        log_positive_feedback(response_id)
    elif feedback == 'thumbs_down':
        log_negative_feedback(response_id)
        trigger_improvement_workflow(response_id)

# In your API endpoint
@app.route('/feedback', methods=['POST'])
def submit_feedback():
    data = request.json
    process_user_feedback(data['response_id'], data['feedback'])
    return jsonify({"status": "Feedback received"})

ログイン後にコピー

Implementing feedback loops is essential for continuously improving your LLM application. By allowing users to provide feedback on AI responses, you can identify areas where the model performs well and where it needs improvement. This data can be used to fine-tune models, adjust prompts, or implement additional safeguards. Starting with simple thumbs up/down ratings provides an easy way for users to give feedback, while more detailed feedback options can be added later for deeper insights. This approach helps in building trust with users and demonstrates your commitment to improving the AI's performance based on real-world usage.

10. Implement Guardrails

Use prompt guards to check for prompt injection attacks, toxic content, and off-topic responses.

import re
from better_profanity import profanity

def check_prompt_injection(input_text):
    injection_patterns = [
        r"ignore previous instructions",
        r"disregard all prior commands",
        r"override system prompt"
    ]
    for pattern in injection_patterns:
        if re.search(pattern, input_text, re.IGNORECASE):
            return True
    return False

def check_toxic_content(input_text):
    return profanity.contains_profanity(input_text)

def sanitize_input(input_text):
    if check_prompt_injection(input_text):
        raise ValueError("Potential prompt injection detected")

    if check_toxic_content(input_text):
        raise ValueError("Toxic content detected")

    # Additional checks can be added here (e.g., off-topic detection)

    return input_text  # Return sanitized input if all checks pass

# Usage
try:
    safe_input = sanitize_input(user_input)
    # Process safe_input with your LLM
except ValueError as e:
    print(f"Input rejected: {str(e)}")

ログイン後にコピー

Implementing guardrails is crucial for ensuring the safety and reliability of your LLM application. This example demonstrates how to check for potential prompt injection attacks and toxic content. Prompt injection attacks attempt to override or bypass the system's intended behavior, while toxic content checks help maintain a safe and respectful environment. By implementing these checks, you can prevent malicious use of your AI system and ensure that the content generated aligns with your application's guidelines and ethical standards. Additional checks can be added to detect off-topic responses or other unwanted behaviors, further enhancing the robustness of your application.

Conclusion

All the above listed points can be easily integrated into your application and they prepare you better for scaling in production. You may also agree or disagree on some of the above points. In any case, feel free to post your questions or comments.

以上が堅牢な LLM アプリケーションを構築するための基本的なプラクティスの詳細内容です。詳細については、PHP 中国語 Web サイトの他の関連記事を参照してください。

このウェブサイトの声明

この記事の内容はネチズンが自主的に寄稿したものであり、著作権は原著者に帰属します。このサイトは、それに相当する法的責任を負いません。盗作または侵害の疑いのあるコンテンツを見つけた場合は、admin@php.cn までご連絡ください。

ホットAIツール

Undresser.AI Undress

リアルなヌード写真を作成する AI 搭載アプリ

AI Clothes Remover

写真から衣服を削除するオンライン AI ツール。

Undress AI Tool

脱衣画像を無料で

Clothoff.io

AI衣類リムーバー

Video Face Swap

完全無料の AI 顔交換ツールを使用して、あらゆるビデオの顔を簡単に交換できます。

ホットツール

メモ帳++7.3.1

使いやすく無料のコードエディター

SublimeText3 中国語版

中国語版、とても使いやすい

ゼンドスタジオ 13.0.1

強力な PHP 統合開発環境

ドリームウィーバー CS6

ビジュアル Web 開発ツール

SublimeText3 Mac版

神レベルのコード編集ソフト（SublimeText3）

ホットトピック

Java チュートリアル

1666

CakePHP チュートリアル

1426

Laravel チュートリアル

1328

PHP チュートリアル

1273

C# チュートリアル

1253

Related knowledge

Python：ゲーム、GUIなど Apr 13, 2025 am 12:14 AM

PythonはゲームとGUI開発に優れています。 1）ゲーム開発は、2Dゲームの作成に適した図面、オーディオ、その他の機能を提供し、Pygameを使用します。 2）GUI開発は、TKINTERまたはPYQTを選択できます。 TKINTERはシンプルで使いやすく、PYQTは豊富な機能を備えており、専門能力開発に適しています。

Python vs. C：曲線と使いやすさの学習 Apr 19, 2025 am 12:20 AM

Pythonは学習と使用が簡単ですが、Cはより強力ですが複雑です。 1。Python構文は簡潔で初心者に適しています。動的なタイピングと自動メモリ管理により、使いやすくなりますが、ランタイムエラーを引き起こす可能性があります。 2.Cは、高性能アプリケーションに適した低レベルの制御と高度な機能を提供しますが、学習しきい値が高く、手動メモリとタイプの安全管理が必要です。

Pythonと時間：勉強時間を最大限に活用する Apr 14, 2025 am 12:02 AM

限られた時間でPythonの学習効率を最大化するには、PythonのDateTime、時間、およびスケジュールモジュールを使用できます。 1. DateTimeモジュールは、学習時間を記録および計画するために使用されます。 2。時間モジュールは、勉強と休息の時間を設定するのに役立ちます。 3.スケジュールモジュールは、毎週の学習タスクを自動的に配置します。

Python vs. C：パフォーマンスと効率の探索 Apr 18, 2025 am 12:20 AM

Pythonは開発効率でCよりも優れていますが、Cは実行パフォーマンスが高くなっています。 1。Pythonの簡潔な構文とリッチライブラリは、開発効率を向上させます。 2.Cのコンピレーションタイプの特性とハードウェア制御により、実行パフォーマンスが向上します。選択を行うときは、プロジェクトのニーズに基づいて開発速度と実行効率を比較検討する必要があります。

Python Standard Libraryの一部はどれですか：リストまたは配列はどれですか？ Apr 27, 2025 am 12:03 AM

PythonListSarePartOfThestAndardarenot.liestareBuilting-in、versatile、forStoringCollectionsのpythonlistarepart。

Python：自動化、スクリプト、およびタスク管理 Apr 16, 2025 am 12:14 AM

Pythonは、自動化、スクリプト、およびタスク管理に優れています。 1）自動化：OSやShutilなどの標準ライブラリを介してファイルバックアップが実現されます。 2）スクリプトの書き込み：Psutilライブラリを使用してシステムリソースを監視します。 3）タスク管理：スケジュールライブラリを使用してタスクをスケジュールします。 Pythonの使いやすさと豊富なライブラリサポートにより、これらの分野で優先ツールになります。

Pythonの学習：2時間の毎日の研究で十分ですか？ Apr 18, 2025 am 12:22 AM

Pythonを1日2時間学ぶだけで十分ですか？それはあなたの目標と学習方法に依存します。 1）明確な学習計画を策定し、2）適切な学習リソースと方法を選択します。3）実践的な実践とレビューとレビューと統合を練習および統合し、統合すると、この期間中にPythonの基本的な知識と高度な機能を徐々に習得できます。

Python vs. C：重要な違いを理解します Apr 21, 2025 am 12:18 AM

PythonとCにはそれぞれ独自の利点があり、選択はプロジェクトの要件に基づいている必要があります。 1）Pythonは、簡潔な構文と動的タイピングのため、迅速な開発とデータ処理に適しています。 2）Cは、静的なタイピングと手動メモリ管理により、高性能およびシステムプログラミングに適しています。

See all articles

堅牢な LLM アプリケーションを構築するための基本的なプラクティス

導入

1. 柔軟性のためにミドルウェアを活用する

2. 再試行メカニズムを実装する

3. LLM プロバイダーのフォールバックを設定する

4. 可観測性の実装

5. プロンプトを効果的に管理する

6. Store Conversation History Persistently

7. Use JSON Mode whenever possible

8. Set Up Credit Alerts

9. Implement Feedback Loops

10. Implement Guardrails

Conclusion

ホットAIツール

Undresser.AI Undress

AI Clothes Remover

Undress AI Tool

Clothoff.io

Video Face Swap

人気の記事

ホットツール

メモ帳++7.3.1

SublimeText3 中国語版

ゼンドスタジオ 13.0.1

ドリームウィーバー CS6

SublimeText3 Mac版

ホットトピック