Essential Practices for Building Robust LLM Applications

Introduction

I have been building LLM applications in the cloud, and I have also seen many developers build LLM apps that are great as an MVP or prototype but need some work before they are production-ready. One or more of the practices listed here can help your application scale effectively. This article does not cover the entire software engineering side of application development, only applications that wrap LLMs. The code snippets are in Python, but the same logic can be applied to other languages as well.

1. Use Middleware for Flexibility

Use middleware such as LiteLLM or LangChain to avoid vendor lock-in and to switch easily between models as they evolve.

Python:

from litellm import completion

response = completion(
    model="gpt-3.5-turbo", 
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)

Middleware solutions such as LiteLLM or LangChain provide an abstraction layer between your application and the various LLM providers. This abstraction lets you switch between models or providers without changing your core application code. The AI landscape evolves quickly, and new models with improved capabilities are released constantly; with middleware in place, you can adopt a new model or switch providers based on performance, cost, or feature requirements, keeping your application current and competitive.
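
As a rough sketch of what that flexibility looks like in practice, the same LiteLLM call can target a different provider just by changing the model string (the Anthropic model name below is illustrative; check your provider's current model list):

from litellm import completion

def ask(model: str, prompt: str) -> str:
    # The call shape stays the same regardless of the underlying provider.
    response = completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Swap providers without touching the rest of the application code.
print(ask("gpt-3.5-turbo", "Hello, how are you?"))            # OpenAI
print(ask("claude-3-haiku-20240307", "Hello, how are you?"))  # Anthropic (illustrative model name)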

2. Implement Retry Mechanisms

Avoid rate-limit issues by implementing retry logic in your API calls.

Python:

import time
from openai import OpenAI

client = OpenAI()

def retry_api_call(max_retries=3, delay=1):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": "Hello!"}]
            )
            return response
        except Exception as e:
            if attempt == max_retries - 1:
                raise e
            time.sleep(delay * (2 ** attempt))  # Exponential backoff

LLM providers often impose rate limits to prevent abuse and ensure fair usage. Implementing a retry mechanism with exponential backoff helps your application handle temporary failures or rate-limit errors gracefully. By automatically retrying failed requests, this approach improves reliability and reduces the chance of service disruptions caused by transient issues. The exponential backoff strategy (increasing the delay between retries) avoids hammering the API with immediate re-requests, which could make rate limiting worse.
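
If you would rather not hand-roll the backoff logic, the same pattern can be expressed with a library such as tenacity; this is a sketch, and the exception types you retry on should match your provider's SDK:

from tenacity import retry, stop_after_attempt, wait_exponential
from openai import OpenAI

client = OpenAI()

@retry(wait=wait_exponential(multiplier=1, min=1, max=30), stop=stop_after_attempt(3))
def chat_with_retry(prompt):
    # tenacity re-runs this function with exponentially growing delays on failure.
    return client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )

response = chat_with_retry("Hello!")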

3. Set Up LLM Provider Fallbacks

Don't rely on a single LLM provider. Implement fallbacks to handle quota issues or service disruptions.

from litellm import completion

def get_llm_response(prompt):
    providers = ['openai/gpt-3.5-turbo', 'anthropic/claude-2', 'cohere/command-nightly']
    for provider in providers:
        try:
            response = completion(model=provider, messages=[{"role": "user", "content": prompt}])
            return response
        except Exception as e:
            print(f"Error with {provider}: {str(e)}")
            continue
    raise Exception("All LLM providers failed")

Relying on a single LLM provider can lead to service disruptions if that provider experiences downtime or you hit its quota limits. Implementing fallback options keeps your application running, and it also lets you leverage the strengths of different providers or models for different tasks. LiteLLM simplifies this by exposing a unified interface across providers, which makes switching between them or implementing fallback logic straightforward.
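
One way to act on the "different providers for different tasks" point is to keep a per-task provider ordering. The task names and orderings below are purely illustrative:

from litellm import completion

# Hypothetical per-task preferences: cheaper/faster models first for simple tasks,
# stronger models first for more demanding ones.
TASK_PROVIDERS = {
    "summarize": ['openai/gpt-3.5-turbo', 'anthropic/claude-2'],
    "code_review": ['anthropic/claude-2', 'openai/gpt-3.5-turbo'],
}

def get_llm_response_for_task(task, prompt):
    for provider in TASK_PROVIDERS.get(task, ['openai/gpt-3.5-turbo']):
        try:
            return completion(model=provider, messages=[{"role": "user", "content": prompt}])
        except Exception as e:
            print(f"Error with {provider}: {e}")
    raise Exception(f"All providers failed for task '{task}'")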

4. Implement Observability

Use tools such as Langfuse or Helicone for LLM tracing and debugging.

from langfuse.openai import OpenAI

client = OpenAI(
    api_key="your-openai-api-key",
    langfuse_public_key="your-langfuse-public-key",
    langfuse_secret_key="your-langfuse-secret-key"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello, AI!"}]
)

Advantages of implementing observability:

  • Enhanced debugging: easily trace and replay conversations to identify issues.
  • Performance optimization: gain insights into response times and model performance.
  • Cost management: track token usage and associated costs for better budget control.
  • Quality assurance: monitor response quality and identify areas for improvement.
  • User experience analysis: understand user interactions and optimize prompts accordingly.
  • Compliance and auditing: maintain logs for regulatory compliance and internal audits.
  • Anomaly detection: quickly identify and respond to unusual patterns or behaviors.

Observability tools provide crucial insight into your LLM application's performance, usage patterns, and potential issues. They let you monitor and analyze interactions with LLMs in real time, helping you optimize prompts, identify bottlenecks, and ensure the quality of AI-generated responses. This level of visibility is essential for maintaining, debugging, and improving your application over time.
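
Even before adopting a dedicated platform, you can capture a few of these signals yourself. The sketch below logs latency and token usage around a standard OpenAI call (field names follow the OpenAI Python SDK's usage object):

import time
import logging
from openai import OpenAI

logging.basicConfig(level=logging.INFO)
client = OpenAI()

def traced_completion(prompt):
    start = time.time()
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    latency = time.time() - start
    usage = response.usage
    # Log the basics: latency plus prompt, completion, and total token counts.
    logging.info(
        "latency=%.2fs prompt_tokens=%d completion_tokens=%d total_tokens=%d",
        latency, usage.prompt_tokens, usage.completion_tokens, usage.total_tokens,
    )
    return response

response = traced_completion("Hello, AI!")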

5. Manage Prompts Effectively

Use a prompt management tool with version control instead of hardcoding prompts in your code or in text files.

import openai
from promptflow import PromptFlow

pf = PromptFlow()

prompt_template = pf.get_prompt("greeting_prompt", version="1.2")
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt_template.format(name="Alice")}]
)

Effective prompt management is crucial for maintaining and improving your LLM application. With a dedicated prompt management tool you can version-control your prompts, A/B test different variations, and roll out updates across your application easily. This approach separates prompt logic from application code, making it easier to iterate on prompts without touching the core application. It also lets non-technical team members contribute to prompt improvements and enables better collaboration in refining AI interactions.
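
Tooling aside, the key idea is that prompts live outside your code and carry a version. A minimal home-grown registry (the file layout is hypothetical; any prompt management tool offers a richer version of this) could look like:

import json

class PromptRegistry:
    """Loads versioned prompt templates from a JSON file, e.g.
    {"greeting_prompt": {"1.2": "Hello {name}, how can I help you today?"}}"""

    def __init__(self, path="prompts.json"):
        with open(path) as f:
            self._prompts = json.load(f)

    def get(self, name, version):
        return self._prompts[name][version]

registry = PromptRegistry()
template = registry.get("greeting_prompt", "1.2")
prompt = template.format(name="Alice")  # ready to send to the model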

6. Store Conversation History Persistently

Use a persistent store like Redis for conversation history instead of an in-memory cache, which does not work well in distributed systems.

from langchain.memory import ConversationBufferMemory, RedisChatMessageHistory
from langchain.chains import ConversationChain
from langchain.llms import OpenAI

# Initialize Redis-backed chat message history
message_history = RedisChatMessageHistory(url="redis://localhost:6379/0", ttl=600, session_id="user-123")

# Wrap the history in a memory object; ConversationChain expects a BaseMemory,
# not a raw chat message history
memory = ConversationBufferMemory(chat_memory=message_history)

# Create a conversation chain with Redis-backed memory
conversation = ConversationChain(
    llm=OpenAI(),
    memory=memory,
    verbose=True
)

# Use the conversation
response = conversation.predict(input="Hi there!")
print(response)

# The conversation history is automatically stored in Redis

Storing conversation history is essential for maintaining context in ongoing interactions and providing personalized experiences. Using a persistent cache like Redis, especially in distributed systems, ensures that conversation history is reliably stored and quickly accessible. This approach allows your application to scale horizontally while maintaining consistent user experiences across different instances or servers. The use of Redis with LangChain simplifies the integration of persistent memory into your conversational AI system, making it easier to build stateful, context-aware applications.
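
If you prefer not to go through LangChain, the same idea works with the redis client directly: store each message as JSON in a per-session list and set a TTL. A sketch (the key naming is arbitrary):

import json
import redis

r = redis.Redis.from_url("redis://localhost:6379/0")

def append_message(session_id, role, content, ttl=600):
    key = f"chat:{session_id}"
    r.rpush(key, json.dumps({"role": role, "content": content}))
    r.expire(key, ttl)  # refresh the TTL on every write

def load_history(session_id):
    key = f"chat:{session_id}"
    return [json.loads(m) for m in r.lrange(key, 0, -1)]

append_message("user-123", "user", "Hi there!")
append_message("user-123", "assistant", "Hello! How can I help?")
print(load_history("user-123"))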

7. Use JSON Mode Whenever Possible

Whenever possible, for example when extracting structured information, request structured JSON output instead of relying on raw text.

import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-1106",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Extract the name and age from the user's input."},
        {"role": "user", "content": "My name is John and I'm 30 years old."}
    ]
)

print(response.choices[0].message.content)
# Output: {"name": "John", "age": 30}

Using JSON mode for information extraction provides a structured and consistent output format, making it easier to parse and process the LLM's responses in your application. This approach reduces the need for complex post-processing of free-form text and minimizes the risk of misinterpretation. It's particularly useful for tasks like form filling, data extraction from unstructured text, or any scenario where you need to integrate AI-generated content into existing data structures or databases.
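
JSON mode guarantees syntactically valid JSON, but it is still worth validating the shape before using it. A sketch using pydantic (the Person model here is just for this example):

import json
from pydantic import BaseModel, ValidationError

class Person(BaseModel):
    name: str
    age: int

raw = response.choices[0].message.content  # e.g. '{"name": "John", "age": 30}'

try:
    person = Person(**json.loads(raw))
    print(person.name, person.age)
except (json.JSONDecodeError, ValidationError) as e:
    print(f"Model returned unexpected structure: {e}")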

8. Set Up Credit Alerts

Implement alerts for prepaid credits and per-user credit checks, even in MVP stages.

def check_user_credits(user_id, requested_tokens):
    user_credits = get_user_credits(user_id)
    if user_credits < requested_tokens:
        raise InsufficientCreditsError(f"User {user_id} has insufficient credits")

    remaining_credits = user_credits - requested_tokens
    if remaining_credits < CREDIT_ALERT_THRESHOLD:
        send_low_credit_alert(user_id, remaining_credits)

    return True

Implementing credit alerts and per-user credit checks is crucial for managing costs and ensuring fair usage in your LLM application. This system helps prevent unexpected expenses and allows you to proactively manage user access based on their credit limits. By setting up alerts at multiple thresholds, you can inform users or administrators before credits are depleted, ensuring uninterrupted service. This approach is valuable even in MVP stages, as it helps you understand usage patterns and plan for scaling your application effectively.
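
The check above assumes you know roughly how many tokens a request will consume. For the prompt side you can estimate this up front with tiktoken before calling the API (completion tokens can only be bounded, e.g. via max_tokens); a sketch:

import tiktoken

def estimate_prompt_tokens(messages, model="gpt-3.5-turbo"):
    # Rough estimate: encode each message's content. Actual accounting adds a few
    # tokens of per-message overhead, so treat this as a lower bound.
    encoding = tiktoken.encoding_for_model(model)
    return sum(len(encoding.encode(m["content"])) for m in messages)

messages = [{"role": "user", "content": "Hello, how are you?"}]
requested_tokens = estimate_prompt_tokens(messages) + 256  # plus a budget for the completion
check_user_credits("user-123", requested_tokens)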

9. Implement Feedback Loops

Create mechanisms for users to provide feedback on AI responses, starting with simple thumbs up/down ratings.

from flask import Flask, request, jsonify

app = Flask(__name__)

# log_positive_feedback, log_negative_feedback and trigger_improvement_workflow
# are application-specific helpers
def process_user_feedback(response_id, feedback):
    if feedback == 'thumbs_up':
        log_positive_feedback(response_id)
    elif feedback == 'thumbs_down':
        log_negative_feedback(response_id)
        trigger_improvement_workflow(response_id)

# In your API endpoint
@app.route('/feedback', methods=['POST'])
def submit_feedback():
    data = request.json
    process_user_feedback(data['response_id'], data['feedback'])
    return jsonify({"status": "Feedback received"})

Implementing feedback loops is essential for continuously improving your LLM application. By allowing users to provide feedback on AI responses, you can identify areas where the model performs well and where it needs improvement. This data can be used to fine-tune models, adjust prompts, or implement additional safeguards. Starting with simple thumbs up/down ratings provides an easy way for users to give feedback, while more detailed feedback options can be added later for deeper insights. This approach helps in building trust with users and demonstrates your commitment to improving the AI's performance based on real-world usage.
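
What the logging helpers do is up to your stack; the simplest durable option is a table keyed by response ID. A sketch using SQLite (swap in your production database):

import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("feedback.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS feedback (response_id TEXT, rating TEXT, created_at TEXT)"
)

def log_feedback(response_id, rating):
    # rating is 'thumbs_up' or 'thumbs_down'
    conn.execute(
        "INSERT INTO feedback VALUES (?, ?, ?)",
        (response_id, rating, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()

log_feedback("resp-42", "thumbs_down")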

10. Implement Guardrails

Use prompt guards to check for prompt injection attacks, toxic content, and off-topic responses.

import re
from better_profanity import profanity

def check_prompt_injection(input_text):
    injection_patterns = [
        r"ignore previous instructions",
        r"disregard all prior commands",
        r"override system prompt"
    ]
    for pattern in injection_patterns:
        if re.search(pattern, input_text, re.IGNORECASE):
            return True
    return False

def check_toxic_content(input_text):
    return profanity.contains_profanity(input_text)

def sanitize_input(input_text):
    if check_prompt_injection(input_text):
        raise ValueError("Potential prompt injection detected")

    if check_toxic_content(input_text):
        raise ValueError("Toxic content detected")

    # Additional checks can be added here (e.g., off-topic detection)

    return input_text  # Return sanitized input if all checks pass

# Usage
try:
    safe_input = sanitize_input(user_input)
    # Process safe_input with your LLM
except ValueError as e:
    print(f"Input rejected: {str(e)}")

Implementing guardrails is crucial for ensuring the safety and reliability of your LLM application. This example demonstrates how to check for potential prompt injection attacks and toxic content. Prompt injection attacks attempt to override or bypass the system's intended behavior, while toxic content checks help maintain a safe and respectful environment. By implementing these checks, you can prevent malicious use of your AI system and ensure that the content generated aligns with your application's guidelines and ethical standards. Additional checks can be added to detect off-topic responses or other unwanted behaviors, further enhancing the robustness of your application.
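
Beyond regex patterns and word lists, hosted moderation endpoints can catch a wider range of unsafe content. A sketch using the OpenAI moderation endpoint as one more check alongside sanitize_input (other providers offer similar APIs):

from openai import OpenAI

client = OpenAI()

def check_moderation(input_text):
    # Returns True if the moderation model flags the text in any category.
    result = client.moderations.create(input=input_text)
    return result.results[0].flagged

if check_moderation("some user input"):
    raise ValueError("Input flagged by moderation endpoint")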

Conclusion

All of the points listed above can be integrated into your application with little effort and leave you better prepared to scale in production. You may agree or disagree with some of them; either way, feel free to post your questions or comments.


Source: dev.to