Fine-Tuning Large Language Models (LLMs) with .NET Core, Python, and Azure

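Large language models (LLMs) have attracted wide attention for their ability to understand and generate human-like text. However, many organizations have unique, domain-specific datasets and vocabularies that a general-purpose model may not fully capture. Fine-tuning lets developers adapt these large models to a specific context or industry, improving accuracy and relevance.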

This article explores how to fine-tune an LLM with Python, then integrate and deploy the resulting model in a .NET Core C# application, all on Microsoft Azure for scalability and convenience.

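Why fine-tune a large language model?
Domain specificity: An LLM can be fine-tuned to use industry-specific terminology, product names, or jargon.
Performance gains: Fine-tuning often reduces errors and improves relevance in use cases such as customer service, research, and analytics.
Lower cost: Instead of building a model from scratch, you customize an existing, capable LLM.
Greater efficiency: You start from pretrained weights and adjust only the final layers or a subset of parameters, which speeds up the process.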
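
To make the "adjust only the final layers" idea concrete, here is a minimal sketch, assuming a PyTorch-style model whose `named_parameters()` yields `(name, parameter)` pairs. The helper name `freeze_all_but` is hypothetical and not part of any library.

```python
def freeze_all_but(model, trainable_prefixes):
    """Freeze every parameter except those whose name starts with a given prefix.

    Hypothetical helper: works with any object exposing PyTorch's
    named_parameters() interface. Returns the names left trainable.
    """
    prefixes = tuple(trainable_prefixes)
    trainable = []
    for name, param in model.named_parameters():
        # Only parameters with requires_grad=True receive gradient updates.
        param.requires_grad = name.startswith(prefixes)
        if param.requires_grad:
            trainable.append(name)
    return trainable
```

For the 12-block `gpt2` checkpoint, and assuming the parameter naming used by the Hugging Face GPT-2 implementation, a call such as `freeze_all_but(model, ["transformer.h.11.", "transformer.ln_f."])` before training would leave only the last transformer block and the final layer norm trainable.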

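Solution overview

Components and technologies

Python for fine-tuning
- Common libraries (for example, Hugging Face Transformers and PyTorch)
- Simplifies loading and adapting pretrained models
.NET Core C# for integration
- A backend service or API that exposes the fine-tuned model
- A strongly typed language familiar to many enterprise developers
Azure services
- Azure Machine Learning for training and model management
- Azure Storage for data and model artifacts
- Azure App Service or Azure Functions for hosting the .NET Core application
- Azure Key Vault (optional) for protecting credentials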

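Environment setup

Prerequisites

Azure subscription: Required to create resources such as the Machine Learning workspace and App Service.
Python 3.8 or later: Installed locally, for model fine-tuning.
.NET 6/7/8 SDK: For building and running the .NET Core C# application.
Visual Studio 2022 or Visual Studio Code: Recommended IDEs.
Azure CLI: For configuring and managing Azure services from the terminal.
Docker (optional): Can be used to containerize your application if needed.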

Training and fine-tuning with Python

This example uses Hugging Face Transformers, one of the most widely adopted libraries for fine-tuning LLMs.

5.1 Set up a virtual environment

<code>python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate</code>

5.2 Install dependencies

<code>pip install torch transformers accelerate azureml-sdk</code>

5.3 Create an Azure Machine Learning workspace

Resource group and workspace:

<code>   az group create --name LLMFinetuneRG --location eastus
   az ml workspace create --name LLMFinetuneWS --resource-group LLMFinetuneRG</code>

Configure your local environment to connect to the workspace (using a config.json file or environment variables).
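
As a sketch of the config.json route: `Workspace.from_config()` in the `azureml-sdk` package reads a JSON file containing `subscription_id`, `resource_group`, and `workspace_name`. The helper name `write_workspace_config` below is hypothetical, and the subscription ID is something you must supply yourself.

```python
import json
from pathlib import Path


def write_workspace_config(subscription_id, resource_group, workspace_name,
                           directory="."):
    """Write a config.json with the three keys Workspace.from_config() expects."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(directory) / "config.json"
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path
```

With the resource names used earlier, this could be called as `write_workspace_config("<your-subscription-id>", "LLMFinetuneRG", "LLMFinetuneWS")` from the folder that holds train.py.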

5.4 Fine-tuning script (train.py)

<code>import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from azureml.core import Workspace, Run

# Connect to Azure ML
ws = Workspace.from_config()
run = Run.get_context()

model_name = "gpt2"  # example model
tokenizer = AutoTokenizer.from_pretrained(model_name)
# GPT-2 has no padding token by default; reuse the end-of-sequence token so padding=True works
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Load your custom dataset (local or from Azure Storage)
# Example: a text file or a dataset in Azure ML
train_texts = ["Your domain-specific text goes here..."]  # simplified
train_encodings = tokenizer(train_texts, truncation=True, padding=True)

class CustomDataset(torch.utils.data.Dataset):
    def __init__(self, encodings):
        self.encodings = encodings
    def __len__(self):
        return len(self.encodings["input_ids"])
    def __getitem__(self, idx):
        return {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}

train_dataset = CustomDataset(train_encodings)

# Copies input_ids into labels (ignoring padding) so the Trainer can compute a causal-LM loss
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    save_steps=100,
    logging_steps=100
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=data_collator,
)

trainer.train()

# Save the fine-tuned model and tokenizer
trainer.save_model("./fine_tuned_model")
tokenizer.save_pretrained("./fine_tuned_model")</code>
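
The script hard-codes a single placeholder string in `train_texts`. A minimal sketch of loading real training text from local files follows, assuming one training example per non-empty line in `.txt` files under a folder. The function name `load_texts` and the file layout are hypothetical.

```python
from pathlib import Path


def load_texts(folder, pattern="*.txt"):
    """Return the non-empty, stripped lines of every matching file, in filename order."""
    texts = []
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines
                    texts.append(line)
    return texts
```

In train.py, `train_texts = load_texts("data")` could then stand in for the hard-coded list, where `data` is an assumed folder name.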

5.5 Register the model in Azure

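<code>from azureml.core import Workspace
from azureml.core.model import Model

# Reconnect to the workspace if this runs outside train.py
ws = Workspace.from_config()

model = Model.register(
    workspace=ws,
    model_path="./fine_tuned_model",  # local folder saved by train.py
    model_name="myFineTunedLLM"
)</code>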

At this point, your fine-tuned model is stored in Azure Machine Learning, where it is easy to access and version.

Integrating the fine-tuned model in .NET Core

6.1 Create a .NET Core Web API project

<code>dotnet new webapi -n FineTunedLLMApi
cd FineTunedLLMApi</code>

6.2 Add dependencies

HttpClient for calling the Azure endpoint or a local inference API
Newtonsoft.Json (if you prefer Json.NET for serialization)
Azure.Storage.Blobs or Azure.Identity for secure access to Azure resources

<code>dotnet add package Microsoft.Extensions.Http
dotnet add package Azure.Storage.Blobs
dotnet add package Newtonsoft.Json</code>

6.3 ModelConsumerService.cs

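Assume you have deployed the fine-tuned model as a web service (for example, using Azure Container Instances or a custom endpoint in Azure ML). The following code calls that service to obtain a completion.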

<code>using Newtonsoft.Json;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public class ModelConsumerService
{
    private readonly HttpClient _httpClient;

    public ModelConsumerService(IHttpClientFactory httpClientFactory)
    {
        _httpClient = httpClientFactory.CreateClient("FineTunedModel");
    }

    public async Task<string> GetCompletionAsync(string prompt)
    {
        var requestBody = new { prompt = prompt };
        var content = new StringContent(
            JsonConvert.SerializeObject(requestBody),
            Encoding.UTF8, 
            "application/json");

        var response = await _httpClient.PostAsync("/predict", content);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}</code>
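
The service above assumes that something is listening on `/predict` and accepts a JSON body with a `prompt` field. The article does not show that side, so the following is only a sketch of what such an endpoint might look like, using Python's standard library. The `{"completion": ...}` response shape, the port, the `max_new_tokens` value, and the names `handle_predict`, `PredictHandler`, `load_generator`, and `serve` are all assumptions for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_predict(raw_body, generate):
    """Turn a raw JSON request body into (status_code, response_bytes)."""
    try:
        prompt = json.loads(raw_body)["prompt"]
    except (ValueError, KeyError, TypeError):
        error = {"error": "expected a JSON object with a 'prompt' field"}
        return 400, json.dumps(error).encode("utf-8")
    return 200, json.dumps({"completion": generate(prompt)}).encode("utf-8")


class PredictHandler(BaseHTTPRequestHandler):
    # Echo stub; serve() swaps in a real text generator.
    generate = staticmethod(lambda prompt: prompt)

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, body = handle_predict(self.rfile.read(length), self.generate)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def load_generator(model_dir="./fine_tuned_model"):
    """Load the saved model; needs transformers and torch installed."""
    from transformers import pipeline

    pipe = pipeline("text-generation", model=model_dir)
    return lambda prompt: pipe(prompt, max_new_tokens=50)[0]["generated_text"]


def serve(generate, port=8000):
    """Serve POST /predict until interrupted."""
    PredictHandler.generate = staticmethod(generate)
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

Calling `serve(load_generator())` would start the endpoint locally. A production deployment would more likely sit behind a managed Azure ML endpoint or a proper web framework; the point here is only the request and response contract that `GetCompletionAsync` relies on.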

6.4 LLMController.cs

<code>using Microsoft.AspNetCore.Mvc;
using System.Threading.Tasks;

[ApiController]
[Route("[controller]")]
public class LLMController : ControllerBase
{
    private readonly ModelConsumerService _modelService;

    public LLMController(ModelConsumerService modelService)
    {
        _modelService = modelService;
    }

    [HttpPost("complete")]
    public async Task<IActionResult> CompletePrompt([FromBody] PromptRequest request)
    {
        var result = await _modelService.GetCompletionAsync(request.Prompt);
        return Ok(new { Completion = result });
    }
}

public class PromptRequest
{
    public string Prompt { get; set; }
}</code>
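
To exercise the controller once the API is running, a small client is handy. This is a sketch using only the standard library. It assumes the route resolves to `/LLM/complete` (from `[Route("[controller]")]` on `LLMController`) and that the base URL is wherever you host the API. The function names are hypothetical.

```python
import json
import urllib.request


def build_completion_request(base_url, prompt):
    """Create a POST request for the controller's /LLM/complete route."""
    url = base_url.rstrip("/") + "/LLM/complete"
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, prompt, timeout=30):
    """Send the prompt to the API and return the decoded JSON response."""
    request = build_completion_request(base_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `complete("http://localhost:5000", "Hello")` would post `{"prompt": "Hello"}`; the local address is an assumption, so use the URL that `dotnet run` prints. With ASP.NET Core's default JSON settings the reply is expected to carry the text under a `completion` key.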

6.5 Configure the .NET Core application

In Program.cs or Startup.cs:

<code>var builder = WebApplication.CreateBuilder(args);

// Register the named HttpClient
builder.Services.AddHttpClient("FineTunedModel", client =>
{
    client.BaseAddress = new Uri("https://your-model-endpoint/");
});

// Register ModelConsumerService
builder.Services.AddTransient<ModelConsumerService>();

builder.Services.AddControllers();
var app = builder.Build();

app.MapControllers();
app.Run();</code>

Deploying to Azure
Azure App Service:
- The simplest path for many .NET Core applications.
- Create a new Web App from the Azure portal or via the CLI.

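<code># FineTunedLLMPlan and FineTunedLLMApi are placeholder names; choose your own
az appservice plan create --name FineTunedLLMPlan --resource-group LLMFinetuneRG --sku B1
az webapp create --name FineTunedLLMApi --resource-group LLMFinetuneRG --plan FineTunedLLMPlan</code>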

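Azure Functions (optional):
- Well suited to serverless, event-driven logic if your usage is intermittent or scheduled.
Azure Kubernetes Service (AKS) (advanced):
- Well suited to large-scale deployments.
- Containerize your application with Docker and push it to Azure Container Registry (ACR).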

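Best practices
Data privacy: Handle sensitive or proprietary data responsibly, especially during model training.
Monitoring and logging: Integrate Azure Application Insights to monitor performance, track usage, and detect anomalies.
Security: Use Azure Key Vault to store secrets (API keys, connection strings).
Model versioning: Track the different fine-tuned versions of your model in Azure ML; roll back to an earlier version if needed.
Prompt engineering: Refine your prompts to get the best results from the fine-tuned model.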

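Conclusion

Fine-tuning LLMs with Python and Azure Machine Learning, then integrating them into a .NET Core application, lets you build powerful domain-specific AI solutions. For organizations that want to combine Python's AI ecosystem with .NET's enterprise capabilities, this pairing is a strong choice, backed throughout by Azure's scalability.

With careful planning for security, data governance, and DevOps, you can ship a production-ready solution that meets real-world needs and delivers accurate, domain-specific language capabilities in a robust, maintainable framework.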
