Fine-Tuning DeepSeek R1 (Reasoning Model)

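DeepSeek's groundbreaking AI models challenge OpenAI's dominance. These advanced reasoning models are freely available, democratizing access to powerful AI. Video tutorial: learn how to fine-tune DeepSeek.

This tutorial fine-tunes the DeepSeek-R1-Distill-Llama-8B model on the Medical Chain-of-Thought dataset from Hugging Face. This distilled model, derived from Llama 3.1 8B, shows reasoning capabilities similar to those of the original DeepSeek-R1. New to LLMs and fine-tuning? Consider the Introduction to LLMs in Python course.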

[Image: Fine-Tuning DeepSeek R1 (Reasoning Model). Image by author.]

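Introducing the DeepSeek R1 models

DeepSeek AI has open-sourced DeepSeek-R1 and DeepSeek-R1-Zero, which rival OpenAI's o1 on reasoning tasks such as math, coding, and logic. For details, see the comprehensive DeepSeek R1 guide.

DeepSeek-R1-Zero

This pioneering model is trained with large-scale reinforcement learning (RL), bypassing the initial supervised fine-tuning (SFT) stage. It develops chain-of-thought (CoT) reasoning on its own, but it also shows problems such as repetitive reasoning and poor readability.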

DeepSeek-R1

To address the limitations of DeepSeek-R1-Zero, DeepSeek-R1 incorporates cold-start data before RL. This multi-stage training achieves state-of-the-art performance, matching OpenAI o1 while improving the clarity of the output.

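DeepSeek distillation

DeepSeek also offers distilled models that balance power and efficiency. These smaller models (1.5B to 70B parameters) retain strong reasoning ability; DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI o1-mini on benchmarks, which underscores the effectiveness of the distillation process.

Source: deepseek-ai/DeepSeek-R1

For more on DeepSeek-R1's features, development, distilled models, access, pricing, and comparison with OpenAI o1, see the blog post "DeepSeek-R1: Features, o1 Comparison, Distilled Models & More".

Fine-tuning DeepSeek R1: a practical guide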

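Follow these steps to fine-tune the DeepSeek R1 model:

1. Setup

We use Kaggle's free GPU access. Create a Kaggle notebook and add your Hugging Face and Weights & Biases tokens as secrets. Then install the unsloth Python package for faster, more memory-efficient fine-tuning. For details, see "Unsloth Guide: Optimize and Speed Up LLM Fine-Tuning".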

<code>%%capture
!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git</code>
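After installation, it can help to confirm that the packages imported in later steps are actually visible to the notebook. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of the original tutorial.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level packages imported later in this tutorial.
required = ["unsloth", "trl", "transformers", "datasets", "wandb"]
print("Missing packages:", missing_packages(required))
```

An empty list suggests the environment is ready; any name that is printed would need to be installed before continuing.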

Authenticate with the Hugging Face CLI and Weights & Biases (wandb):

<code>from huggingface_hub import login
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()

hf_token = user_secrets.get_secret("HUGGINGFACE_TOKEN")
login(hf_token)

import wandb

wb_token = user_secrets.get_secret("wandb")

wandb.login(key=wb_token)
run = wandb.init(
    project='Fine-tune-DeepSeek-R1-Distill-Llama-8B on Medical COT Dataset', 
    job_type="training", 
    anonymous="allow"
)</code>
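The `kaggle_secrets` module is only available inside Kaggle notebooks. When running elsewhere, one common alternative is to read tokens from environment variables. The sketch below is an illustration under that assumption; `read_token` and the variable name `DEMO_TOKEN` are hypothetical.

```python
import os


def read_token(var_name):
    """Read a secret from an environment variable, failing loudly if it is unset."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return token


# Demo with a dummy value so the sketch runs anywhere. Real tokens should be
# set in the shell or a secret manager, never hard-coded in the notebook.
os.environ.setdefault("DEMO_TOKEN", "dummy-value")
demo_token = read_token("DEMO_TOKEN")
print("Token loaded, length:", len(demo_token))
```

The value returned by such a helper could then be passed to `login(...)` and `wandb.login(key=...)` in place of the Kaggle secrets.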

2. Loading the model and tokenizer

Load the Unsloth version of DeepSeek-R1-Distill-Llama-8B with 4-bit quantization to reduce memory use and improve performance:

<code>from unsloth import FastLanguageModel

max_seq_length = 2048 
dtype = None 
load_in_4bit = True


model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token = hf_token, 
)</code>
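The case for 4-bit loading is mostly arithmetic. The sketch below gives a rough estimate for the weights alone; it ignores quantization metadata, activations, the KV cache, and optimizer state, and it assumes a round 8 billion parameters.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


NUM_PARAMS = 8e9  # assumption: "8B" taken as a round 8 billion parameters

for bits in (16, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
```

Under these assumptions the weights shrink from roughly 15 GiB at 16 bits to under 4 GiB at 4 bits, which leaves far more headroom on a single free-tier GPU.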

3. Inference before fine-tuning

Define a prompt style with placeholders for the question and the response. It guides the model to reason step by step:

<code>prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 

### Question:
{}

### Response:
<think>{}"""</code>
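The two `{}` placeholders are filled positionally by `str.format`: the first takes the question, and the second is left empty at inference time so that the model continues generating right after `<think>`. A shortened, illustrative template (not the full prompt above) shows the mechanics:

```python
# Shortened stand-in for the tutorial's prompt; the real template is longer.
demo_style = """### Question:
{}

### Response:
<think>{}"""

demo_prompt = demo_style.format("What causes stress incontinence?", "")
print(demo_prompt)

# The prompt ends right after the opening tag, so generation starts inside
# the model's reasoning section.
assert demo_prompt.endswith("<think>")
```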

Test the model with a sample medical question:

<code>question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"


FastLanguageModel.for_inference(model) 
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])</code>
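The decoded output contains the prompt, then the model's reasoning inside `<think>...</think>`, then the final answer. A small helper can separate the two for easier reading. This is a sketch: `split_reasoning` is a hypothetical name, and it assumes the model closes its reasoning with `</think>`, which is not guaranteed (for example when generation is cut off by `max_new_tokens`).

```python
def split_reasoning(decoded):
    """Split decoded text into (reasoning, answer) around the </think> tag."""
    # Keep only what follows the response marker, if it is present.
    response = decoded.split("### Response:")[-1]
    reasoning, sep, answer = response.partition("</think>")
    if not sep:
        # No closing tag found: return the whole text as the answer.
        return "", response.strip()
    reasoning = reasoning.replace("<think>", "", 1).strip()
    return reasoning, answer.strip()


sample = "### Response:\n<think>Step 1. Step 2.</think>\nFinal answer."
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Applied to `response[0]` from the code above, this makes it easier to judge the reasoning and the final answer separately.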

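Observe the model's reasoning before fine-tuning and identify areas that fine-tuning could improve.

4. Loading and processing the dataset

Modify the prompt style to add a third placeholder for the complex chain of thought: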

<code>train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 

### Question:
{}

### Response:
<think>
{}
</think>
{}"""</code>

Create a function to format the dataset:

<code>EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN


def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for input, cot, output in zip(inputs, cots, outputs):
        text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }</code>
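With `batched=True`, `Dataset.map` passes the function a dictionary that maps column names to lists, and expects lists back. The sketch below mimics that contract on a toy batch without loading a tokenizer or dataset; the template is shortened, and `"<eos>"` is a placeholder rather than the model's real EOS token.

```python
# Stand-ins so the example runs without a tokenizer or dataset download.
DEMO_EOS = "<eos>"  # placeholder; the real value comes from tokenizer.eos_token
demo_train_style = "Q: {}\n<think>\n{}\n</think>\n{}"


def demo_formatting_func(examples):
    """Format a batch (dict of column -> list) into a single 'text' column."""
    texts = []
    for question, cot, response in zip(
        examples["Question"], examples["Complex_CoT"], examples["Response"]
    ):
        texts.append(demo_train_style.format(question, cot, response) + DEMO_EOS)
    return {"text": texts}


toy_batch = {
    "Question": ["Q1", "Q2"],
    "Complex_CoT": ["reasoning 1", "reasoning 2"],
    "Response": ["A1", "A2"],
}
formatted = demo_formatting_func(toy_batch)
print(formatted["text"][0])
```

Appending the EOS token marks where each training example ends, which helps the fine-tuned model learn to stop after its answer.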

Load and process the dataset:

<code>from datasets import load_dataset
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT","en", split = "train[0:500]",trust_remote_code=True)
dataset = dataset.map(formatting_prompts_func, batched = True,)
dataset["text"][0]</code>

5. Setting up the model

Configure the model with LoRA:

<code>model = FastLanguageModel.get_peft_model(
    model,
    r=16,  
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,  
    bias="none",  
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,  
    loftq_config=None,
)</code>
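To get a feel for how small the LoRA update is, the number of trainable parameters can be estimated by hand. For a linear layer with input size d_in and output size d_out, rank-r LoRA adds two matrices with r × (d_in + d_out) parameters in total. The sketch below assumes the published Llama 3.1 8B dimensions (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128); these numbers are assumptions taken from the base model's configuration, not from this tutorial.

```python
R = 16                # LoRA rank, matching r=16 above
HIDDEN = 4096         # assumed hidden size
INTERMEDIATE = 14336  # assumed MLP intermediate size
KV_DIM = 8 * 128      # assumed: 8 key/value heads x head dimension 128
NUM_LAYERS = 32       # assumed number of transformer layers

# (d_in, d_out) for each targeted projection in one layer.
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in, d_out, rank):
    """Parameters added by one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, R) for d_in, d_out in shapes.values())
total = per_layer * NUM_LAYERS
print(f"LoRA parameters per layer: {per_layer:,}")
print(f"LoRA parameters in total:  {total:,}")
```

Under these assumptions the adapters hold about 42 million parameters, roughly half a percent of the 8 billion base weights. With `lora_alpha=16` and `r=16`, the standard LoRA scaling factor `lora_alpha / r` is 1.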

6. Model training

Set up the trainer:

<code>from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        # Use num_train_epochs = 1, warmup_ratio for full training runs!
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",  # assumed value; the source line was truncated after "output_"
    ),
)</code>
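Two quantities implied by these arguments are worth working out: how much data the run sees, and how the learning rate evolves. The sketch below assumes a single GPU and re-implements the usual linear warmup-then-decay rule by hand for illustration; `linear_lr` is a hypothetical helper, not a library call.

```python
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 4
MAX_STEPS = 60
WARMUP_STEPS = 5
PEAK_LR = 2e-4
DATASET_SIZE = 500  # from split="train[0:500]"

# Examples consumed per optimizer step (assuming one GPU).
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
examples_seen = effective_batch * MAX_STEPS
print(f"Effective batch size: {effective_batch}")
print(f"Examples seen: {examples_seen} of {DATASET_SIZE}")


def linear_lr(step):
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, MAX_STEPS - step)
    return PEAK_LR * remaining / (MAX_STEPS - WARMUP_STEPS)


for step in (0, 5, 30, 60):
    print(f"step {step:>2}: lr = {linear_lr(step):.2e}")
```

With these settings, 60 steps cover 480 examples, slightly less than one pass over the 500-example subset. This is why the code comment suggests `num_train_epochs=1` for a full training run.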

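Train the model:

<code>trainer_stats = trainer.train()</code>

(Note: the original article showed training-progress images here; they are omitted.)

7. Inference after fine-tuning

Compare results by querying the fine-tuned model with the same question as before, and look for improvements in the reasoning and in the conciseness of the response.

(Note: the original article included the improved model output here; it is omitted for brevity.)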

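8. Saving and pushing the model

Save the model locally and push it to the Hugging Face Hub:

<code>new_model_local = "DeepSeek-R1-Medical-COT"
# Save the LoRA adapter and the tokenizer locally
model.save_pretrained(new_model_local) 
tokenizer.save_pretrained(new_model_local)

# Save a copy with the adapter merged into the base weights (16-bit)
model.save_pretrained_merged(new_model_local, tokenizer, save_method = "merged_16bit",)

new_model_online = "kingabzpro/DeepSeek-R1-Medical-COT"
# Push the adapter and the tokenizer to the Hugging Face Hub
model.push_to_hub(new_model_online)
tokenizer.push_to_hub(new_model_online)

# Push the merged 16-bit model
model.push_to_hub_merged(new_model_online, tokenizer, save_method = "merged_16bit")</code>

(Note: the original article included images showing the model being saved and pushed successfully; they are omitted.)

9. Deployment and conclusion

The tutorial closes by suggesting deployment options: serving the model with BentoML, or converting it to GGUF format for local use. It highlights the growing importance of open-source LLMs and notes OpenAI's countermoves with o3 and Operator AI.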
