Meta's LLaMA sparked a surge in Large Language Model (LLM) development, aiming to rival models like GPT-3.5. The open-source community rapidly produced increasingly powerful models, but these advancements weren't without challenges. Many open-source LLMs had restrictive licenses (research use only), required substantial budgets for fine-tuning, and were expensive to deploy.
Llama 2, the model's next iteration, addresses these issues: it ships with a license permitting commercial use, and techniques such as QLoRA enable fine-tuning on consumer-grade GPUs with limited memory. This democratizes AI, allowing even small organizations to build tailored models.
This guide demonstrates fine-tuning Llama-2 on Google Colab, utilizing efficient techniques to overcome resource constraints. We'll explore methodologies that minimize memory usage and accelerate training.
Fine-Tuning Llama-2: A Step-by-Step Guide
This tutorial fine-tunes the 7-billion-parameter Llama-2 chat model on a T4 GPU (available on Google Colab or Kaggle). A T4 provides 16GB of VRAM, while the 7B weights alone occupy roughly 14GB in 16-bit precision, leaving no headroom for gradients and optimizer states; this necessitates parameter-efficient fine-tuning, specifically QLoRA, which loads the base model in 4-bit precision and trains small LoRA adapters on top. We'll use the Hugging Face ecosystem: transformers, accelerate, peft, trl, and bitsandbytes.
1. Setup:
Install necessary libraries:
<code>
%%capture
%pip install accelerate peft bitsandbytes transformers trl
</code>
Import modules:
<code>
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig
from trl import SFTTrainer
</code>
2. Model & Dataset Selection:
We'll use NousResearch/Llama-2-7b-chat-hf (a readily accessible copy of the official Llama-2 chat weights that doesn't require Meta's access approval) as the base model, and mlabonne/guanaco-llama2-1k (a 1,000-sample subset of the Guanaco instruction dataset, reformatted for Llama-2's prompt template) as our small training dataset.
<code>base_model = "NousResearch/Llama-2-7b-chat-hf" guanaco_dataset = "mlabonne/guanaco-llama2-1k" new_model = "llama-2-7b-chat-guanaco"</code>
3. Loading Data & Model:
Load the dataset:
<code>dataset = load_dataset(guanaco_dataset, split="train")</code>
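To confirm the data is in the single-column format the trainer expects, you can peek at a sample (the slicing here is just for illustration; each row stores one full conversation as a single text string in Llama-2's prompt format):
<code>
# Inspect the dataset: it exposes a single "text" column containing the
# prompt/response pair wrapped in Llama-2 [INST] ... [/INST] tags.
print(dataset)
print(dataset[0]["text"][:200])  # first 200 characters of the first sample
</code>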
Configure 4-bit quantization using QLoRA:
<code>compute_dtype = getattr(torch, "float16") quant_config = BitsAndBytesConfig( load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=compute_dtype, bnb_4bit_use_double_quant=False, )</code>
Load the Llama-2 model with 4-bit quantization:
<code>
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    device_map={"": 0},          # place the entire model on GPU 0
)
model.config.use_cache = False   # the KV cache isn't needed during training
model.config.pretraining_tp = 1  # disable simulated tensor-parallel computation
</code>
Load the tokenizer:
<code>
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token  # Llama has no pad token; reuse EOS
tokenizer.padding_side = "right"           # pad on the right to avoid fp16 issues
</code>
4. PEFT Configuration:
Define PEFT parameters for efficient fine-tuning:
<code>
peft_params = LoraConfig(
    lora_alpha=16,     # scaling factor applied to the adapter output
    lora_dropout=0.1,  # dropout on the adapter layers
    r=64,              # rank of the low-rank update matrices
    bias="none",       # leave bias terms frozen
    task_type="CAUSAL_LM",
)
</code>
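For intuition, here is a back-of-the-envelope estimate (not part of the original walkthrough): LoRA replaces the update to a frozen weight matrix of shape (d_out, d_in) with two small trainable matrices of shapes (r, d_in) and (d_out, r), so the added parameters per layer are r·(d_in + d_out) rather than d_in·d_out.
<code>
# Rough LoRA parameter count for a single 4096x4096 attention projection
# in Llama-2-7B (hidden size 4096), using the r=64 configured above.
d, r = 4096, 64
full_weights = d * d          # 16,777,216 frozen weights in the projection
lora_weights = r * d + d * r  # 524,288 trainable adapter weights (A and B)
print(f"adapters add {lora_weights / full_weights:.1%} of the matrix size")  # ~3.1%
</code>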
5. Training Parameters:
Set the training hyperparameters: output directory, number of epochs, batch size, optimizer, learning rate and scheduler, and logging/checkpoint frequency. These are collected in a TrainingArguments object, which the trainer in the next step receives as training_params; a sketch follows below.
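The values below are a representative configuration for a single T4-class GPU rather than a prescription; treat the exact numbers (epochs, batch size, learning rate) as assumptions to tune for your own run:
<code>
training_params = TrainingArguments(
    output_dir="./results",     # checkpoints and logs land here
    num_train_epochs=1,         # one pass over the 1k-sample dataset
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",  # paged optimizer to avoid GPU memory spikes
    save_steps=25,
    logging_steps=25,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,          # gradient clipping
    warmup_ratio=0.03,
    group_by_length=True,       # batch similar-length samples to cut padding
    lr_scheduler_type="constant",
    report_to="tensorboard",    # enables the TensorBoard step below
)
</code>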
6. Fine-tuning with SFT:
Use the SFTTrainer from the TRL library for supervised fine-tuning:
<code>
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_params,    # LoRA adapters are applied automatically
    dataset_text_field="text",  # dataset column holding the formatted prompts
    max_seq_length=None,        # fall back to the trainer's default length
    tokenizer=tokenizer,
    args=training_params,
    packing=False,              # one example per sequence, no packing
)

trainer.train()

# Save the trained LoRA adapters and the tokenizer
trainer.model.save_pretrained(new_model)
trainer.tokenizer.save_pretrained(new_model)
</code>
7. Evaluation:
Use the transformers pipeline helper to generate text with the fine-tuned model and spot-check its responses, as sketched below.
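A minimal sketch of such a spot check, assuming the trained model and tokenizer from the previous step are still in memory (the prompt is illustrative; Llama-2 chat models expect instructions wrapped in [INST] ... [/INST] tags):
<code>
logging.set_verbosity(logging.CRITICAL)  # silence generation warnings

prompt = "Who is Leonardo Da Vinci?"     # illustrative test prompt
pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=200,
)
result = pipe(f"<s>[INST] {prompt} [/INST]")
print(result[0]["generated_text"])
</code>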
8. TensorBoard Visualization:
Launch TensorBoard inside the notebook to monitor training metrics such as loss. The log path below follows from output_dir="./results" with TensorBoard reporting enabled, as configured above:
<code>
%load_ext tensorboard
%tensorboard --logdir results/runs
</code>
Conclusion:
This guide showcased efficient Llama-2 fine-tuning on limited hardware: with QLoRA's 4-bit quantization and LoRA adapters, a 7-billion-parameter model becomes trainable on a single 16GB GPU, putting advanced LLM customization within reach of a much wider audience. From here, you can apply the same recipe to your own instruction dataset, or consult the Hugging Face peft and trl documentation to go deeper.