Fine-tuning Llama 3.2 and Using It Locally: A Step-by-Step Guide
Unlocking the Power of Llama 3.2: A Comprehensive Guide to Fine-tuning and Local Deployment
The landscape of large language models (LLMs) is rapidly evolving, with a focus on smaller, more efficient models. Llama 3.2, with its lightweight and vision model variations, exemplifies this trend. This tutorial details how to leverage Llama 3.2's capabilities, specifically the 3B lightweight model, for fine-tuning on a customer support dataset and subsequent local deployment using the Jan application.
Before diving in, beginners are strongly encouraged to complete an AI fundamentals course to grasp the basics of LLMs and generative AI.
Exploring Llama 3.2 Models
Llama 3.2 offers two model families: lightweight and vision. Lightweight models excel at multilingual text generation and tool use, ideal for resource-constrained environments. Vision models, on the other hand, specialize in image reasoning and multimodal tasks.
Lightweight Models
The lightweight family includes 1B and 3B parameter variants. Their compact size allows for on-device processing, ensuring data privacy and fast, cost-effective text generation. These models utilize pruning and knowledge distillation for efficiency and performance. The 3B model surpasses competitors like Gemma 2 and Phi 3.5-mini in tasks such as instruction following and summarization.
Source: Llama 3.2: Revolutionizing edge AI and vision with open, customizable models
Vision Models
The vision models (11B and 90B parameters) are designed for image reasoning, capable of interpreting documents and charts. Their multimodal capabilities stem from integrating pre-trained image encoders with language models. They outperform Claude 3 Haiku and GPT-4o mini in visual understanding tasks.
For deeper insights into Llama 3.2's architecture, benchmarks, and security features (Llama Guard 3), refer to the official Llama 3.2 Guide.
Accessing Llama 3.2 on Kaggle
While Llama 3.2 is open-source, access requires accepting terms and conditions. Here's how to access it via Kaggle:
- Visit llama.com and complete the access request form, selecting both the lightweight and vision models.
- Navigate to the Meta | Llama 3.2 model page on Kaggle and submit the form.
- Accept the terms and conditions.
- Once access is approved, the notebook creation option becomes available. Select the Transformers tab, choose your model variant, and create a new notebook.
- Configure the accelerator to "GPU T4 x2".
- Update the `transformers` and `accelerate` packages using `%pip install -U transformers accelerate`.
The subsequent steps involve loading the tokenizer and model with the `transformers` library, specifying the local model directory, setting `pad_token_id`, creating a text generation pipeline, and running inference with custom prompts. Detailed code examples are provided in the accompanying Kaggle notebook. Similar steps apply to the Llama 3.2 Vision models, though their GPU requirements are significantly higher.
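These steps can be sketched as follows. The model directory below is a hypothetical Kaggle input path (adjust it to the variant you attached), and the heavy imports are deferred into the function so the sketch can be read without a GPU session:

```python
# A minimal sketch of the inference steps; MODEL_DIR is a hypothetical
# Kaggle input directory -- adjust it to the model variant you attached.
MODEL_DIR = "/kaggle/input/llama-3.2/transformers/3b-instruct/1"

def run_inference(prompt: str, max_new_tokens: int = 60) -> str:
    # Imports are deferred so the file can be inspected without the libraries installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_DIR, torch_dtype=torch.float16, device_map="auto"
    )
    # Llama ships without a pad token; reuse EOS to silence padding warnings.
    model.config.pad_token_id = tokenizer.eos_token_id

    generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
    return generator(prompt, max_new_tokens=max_new_tokens)[0]["generated_text"]
```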
Fine-tuning Llama 3.2 3B Instruct
This section guides you through fine-tuning the Llama 3.2 3B Instruct model on a customer support dataset using the `transformers` library and QLoRA for memory-efficient training.
Setup
- Launch a new Kaggle notebook and set environment variables for Hugging Face and Weights & Biases (WandB) access.
- Install the necessary packages: `transformers`, `datasets`, `accelerate`, `peft`, `trl`, `bitsandbytes`, and `wandb`.
- Log in to Hugging Face and WandB using your API keys.
- Define variables for the base model, new model name, and dataset name.
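The setup steps above might look like this. The base model and dataset ids are the official Hub identifiers; `NEW_MODEL` is a hypothetical placeholder you would replace with your own Hub username and repo name:

```python
import os

# Identifiers used throughout the tutorial. BASE_MODEL and DATASET_NAME are
# real Hub ids; NEW_MODEL is a hypothetical placeholder -- use your own repo.
BASE_MODEL = "meta-llama/Llama-3.2-3B-Instruct"
NEW_MODEL = "llama-3.2-3b-it-customer-support"
DATASET_NAME = "bitext/Bitext-customer-support-llm-chatbot-training-dataset"

def login_to_services() -> None:
    # Deferred imports: available after `%pip install -U transformers ... wandb`.
    from huggingface_hub import login
    import wandb

    # Read keys from environment variables (e.g., Kaggle secrets); never hard-code them.
    login(token=os.environ["HF_TOKEN"])
    wandb.login(key=os.environ["WANDB_API_KEY"])
```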
Loading the Model and Tokenizer
- Determine the appropriate `torch_dtype` and `attn_implementation` based on your GPU's capabilities.
- Load the model using `BitsAndBytesConfig` for 4-bit quantization to minimize memory usage.
- Load the tokenizer.
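A sketch of this loading step, assuming a standard QLoRA-style NF4 configuration (the exact quantization settings are a common choice, not necessarily the tutorial's):

```python
def load_model_and_tokenizer(base_model: str):
    # Deferred imports; requires torch, transformers, and bitsandbytes.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # Prefer bfloat16 + FlashAttention on Ampere-or-newer GPUs; fall back
    # to float16 + eager attention on older cards like the Kaggle T4.
    if torch.cuda.get_device_capability()[0] >= 8:
        torch_dtype, attn_implementation = torch.bfloat16, "flash_attention_2"
    else:
        torch_dtype, attn_implementation = torch.float16, "eager"

    # 4-bit NF4 quantization keeps the 3B model within the T4's memory budget.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch_dtype,
        bnb_4bit_use_double_quant=True,
    )
    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        quantization_config=bnb_config,
        device_map="auto",
        attn_implementation=attn_implementation,
    )
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    return model, tokenizer
```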
Loading and Processing the Dataset
- Load the `bitext/Bitext-customer-support-llm-chatbot-training-dataset`.
- Shuffle the data and select a subset (e.g., 1,000 samples) for faster training.
- Create a "text" column by combining system instructions, user queries, and assistant responses into a chat format using the tokenizer's `apply_chat_template` method.
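The processing step can be sketched as below. The system prompt is an assumed example, and the column names follow the Bitext dataset's schema (`instruction` for the user query, `response` for the assistant reply):

```python
SYSTEM_PROMPT = "You are a helpful customer support assistant."  # assumed prompt

def to_chat(user_msg: str, assistant_msg: str) -> list:
    """Assemble one training example as a system/user/assistant message list."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_msg},
        {"role": "assistant", "content": assistant_msg},
    ]

def prepare_dataset(dataset_name: str, tokenizer, n_samples: int = 1000):
    from datasets import load_dataset

    dataset = load_dataset(dataset_name, split="train")
    dataset = dataset.shuffle(seed=42).select(range(n_samples))

    def format_row(row):
        # The Bitext dataset stores the query in "instruction" and the reply in "response".
        messages = to_chat(row["instruction"], row["response"])
        return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

    return dataset.map(format_row)
```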
Setting up the Model
- Identify all linear module names using a helper function.
- Configure LoRA with `LoraConfig` to fine-tune only those modules.
- Set up the `TrainingArguments` with hyperparameters suited to efficient training on Kaggle.
- Create an `SFTTrainer` instance, providing the model, dataset, LoRA config, training arguments, and tokenizer.
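A sketch of these steps under stated assumptions: the hyperparameters are common starting points rather than the tutorial's exact values, and the `SFTTrainer` call follows the classic `trl` API (the signature has changed across `trl` versions):

```python
# Hyperparameters sized for Kaggle's T4 GPUs; treat them as starting points.
LORA_KWARGS = dict(r=16, lora_alpha=32, lora_dropout=0.05,
                   bias="none", task_type="CAUSAL_LM")

def find_linear_module_names(model) -> list:
    """Collect the names of all 4-bit linear layers to target with LoRA."""
    import bitsandbytes as bnb

    names = set()
    for name, module in model.named_modules():
        if isinstance(module, bnb.nn.Linear4bit):
            names.add(name.split(".")[-1])
    names.discard("lm_head")  # keep the output head out of the adapter
    return sorted(names)

def build_trainer(model, tokenizer, dataset, target_modules):
    from peft import LoraConfig
    from transformers import TrainingArguments
    from trl import SFTTrainer

    peft_config = LoraConfig(target_modules=target_modules, **LORA_KWARGS)
    args = TrainingArguments(
        output_dir="outputs",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=2,
        num_train_epochs=1,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=10,
        report_to="wandb",
    )
    # Classic trl API that accepts a tokenizer and a text field name;
    # newer trl versions move these into SFTConfig.
    return SFTTrainer(
        model=model,
        train_dataset=dataset,
        peft_config=peft_config,
        args=args,
        tokenizer=tokenizer,
        dataset_text_field="text",
        max_seq_length=512,
    )
```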
Model Training
Train the model using `trainer.train()`. Monitor the training and validation loss with WandB.
Model Inference
Test the fine-tuned model with sample prompts from the dataset.
Saving the Model
Save the fine-tuned model locally and push it to the Hugging Face Hub.
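With a PEFT setup, a save step like the following stores only the small LoRA adapter, not the full 3B base model (the repo name is whatever you defined earlier):

```python
def save_and_push(trainer, new_model: str) -> None:
    # With LoRA/PEFT, this writes only the adapter weights (a few hundred MB
    # at most), not the full base model.
    trainer.model.save_pretrained(new_model)
    trainer.model.push_to_hub(new_model)
```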
Merging and Exporting the Fine-tuned Model
This section details merging the fine-tuned LoRA adapter into the base model and exporting the result to the Hugging Face Hub. It involves loading the base model and the LoRA adapter, merging them with `PeftModel.from_pretrained` and `model.merge_and_unload()`, and then saving and pushing the merged model to the Hub.
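A minimal sketch of the merge, assuming the adapter was pushed to the Hub in the previous step. Note that the base model is reloaded in 16-bit here, because LoRA weights cannot be merged into 4-bit quantized layers:

```python
def merge_and_export(base_model: str, adapter_repo: str, merged_name: str):
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    # Reload the base model in 16-bit: merging is not supported on 4-bit weights.
    base = AutoModelForCausalLM.from_pretrained(
        base_model, torch_dtype=torch.float16, device_map="auto"
    )
    model = PeftModel.from_pretrained(base, adapter_repo)
    merged = model.merge_and_unload()  # fold the LoRA deltas into the base layers

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    merged.save_pretrained(merged_name)
    tokenizer.save_pretrained(merged_name)
    merged.push_to_hub(merged_name)
    tokenizer.push_to_hub(merged_name)
    return merged
```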
Converting to GGUF and Local Deployment
Finally, the tutorial explains converting the merged model to the GGUF format using the GGUF My Repo tool on Hugging Face and deploying it locally using the Jan application. This involves downloading the GGUF file, importing it into Jan, and setting up the system prompt and stop tokens for optimal performance.
Conclusion
Fine-tuning smaller LLMs offers a cost-effective and efficient approach to customizing models for specific tasks. This tutorial provides a practical guide to leveraging Llama 3.2's capabilities, from access and fine-tuning to local deployment, empowering users to build and deploy custom AI solutions. Remember to consult the accompanying Kaggle notebooks for detailed code examples.