
The 0-threshold cloning solution has been upgraded, the open source model is completely reproduced, and no registration is required for online experience.

Apr 14, 2023 pm 10:58 PM

AI applications and large models such as ChatGPT and GPT-4 have become popular worldwide and are widely regarded as the start of a new technological and industrial revolution and a new starting point for AGI (Artificial General Intelligence). Technology giants are racing to launch new products, and many leading AI figures in academia and industry are joining the resulting wave of start-ups. Generative AI is now iterating on a timescale of days and continues to surge.

However, OpenAI has not open-sourced these models. What are the technical details behind them? How can one quickly follow, catch up with, and take part in this wave of technology? How can the high cost of building and applying large AI models be reduced? And how can core data and intellectual property be protected from leaking through third-party large-model APIs?

As the most popular open-source solution for large AI models, Colossal-AI is the first to provide the complete RLHF pipeline in open-source form: supervised dataset collection -> supervised fine-tuning -> reward model training -> reinforcement learning fine-tuning. Built on the LLaMA pre-trained model, it has launched ColossalChat, currently the practical open-source project closest to ChatGPT's original technical approach.

Open source address: https://github.com/hpcaitech/ColossalAI

It includes the following:

1. Demo: try the model online directly, with no registration or waiting list

2. Training code: the complete RLHF training code is open source, covering the 7B and 13B models

3. Dataset: an open-source bilingual (Chinese and English) dataset of 104K examples

4. Inference deployment: 4-bit quantized inference of the 7-billion-parameter model requires only 4GB of GPU memory

5. Model weights: reproducible quickly on a single server with a small amount of computing power

6. Larger models, more datasets, and further optimizations will be added through continued rapid iteration

Affordable models, powerful capabilities

ColossalChat needs fewer than 10 billion parameters. By applying RLHF fine-tuning on top of a large language model, it gains bilingual Chinese and English capabilities and reaches results similar to ChatGPT and GPT-3.5.

For example, the demo handles the following tasks:

Figure: Common-sense question answering

Figure: Answering in Chinese

Figure: Writing an email

Figure: Writing an algorithm

Complete ChatGPT cloning solution

Although GPT series models such as ChatGPT and GPT-4 are very powerful, they are unlikely to be fully open source. Fortunately, the open source community continues to work hard.

For example, Meta has open-sourced the LLaMA model, with parameter counts ranging from 7 billion to 65 billion. The 13-billion-parameter model can outperform the 175-billion-parameter GPT-3 on most benchmarks. However, because it has not been instruction-tuned, its actual generation quality is not ideal.

Stanford's Alpaca generates training data in a self-instruct manner by calling the OpenAI API, so that a lightweight model with only 7 billion parameters can be fine-tuned at very low cost to reach dialogue quality comparable to that of ultra-large language models with hundreds of billions of parameters, such as GPT-3.5.

However, existing open-source solutions can be regarded as supervised fine-tuned models that complete only the first step of reinforcement learning with human feedback (RLHF); they perform no subsequent alignment or fine-tuning. In addition, Alpaca's training dataset is small and English-only, which also limits the model's performance to some extent.

The amazing effect of ChatGPT and GPT-4 lies in the introduction of RLHF into the training process, making the generated content more consistent with human values.

Figure: The three stages of RLHF

Built on the LLaMA model, ColossalChat from Colossal-AI is the first open-source ChatGPT-like reproduction that includes the complete RLHF process. It is currently the practical open-source project closest to ChatGPT's original technical route.

Open-source training dataset

ColossalChat has open-sourced a bilingual Chinese and English dataset of about 100,000 question-answer pairs. The dataset collects and cleans real questions that people ask on social platforms as seed data, expands them with self-instruct, and cost about $900 to annotate. Compared with datasets generated by other self-instruct methods, its seed data is more realistic and varied, and the generated data covers more topics. The data can be used for both fine-tuning and RLHF training. With this high-quality data, ColossalChat can hold better conversations and supports Chinese.

Figure: ColossalChat dataset collection process

RLHF algorithm reproduction

RLHF-Stage1 is supervised fine-tuning (SFT): the model is fine-tuned on the dataset described above.

RLHF-Stage2 trains the reward model. Different outputs for the same prompt are ranked manually to obtain corresponding scores, which supervise the training of the reward model.
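To make the Stage 2 idea concrete, here is a minimal sketch of one common way to turn a human ranking into a training signal: expand the ranking into (better, worse) pairs and penalize the reward model when it scores the worse response higher. The pairwise logistic loss and the function names `ranking_to_pairs` and `pairwise_ranking_loss` are assumptions for illustration; the article does not state which loss ColossalChat uses.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Expand a best-first ranking into (better, worse) pairs."""
    # combinations() keeps the input order, so the first item of each
    # pair is always ranked above the second.
    return list(combinations(ranked_responses, 2))


def pairwise_ranking_loss(score_better, score_worse):
    """-log(sigmoid(score_better - score_worse)), computed stably."""
    d = score_better - score_worse
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))


if __name__ == "__main__":
    # Hypothetical ranking of three responses to one prompt, best first.
    pairs = ranking_to_pairs(["response_a", "response_b", "response_c"])
    print(pairs)
    # The loss is small when the reward model agrees with the ranking
    # and large when it disagrees.
    print(pairwise_ranking_loss(2.0, 0.0))
    print(pairwise_ranking_loss(0.0, 2.0))
```

In a real training loop the two scores would come from the reward model's forward pass on each response, and the loss would be averaged over all pairs in a batch.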

RLHF-Stage3 uses a reinforcement learning algorithm, which is the most complex part of the training process:

Figure: RLHF-Stage3 algorithm flow chart

The PPO part of ColossalChat has two stages. The first is Make Experience, which uses the SFT, Actor, RM, and Critic models to compute Experience and store it in a buffer. The second is the parameter update, which uses the Experience to compute the policy loss and the value loss.

In the PTX part, ColossalChat computes the cross-entropy loss between the Actor's output response and the answer portion of the input corpus. This adds a pre-training gradient to the PPO gradient, which preserves the language model's original performance and prevents forgetting. Finally, the policy loss, value loss, and PTX loss are summed for backpropagation and the parameter update.
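The three loss terms described above can be sketched in plain Python as follows. This is a simplified illustration rather than ColossalChat's implementation: it assumes the standard PPO clipped surrogate for the policy loss and a mean-squared error for the value loss, and the names `clip_eps`, `vf_coef`, and `ptx_coef` are hypothetical knobs that default to a plain sum, matching the description.

```python
import math


def clipped_policy_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate loss, averaged over tokens (to be minimized)."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new / pi_old for this token
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += -min(ratio * adv, clipped * adv)
    return total / len(advantages)


def value_loss(values, returns):
    """Mean squared error between Critic values and observed returns."""
    return sum((v - r) ** 2 for v, r in zip(values, returns)) / len(values)


def ptx_loss(answer_token_logprobs):
    """Cross-entropy on the corpus answer: mean negative log-likelihood."""
    return -sum(answer_token_logprobs) / len(answer_token_logprobs)


def total_loss(policy, value, ptx, vf_coef=1.0, ptx_coef=1.0):
    """Sum of the three terms; the coefficients are illustrative only."""
    return policy + vf_coef * value + ptx_coef * ptx


if __name__ == "__main__":
    p = clipped_policy_loss([1.0], [0.0], [1.0])   # ratio e is clipped to 1.2
    v = value_loss([1.0, 3.0], [0.0, 1.0])          # (1 + 4) / 2
    x = ptx_loss([math.log(0.5), math.log(0.5)])    # log 2
    print(p, v, x, total_loss(p, v, x))
```

In practice these values would be tensors produced from the Experience buffer and the pre-training batch, and the summed loss would drive a single backward pass.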

Get started quickly

ColossalChat has open sourced the complete code for reproducing the three stages of training ChatGPT based on the LLaMA model.

The first stage, train the SFT model:

# Training on a 4-GPU server
colossalai run --nproc_per_node=4 train_sft.py \
    --pretrain "/path/to/LLaMa-7B/" \
    --model 'llama' \
    --strategy colossalai_zero2 \
    --log_interval 10 \
    --save_path /path/to/Coati-7B \
    --dataset /path/to/data.json \
    --batch_size 4 \
    --accimulation_steps 8 \
    --lr 2e-5
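As a quick sanity check on the SFT command above, the number of samples contributing to each optimizer step can be worked out from the GPU count, the batch size, and the gradient accumulation steps. The sketch below assumes that `--batch_size` is a per-GPU value, which the article does not state; `effective_batch_size` is a hypothetical helper name.

```python
def effective_batch_size(num_gpus, per_gpu_batch, accumulation_steps):
    """Samples seen per optimizer step under data parallelism.

    Assumes per_gpu_batch is the batch size of each process (an
    assumption, not confirmed by the article).
    """
    return num_gpus * per_gpu_batch * accumulation_steps


if __name__ == "__main__":
    # Values taken from the SFT command: 4 GPUs, batch size 4, 8 steps.
    print(effective_batch_size(4, 4, 8))  # 128
```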

The second stage, train the reward model:

# Training on a 4-GPU server
colossalai run --nproc_per_node=4 train_reward_model.py \
    --pretrain "/path/to/LLaMa-7B/" \
    --model 'llama' \
    --strategy colossalai_zero2 \
    --dataset /path/to/datasets

The third stage, train with RL:

# Training on an 8-GPU server
colossalai run --nproc_per_node=8 train_prompts.py prompts.csv \
    --strategy colossalai_zero2 \
    --pretrain "/path/to/Coati-7B" \
    --model 'llama' \
    --pretrain_dataset /path/to/dataset

After obtaining the final model weights, you can also reduce inference hardware cost through quantization and launch an online inference service: a single GPU with about 4GB of memory is enough to deploy inference for the 7-billion-parameter model.

python server.py /path/to/pretrained --quant 4bit --gptq_checkpoint /path/to/coati-7b-4bit-128g.pt --gptq_group_size 128

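System performance optimization and development acceleration

ColossalChat could quickly reproduce the complete ChatGPT RLHF process thanks to the underlying support of Colossal-AI, the infrastructure for large AI models, and its related optimization techniques. Under the same conditions, training is about three times faster than with the FSDP (Fully Sharded Data Parallel) approach used by Alpaca.

System infrastructure: Colossal-AI

Colossal-AI, a development system for large AI models, provides the foundation for this solution. Built on PyTorch, it enables efficient and rapid deployment of large-model training and inference, reducing the cost of applying large AI models. Colossal-AI is developed under the leadership of James Demmel, Distinguished Professor at the University of California, Berkeley, and You Yang, Presidential Young Professor at the National University of Singapore. Since being open-sourced, Colossal-AI has ranked first worldwide on GitHub Trending multiple times, has earned about 20,000 GitHub stars, and has been accepted as an official tutorial at top international AI and HPC conferences including SC, AAAI, PPoPP, CVPR, and ISC.

ZeRO + Gemini to reduce memory redundancy

Colossal-AI supports the Zero Redundancy Optimizer (ZeRO) to improve memory efficiency and accommodate larger models at low cost, without affecting computation granularity or communication efficiency. An automatic chunk mechanism further improves ZeRO's performance by increasing memory efficiency, reducing the number of communications, and avoiding memory fragmentation. The heterogeneous memory manager Gemini supports offloading optimizer states from GPU memory to CPU memory or disk, breaking through the GPU memory limit, expanding the scale of trainable models, and lowering the cost of large-model applications.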
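To give a feel for why removing redundancy matters, the sketch below estimates per-GPU memory for model states under the ZeRO stages. It assumes the accounting commonly used for mixed-precision Adam (2 bytes per parameter for FP16 weights, 2 for FP16 gradients, and 12 for FP32 optimizer states) and ignores activations, buffers, and fragmentation; `zero_bytes_per_gpu` is a hypothetical helper, not a Colossal-AI API, and the numbers are estimates rather than measurements.

```python
def zero_bytes_per_gpu(num_params, num_gpus, stage):
    """Estimate model-state bytes per GPU for ZeRO stage 0 to 3.

    Assumes mixed-precision Adam: 2 B/param weights, 2 B/param
    gradients, 12 B/param optimizer states. Activations are ignored.
    """
    params = 2 * num_params
    grads = 2 * num_params
    optim = 12 * num_params
    if stage >= 1:  # stage 1 shards optimizer states
        optim /= num_gpus
    if stage >= 2:  # stage 2 also shards gradients
        grads /= num_gpus
    if stage >= 3:  # stage 3 also shards the weights
        params /= num_gpus
    return params + grads + optim


if __name__ == "__main__":
    n = 7_000_000_000  # a 7-billion-parameter model
    for stage in range(4):
        gb = zero_bytes_per_gpu(n, num_gpus=4, stage=stage) / 1e9
        print(f"ZeRO stage {stage}: {gb:.1f} GB per GPU")
    # stage 0 gives 112.0 GB and stage 2 gives 38.5 GB on 4 GPUs
```

Offloading optimizer states to CPU memory or disk, as Gemini does, would reduce the GPU-resident share further; that effect is not modeled here.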

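Low-cost fine-tuning with LoRA

Colossal-AI supports Low-Rank Adaptation (LoRA) for low-cost fine-tuning of large AI models. LoRA assumes that large language models are over-parameterized and that the change in parameters during fine-tuning is a low-rank matrix, which can therefore be decomposed into the product of two smaller matrices. During fine-tuning, the large model's parameters are frozen and only the low-rank matrices are updated, which significantly reduces the number of trainable parameters and lowers cost.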
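The low-rank idea can be illustrated with a small dependency-free sketch. It computes y = W x + B (A x), where W stays frozen and only the small matrices A and B would be trained, and compares trainable parameter counts. The layer size 4096 and rank 8 are illustrative assumptions, and the helper names are hypothetical rather than part of any library.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, scale=1.0):
    """y = W x + scale * B (A x); W is frozen, A and B are trainable."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


def full_params(d_out, d_in):
    """Trainable parameters when updating the full weight matrix."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters for B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    W = [[1, 0], [0, 1]]   # frozen 2x2 weight
    B = [[1], [2]]         # 2x1 trainable factor
    A = [[3, 4]]           # 1x2 trainable factor
    print(lora_forward(W, A, B, [1, 1]))  # [8.0, 15.0]

    # Illustrative layer size and rank (assumed, not from the article).
    print(full_params(4096, 4096))     # 16777216
    print(lora_params(4096, 4096, 8))  # 65536
```

With these assumed sizes the trainable parameters for that layer drop by a factor of 256, which is the source of the cost savings described above.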

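Low-cost quantized inference

Figure: GPTQ quantization

To reduce inference deployment cost, Colossal-AI uses GPTQ 4-bit quantized inference. On GPT/OPT/BLOOM-type models, it achieves better perplexity than traditional RTN (round-to-nearest) quantization. Compared with common FP16 inference, it reduces GPU memory consumption by 75% while sacrificing only a small amount of throughput and perplexity.

Taking ColossalChat-7B as an example, with 4-bit quantized inference the 7-billion-parameter model needs only about 4GB of GPU memory for short-sequence inference (generation length 128). This can run on an ordinary consumer GPU (for example, an RTX 3060 Laptop) and takes just one line of code to enable.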

if args.quant == '4bit':
    model = load_quant(args.pretrained, args.gptq_checkpoint, 4, args.gptq_group_size)
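The memory figures quoted for 4-bit inference can be checked with back-of-the-envelope arithmetic. The sketch below counts weight storage only and treats 1 GB as 10^9 bytes; the per-group scales and zero points that GPTQ stores, the activations, and the cache for generated tokens are not modeled and would account for the gap between the weight footprint and the roughly 4GB quoted. `weight_memory_gb` is a hypothetical helper name.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Memory needed to store the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    fp16 = weight_memory_gb(7e9, 16)
    int4 = weight_memory_gb(7e9, 4)
    print(fp16)             # 14.0
    print(int4)             # 3.5
    print(1 - int4 / fp16)  # 0.75, i.e. a 75% reduction versus FP16
```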

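With efficient asynchronous offloading, the GPU memory requirement can be reduced further, allowing larger models to run inference on lower-cost hardware.

Differences between ColossalChat and Alpaca

1. ColossalChat open-sources the first complete RLHF pipeline. Stanford's Alpaca does not perform RLHF, that is, it does not include Stage 2 and Stage 3.

2. ColossalChat uses more instruction data, of higher quality and broader coverage, and uses reinforcement learning for alignment so that answers are closer to human responses.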

3. ColossalChat's training process integrates many of Colossal-AI's system optimizations. For the same dataset and model size, training can be about three times faster than Alpaca, allowing researchers and small and medium-sized enterprises to train and deploy their own conversational systems independently.

4. The ColossalChat team collected more data itself: training uses about 24M English tokens and about 30M Chinese tokens, roughly 54M tokens in total. Of these, ColossalChat's own collected data accounts for 6M English tokens and 18M Chinese tokens.

The following compares ColossalChat and Alpaca in dialogue (ColossalChat above, Alpaca below).

Figure: Write Quicksort in Python

Figure: Write an email to a professor requesting a letter of recommendation

Open collaboration

Although RLHF has been introduced, actual performance in some scenarios still has room for improvement because computing power and datasets are limited.


Fortunately, unlike in the past, large AI models and cutting-edge technologies are no longer monopolized by a few technology giants. Open-source communities and start-ups such as PyTorch, Hugging Face, and OpenAI have also played a key role in this wave. Drawing on the successful experience of the open-source community, Colossal-AI welcomes all parties to join in building together and embracing the era of large models.

You can contact or participate through the following methods:

1. Post an issue on GitHub or submit a pull request (PR)

2. Join the Colossal-AI user WeChat or Slack group to communicate

3. Send a formal cooperation proposal to the email youy@comp.nus.edu.sg

Open source address: https://github.com/hpcaitech/ColossalAI
