


Quickly learn the key technical points of the InstructGPT paper: follow Li Mu to master the technology behind ChatGPT
Since ChatGPT took off, many technically minded readers have been asking the same question: are there learning materials that explain the principles behind ChatGPT systematically? The question is tricky because OpenAI has not released a paper on ChatGPT.
However, OpenAI's blog post about ChatGPT tells us that ChatGPT uses the same method as its sibling model, InstructGPT. The differences are that InstructGPT is fine-tuned from GPT-3 while ChatGPT is based on GPT-3.5, and that the two differ somewhat in how their data was collected.
Blog link: https://openai.com/blog/chatgpt/
The InstructGPT paper was released in March 2022, but OpenAI had already published a related blog post in January (see our earlier article, "What to do about GPT-3's nonsense? OpenAI: we retrained it, and the new version is more 'obedient'"). At that time, OpenAI stated clearly that InstructGPT used reinforcement learning from human feedback (RLHF) to fine-tune GPT-3 so that the model's output better matches human preferences. The same approach was carried over to the training of ChatGPT.
Paper link: https://arxiv.org/pdf/2203.02155.pdf
Beyond this, InstructGPT and ChatGPT have a great deal in common. A thorough understanding of the InstructGPT paper is therefore very helpful for anyone who wants to work in the direction of ChatGPT, which is why we highly recommend Li Mu's lecture.
Course address: https://jmq.xet.tech/s/2lec6b
Dr. Li Mu is a senior principal scientist at Amazon. He previously co-authored "Dive into Deep Learning" (published in Chinese as "Hands-on Deep Learning") with Aston Zhang and others. Over the past two years he has been sharing AI knowledge through videos and has produced close-reading sessions on dozens of papers. Many students have made a habit of reading papers closely along with him.
Dr. Li Mu's account on Bilibili is "Learn AI from Li Mu".
This walkthrough of InstructGPT runs 67 minutes in total and largely follows the order of the paper itself.
Readers of the ChatGPT blog post know that its technical principles can essentially be summarized in a single diagram. The same diagram already appeared in the InstructGPT paper (with subtle differences between the two versions). When going through the paper's abstract and introduction, Li Mu explains the three steps in the diagram in detail.
Technical schematic from the ChatGPT blog.
Technical schematic from the InstructGPT paper.
In Chapter 3 of the paper, the InstructGPT authors first describe how they collected their data and what the process looked like, and Li Mu reads through this part in detail. It is very valuable from an engineering standpoint. As Li Mu puts it, if you have never done this kind of work before (data labeling and so on) and need to hire people to label data for you, look at the paper's appendix, which contains many templates that can be used directly. The authors even describe what the UI of their labeling website looks like, which is worth learning from.
Next, Li Mu focuses on the three models described in Chapter 3 (see Section 3.5, Models): the SFT (supervised fine-tuning) model, the RM (reward modeling) model, and the RL (reinforcement learning) model. He covers details such as the parameters and objective functions these models involve.
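To make the objective functions mentioned above a little more concrete, here is a minimal Python sketch of two ideas from the paper: the pairwise ranking loss used to train the reward model, and the KL-penalized reward used in the RL step. It is an illustration under simplifying assumptions, not the authors' implementation. The function names (`log_sigmoid`, `reward_model_loss`, `kl_penalized_reward`) are hypothetical, rewards and log-probabilities are plain floats rather than outputs of neural networks, and the pretraining-mix term of the paper's PPO-ptx variant is left out.

```python
import math
from itertools import combinations


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    # log(sigmoid(x)) = min(x, 0) - log(1 + exp(-|x|))
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def reward_model_loss(rewards_best_to_worst):
    """Pairwise ranking loss for one prompt.

    `rewards_best_to_worst` holds the scalar rewards the model assigns to
    K responses, ordered from the labeler's most preferred to least
    preferred (an assumed input format for this sketch). Every pair
    (winner, loser) contributes -log(sigmoid(r_winner - r_loser)), and
    the sum is averaged over all K-choose-2 pairs.
    """
    k = len(rewards_best_to_worst)
    if k < 2:
        raise ValueError("need at least two ranked responses")
    # combinations() yields (i, j) with i < j, so index i is the winner.
    pairs = list(combinations(range(k), 2))
    total = sum(
        log_sigmoid(rewards_best_to_worst[w] - rewards_best_to_worst[l])
        for w, l in pairs
    )
    return -total / len(pairs)


def kl_penalized_reward(reward, logprob_rl, logprob_sft, beta):
    """Reward for one sampled response in the RL step.

    The reward model's score is reduced by beta times the log-ratio
    between the RL policy and the SFT model, which discourages the
    policy from drifting far from the SFT model.
    """
    return reward - beta * (logprob_rl - logprob_sft)
```

A reward model that agrees with the labeler's ranking gets a lower loss than one that reverses it: `reward_model_loss([2.0, 1.0, 0.0])` is smaller than `reward_model_loss([0.0, 1.0, 2.0])`. Likewise, when the RL policy assigns the same log-probability as the SFT model, `kl_penalized_reward` returns the raw reward unchanged.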
Finally, Li Mu concludes that, technically speaking, InstructGPT is a very practical piece of work. It gives everyone a recipe: given a large language model, how to use a modest amount of labeled data to quickly improve its performance in a domain you care about, to the point where it becomes usable in practice. It therefore offers a workable approach for people who want to build products on generative models.
Of course, as Dr. Li Mu says, research progresses step by step, and InstructGPT also builds on earlier work. Anyone who wants to understand ChatGPT thoroughly will inevitably have to go back and read more papers. In previous sessions, Li Mu also covered the GPT, GPT-2, and GPT-3 papers in detail:
Course address: https://jmq.xet.tech/s/2lec6b