The emergence of ChatGPT may be the most eye-catching AI breakthrough of the second half of 2022, though perhaps not the most technically sophisticated.
Not long ago, at NeurIPS 2022 in New Orleans, rumors about GPT-4 were everywhere. At the same time, OpenAI became the focus of the news media.
OpenAI announced a new model in its GPT-3 family of large language models: text-davinci-003, part of the "GPT-3.5 series", which improves performance by handling more complex instructions and producing higher-quality, longer-form content.
The new model builds on InstructGPT and uses reinforcement learning with human feedback to better align the language model with human instructions.
text-davinci-003 is a true reinforcement learning with human feedback (RLHF) model, building on supervised fine-tuning over human demonstrations and high-scoring model samples to improve generation quality.
As another part of the "GPT-3.5 series", OpenAI released an early demonstration of ChatGPT. The company claimed that this interactive conversational model can not only answer a large number of follow-up questions, but also admit mistakes, challenge incorrect premises, and reject inappropriate requests.
OpenAI said in a blog post that ChatGPT’s research release is “the latest step in OpenAI’s iterative deployment of increasingly safe and useful AI systems.” It incorporates many lessons learned from deploying earlier models such as GPT-3 and Codex, including a substantial reduction in harmful and untruthful outputs achieved through reinforcement learning with human feedback (RLHF).
In addition, ChatGPT was trained to emphasize that it is a machine learning model, perhaps to avoid the kind of "is AI conscious?" controversy recently sparked by Google's chatbot LaMDA.
Of course, ChatGPT also has limitations.
In a blog post, OpenAI details these limitations, including that ChatGPT sometimes writes answers that sound plausible but are actually incorrect or nonsensical.
"Solving this problem is very challenging because (1) there is currently no source of truth during reinforcement learning training; (2) training the model to be more cautious causes it to decline questions that it could answer correctly; (3) supervised training can mislead the model because the ideal answer depends on what the model knows, not what the human demonstrator knows."
OpenAI said that ChatGPT "sometimes responds to harmful instructions or exhibits biased behavior. We are using the Moderation API to warn or block certain types of unsafe content, but we expect it to have some false negatives and positives for now. We are very interested in collecting user feedback to help our ongoing work to improve this model."
Although ChatGPT still has many problems that need improvement, there is no denying that, until GPT-4 arrives, ChatGPT remains the most popular of the current large language models.
Recently, however, a new model has ignited enthusiastic discussion in the community, and most importantly, it is open source.
This week, Philip Wang, a developer known for reverse engineering closed-source AI systems including Meta’s Make-A-Video, released PaLM RLHF, a text generation model that behaves like ChatGPT.
Code address: https://github.com/lucidrains/PaLM-rlhf-pytorch
The system combines Google’s large language model PaLM with reinforcement learning with human feedback (RLHF) to create a system that can complete almost any task ChatGPT can, including drafting emails and suggesting computer code.
Since its release, ChatGPT has taken the tech world by storm with its ability to generate high-quality, human-like text and to respond to user questions in a conversational manner.
Although this is a major advance for such an early stage of chatbot development, many AI enthusiasts have expressed concerns about the closed nature of ChatGPT.
To this day, the ChatGPT model remains proprietary, meaning its underlying code cannot be viewed by the public. Only OpenAI really knows how it works and what data it processes. This lack of transparency can have far-reaching consequences and could affect user trust in the long term.
Many developers have been eager to build an open source alternative, and now it is finally here. PaLM RLHF is written in Python and implemented on top of PyTorch.
Developers first train PaLM like any other autoregressive transformer, then train the reward model using human feedback.
Like ChatGPT, PaLM RLHF is essentially a statistical tool for predicting words. When fed a large number of examples from the training data—such as posts from Reddit, news articles, and e-books—PaLM RLHF learns how likely words are to occur based on patterns such as the semantic context of the surrounding text.
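To make the idea of "predicting words from patterns in the training data" concrete, here is a minimal sketch using a bigram counter. This is a deliberately tiny stand-in, not how PaLM RLHF works internally: a transformer conditions on a much longer context, and the function names and the three-sentence corpus below are made up purely for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for each word, how often each following word appears."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Turn raw follow-up counts into a probability distribution."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # the word was never seen with a successor
    return {word: count / total for word, count in followers.items()}


# Hypothetical toy corpus standing in for gigabytes of real text.
corpus = [
    "the model predicts the next word",
    "the model learns from text",
    "the next word depends on context",
]
counts = train_bigram(corpus)
probs = next_word_probs(counts, "the")
print(probs)
```

In this toy corpus, "the" is followed twice by "model" and twice by "next", so each gets probability 0.5. A large language model learns the same kind of distribution, only over far richer context.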
Of course, there is still a big gap between ideal and reality. PaLM RLHF seems perfect, but it also has various problems. The biggest problem is that people can't use it yet.
To launch PaLM RLHF, users need to compile gigabytes of text obtained from various sources such as blogs, social media, news articles, e-books, and more.
This data is fed to a fine-tuned PaLM model, which generates several responses. For example, if you ask the model "What are the basics of economics?", PaLM will give answers such as "Economics is the social science that studies...".
Afterwards, the developers ask people to rank the answers generated by the model from best to worst. Finally, the rankings are used to train a “reward model” that takes the original model’s responses and sorts them in order of preference, filtering out the best answers for a given prompt.
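The ranking step can be sketched as follows. This assumes the pairwise ranking loss commonly used for RLHF reward models, -log(sigmoid(r_preferred - r_rejected)); the article does not say which loss PaLM RLHF uses, so treat that choice, the function names, and the toy reward scores as illustrative assumptions.

```python
import math


def pairs_from_ranking(ranked_answers):
    """Expand a best-to-worst ranking into (preferred, rejected) pairs."""
    return [
        (ranked_answers[i], ranked_answers[j])
        for i in range(len(ranked_answers))
        for j in range(i + 1, len(ranked_answers))
    ]


def pairwise_ranking_loss(reward_preferred, reward_rejected):
    """-log(sigmoid(diff)): small when the preferred answer scores higher."""
    diff = reward_preferred - reward_rejected
    return math.log1p(math.exp(-diff))


# Hypothetical human ranking of three model answers, best first.
ranking = ["answer_a", "answer_b", "answer_c"]
pairs = pairs_from_ranking(ranking)

# Hypothetical scores a reward model might assign to those answers.
toy_rewards = {"answer_a": 2.0, "answer_b": 0.5, "answer_c": -1.0}

total_loss = sum(
    pairwise_ranking_loss(toy_rewards[preferred], toy_rewards[rejected])
    for preferred, rejected in pairs
)

# At inference time, the reward model is used to pick the best candidate.
best_answer = max(toy_rewards, key=toy_rewards.get)
print(pairs, round(total_loss, 3), best_answer)
```

Training would adjust the reward model's parameters to reduce this loss over many human-ranked prompts; here the scores are fixed so that only the bookkeeping is shown.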
However, this is an expensive process. Collecting training data and training the model itself are not cheap. PaLM has 540 billion parameters, the parts of the language model that are learned from the training data. A 2020 study showed that developing a text generation model with only 1.5 billion parameters would cost up to $1.6 million.
In July 2022, to train the open source model Bloom with 176 billion parameters, Hugging Face researchers spent three months and used 384 NVIDIA A100 GPUs. Each A100 costs thousands of dollars, which is not a cost that an average user can afford.
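A rough back-of-the-envelope calculation shows the scale of that training run. The sketch below assumes "three months" means about 90 days of continuous use; the hourly price is left as a parameter because the article does not quote one, and the function names are hypothetical.

```python
def gpu_hours(num_gpus, days, hours_per_day=24):
    """Total GPU-hours consumed if every GPU runs for the whole period."""
    return num_gpus * days * hours_per_day


def estimate_cost(total_gpu_hours, usd_per_gpu_hour):
    """Scale GPU-hours by a rental price that the caller must supply."""
    return total_gpu_hours * usd_per_gpu_hour


# 384 A100s for roughly three months (assumed to be 90 days).
bloom_gpu_hours = gpu_hours(num_gpus=384, days=90)
print(bloom_gpu_hours)
```

Under these assumptions the run consumes 829,440 GPU-hours, so even a modest per-hour price puts the bill far beyond an individual's budget.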
Additionally, even after training, running a model the size of PaLM RLHF is not trivial. Bloom requires a dedicated machine with eight A100 GPUs, and running OpenAI's text-generating GPT-3 (which has about 175 billion parameters) costs about $87,000 per year.
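The hardware requirement follows largely from the memory needed just to hold the weights. The sketch below assumes 2 bytes per parameter (16-bit weights) and the 80 GB variant of the A100; both are assumptions rather than figures from the article, and the estimate ignores activations and other runtime memory, so real deployments need headroom beyond it.

```python
import math


def weights_gigabytes(num_parameters, bytes_per_parameter=2):
    """Memory needed to store the weights alone, in gigabytes (1e9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


def min_gpus_for_weights(num_parameters, gpu_memory_gb=80, bytes_per_parameter=2):
    """Lower bound on the GPU count: weights only, no runtime overhead."""
    needed_gb = weights_gigabytes(num_parameters, bytes_per_parameter)
    return math.ceil(needed_gb / gpu_memory_gb)


bloom_gb = weights_gigabytes(176e9)  # Bloom: 176 billion parameters
palm_gb = weights_gigabytes(540e9)   # PaLM: 540 billion parameters
print(bloom_gb, min_gpus_for_weights(176e9))
print(palm_gb, min_gpus_for_weights(540e9))
```

Under these assumptions Bloom's weights alone take about 352 GB, which is consistent with needing a multi-GPU machine, and a 540-billion-parameter model needs roughly three times as much.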
Scaling up the necessary development workflow can also be a challenge, AI researcher Sebastian Raschka noted in an article about PaLM RLHF.
"Even if someone gives you 500 GPUs to train this model, you still need to deal with the infrastructure and have a software framework that can handle it," he said. "Although this is feasible, it currently requires a lot of effort."
The high cost and huge scale indicate that without well-funded companies or individuals taking the trouble to train the model, PaLM RLHF currently does not have the ability to replace ChatGPT.
So far, there is no exact release date for PaLM RLHF. For reference, it took Hugging Face three months to train Bloom. By comparison, with 540 billion parameters, PaLM RLHF may take 6-8 months before a meaningful version is produced.
The good news is that so far we have three known players working on this open source alternative to ChatGPT:
CarperAI, in partnership with EleutherAI and the startups Scale AI and Hugging Face, plans to release the first ready-to-run, ChatGPT-like AI model trained with human feedback.
Code address: https://github.com/CarperAI/trlx
LAION, the non-profit organization that provided the initial dataset for Stable Diffusion, is also spearheading a project to replicate ChatGPT using the latest machine learning technology.
Code address: https://github.com/LAION-AI/Open-Assistant
LAION aims to create a "future assistant" that can not only write emails and cover letters, but also "do meaningful work, use APIs, dynamically research information, etc." It's in its early stages, but a project with related resources went live on GitHub a few weeks ago.
And GPT-4chan, created by YouTube personality and AI researcher Yannic Kilcher, is more of a foul-mouthed model that "emerged from the mud thoroughly stained".
The "4chan" in the model's name refers to an American anonymous online forum. Because users are anonymous, many feel free to post all kinds of politically incorrect remarks. Kilcher used posts from 4chan to train the model, and the results are predictable.
In keeping with the general tone of the forum, GPT-4chan’s answers were filled with racism, sexism, and anti-Semitism. Not only that, Kilcher also posted the underlying model to Hugging Face for others to download. However, after condemnation from many AI researchers, Hugging Face quickly restricted access to the model.
While we look forward to the emergence of more open source language models, all we can do now is wait. In the meantime, continuing to use ChatGPT for free is a good option.
It is worth noting that OpenAI remains far ahead in development, and no open source version has officially launched yet. In 2023, GPT-4 is undoubtedly what AI enthusiasts around the world are looking forward to. Countless leading figures in AI have made their own predictions about it, some optimistic and some pessimistic, but as OpenAI CEO Sam Altman said: "General artificial intelligence will arrive sooner than most people think, and it will change everything most people imagine."