DeepMind found that the prompt method of conveying 'take a deep breath and take one step at a time' to large models is extremely effective.

WBOY

Sep 13, 2023 pm 04:41 PM

The paper proposes OPRO, a simple and effective method that uses a large language model as an optimizer. The optimization task is described in natural language, and the prompts OPRO finds outperform those designed by humans.

Optimization is crucial in all fields.

Many optimization methods start from an initial solution and then iteratively update it to optimize an objective function. Such algorithms often need to be customized for individual tasks to address the specific challenges posed by the decision space, especially in derivative-free optimization.

In the study introduced here, the researchers took a different approach. They used a large language model (LLM) as the optimizer, and on a range of tasks the prompts it found performed better than prompts designed by humans.

This research comes from Google DeepMind, which proposes a simple and effective optimization method, OPRO (Optimization by PROmpting), in which the optimization task is described in natural language. For example, the prompt for the LLM can be "Take a deep breath and solve this problem step by step", or it can be "Let's combine our numerical commands and clear thinking to decipher the answer quickly and accurately", and so on.

In each optimization step, the LLM generates new solutions from a prompt that contains previously generated solutions and their values. The new solutions are then evaluated and added to the prompt for the next optimization step.

The study first applies OPRO to linear regression and the traveling salesman problem (a well-known NP-hard problem), and then moves on to prompt optimization, where the goal is to find instructions that maximize task accuracy.

The paper conducts a comprehensive evaluation of multiple LLMs, including text-bison and PaLM 2-L from the PaLM 2 model family, and gpt-3.5-turbo and gpt-4 from the GPT model family. The experiments optimize prompts on GSM8K and Big-Bench Hard. The results show that the best prompts optimized by OPRO outperform human-designed prompts by up to 8% on GSM8K and by up to 50% on Big-Bench Hard tasks.

Paper address: https://arxiv.org/pdf/2309.03409.pdf

Chengrun Yang, first author of the paper and a research scientist at Google DeepMind, said: "To perform prompt optimization, we start from basic instructions such as 'Let's start solving the problem', or even the empty string. The instructions generated by OPRO gradually improve LLM performance, and the upward performance curve looks just like what we see in traditional optimization!"

"Even when they start from the same instruction, different LLMs optimized with OPRO end up with final instructions in different styles. These instructions are better than those written by humans and can be transferred to similar tasks."

We can also conclude from the table above that the instruction styles found by different LLMs acting as optimizers vary widely. The instructions of PaLM 2-L-IT and text-bison are concise, while those of the GPT models are long and detailed. Although some of the top instructions contain a "step-by-step" phrase, OPRO also finds other semantic expressions that achieve comparable or better accuracy.

However, some researchers pointed out that although the prompt "take a deep breath and take it step by step" is very effective on Google's PaLM 2 (80.2 accuracy), there is no guarantee that it works on all models and in all situations, so it should not be used blindly everywhere.

OPRO: LLM as optimizer

Figure 2 shows the overall framework of OPRO. At each optimization step, the LLM generates candidate solutions to the optimization task based on the optimization problem description and the previously evaluated solutions in the meta-prompt (bottom right part of Figure 2).

Next, the new solutions are evaluated and added to the meta-prompt for the subsequent optimization steps.

The optimization process terminates when the LLM is unable to propose a new solution with a better optimization score, or when the maximum number of optimization steps is reached.
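The loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's implementation: `opro_loop`, `make_toy_problem`, the `patience` stopping rule, and the `keep_top` limit are hypothetical names and choices, and the `propose` function is a random-perturbation stand-in for the optimizer LLM so that the example runs without any model. The toy objective is a small linear regression (fit `w` and `b`), where a lower score is better.

```python
import random


def opro_loop(propose, evaluate, initial, max_steps=50, patience=10, keep_top=20):
    """Propose from the scored history, evaluate, and append (lower is better).

    `propose` stands in for the optimizer LLM: it receives the scored history
    (the content a meta-prompt would carry) and returns candidate solutions.
    """
    history = [(solution, evaluate(solution)) for solution in initial]
    best_score = min(score for _, score in history)
    stale_steps = 0
    for _ in range(max_steps):
        # Keep only the best solutions so the "meta-prompt" stays short.
        history = sorted(history, key=lambda item: item[1])[:keep_top]
        candidates = propose(history)
        history.extend((c, evaluate(c)) for c in candidates)
        step_best = min(score for _, score in history)
        if step_best < best_score:
            best_score, stale_steps = step_best, 0
        else:
            stale_steps += 1
        # Assumed stopping rule: give up after `patience` steps without improvement.
        if stale_steps >= patience:
            break
    return min(history, key=lambda item: item[1])


def make_toy_problem(seed=0):
    """Hypothetical toy task: recover w=3, b=2 from noiseless points."""
    rng = random.Random(seed)
    data = [(x, 3.0 * x + 2.0) for x in (rng.uniform(-1, 1) for _ in range(20))]

    def evaluate(solution):
        w, b = solution
        return sum((w * x + b - y) ** 2 for x, y in data)

    def propose(history):
        # Stand-in for the LLM: perturb the best solution seen so far.
        (w, b), _ = min(history, key=lambda item: item[1])
        return [(w + rng.gauss(0, 0.5), b + rng.gauss(0, 0.5)) for _ in range(8)]

    return evaluate, propose


evaluate, propose = make_toy_problem(seed=0)
best_solution, best_score = opro_loop(propose, evaluate, initial=[(0.0, 0.0)])
```

In a real setup, `propose` would format the history into a meta-prompt, call the optimizer LLM, and parse the candidates from its output; the rest of the loop would stay the same.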

Figure 3 shows an example. The meta-prompt contains two core parts: the first is the previously generated prompts and their corresponding training accuracies; the second is the optimization problem description, which includes several examples randomly selected from the training set to illustrate the task of interest.
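As a rough illustration of these two parts, the sketch below assembles a meta-prompt string from scored instructions and task exemplars. The function name `build_meta_prompt`, the template wording, the `text:`/`score:` layout, and the choice to list instructions in ascending order of accuracy are all assumptions made for this example rather than the exact format used in the paper.

```python
def build_meta_prompt(scored_instructions, exemplars, max_instructions=20):
    """Assemble a meta-prompt from (instruction, accuracy) pairs and (question, answer) exemplars."""
    # Part 1: previous instructions with their training accuracies.
    # Assumed ordering: ascending by accuracy, so the strongest appear last.
    ranked = sorted(scored_instructions, key=lambda pair: pair[1])[-max_instructions:]
    parts = ["Here are previous instructions with their training accuracies:", ""]
    for text, accuracy in ranked:
        parts += [f"text: {text}", f"score: {accuracy:.1f}", ""]

    # Part 2: the task description, illustrated by exemplars from the training set.
    parts += ["Here are example problems from the task:", ""]
    for question, answer in exemplars:
        parts += [f"Q: {question}", f"A: {answer}", ""]

    parts.append(
        "Write a new instruction that is different from the ones above and scores higher."
    )
    return "\n".join(parts)
```

The caller is expected to sample the exemplars (for example with `random.sample`) before passing them in, so that each optimization step can show the optimizer a different slice of the training set.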

The paper first demonstrates the potential of LLMs as optimizers for mathematical optimization. The results on the linear regression problem are reported in Table 2.

Next, the paper explores the application of OPRO to the traveling salesman problem (TSP). Given a set of n nodes and their coordinates, the TSP task is to find the shortest route that starts from the starting node, visits all nodes, and finally returns to the starting node.
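To make the objective concrete, here is a minimal sketch of how a candidate tour could be scored. The helper name `tour_length` and the use of Euclidean distance are assumptions for illustration; the coordinates are a made-up unit square.

```python
import math


def tour_length(coords, tour):
    """Total Euclidean length of the closed route that visits nodes in `tour` order."""
    if sorted(tour) != list(range(len(coords))):
        raise ValueError("tour must visit every node exactly once")
    total = 0.0
    for i, node in enumerate(tour):
        # The modulo wraps around so the last node connects back to the start.
        next_node = tour[(i + 1) % len(tour)]
        total += math.dist(coords[node], coords[next_node])
    return total


square = [(0, 0), (0, 1), (1, 1), (1, 0)]
perimeter = tour_length(square, [0, 1, 2, 3])  # walks the perimeter: 4.0
crossing = tour_length(square, [0, 2, 1, 3])   # uses both diagonals, so it is longer
```

A score like this is what an optimizer would see next to each previously proposed tour, with a shorter length meaning a better solution.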

Experiment

In the experiments, the paper uses the pre-trained PaLM 2-L, the instruction-tuned PaLM 2-L (PaLM 2-L-IT), text-bison, gpt-3.5-turbo, and gpt-4 as optimizer LLMs; the pre-trained PaLM 2-L and text-bison are used as scorer LLMs.

The GSM8K evaluation benchmark consists of grade school math problems, with 7,473 training samples and 1,319 test samples; the Big-Bench Hard (BBH) benchmark covers a wide range of topics beyond arithmetic reasoning, including symbolic manipulation and commonsense reasoning.
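Scoring an instruction on such a benchmark comes down to measuring how often the scorer LLM answers correctly when the instruction is attached to each question. The sketch below shows that calculation under simplifying assumptions: `instruction_accuracy` is a hypothetical helper, the instruction is simply placed before the question, answers are compared by exact string match, and `stub_scorer` is a constant-answer placeholder standing in for a real scorer model.

```python
def instruction_accuracy(instruction, examples, scorer):
    """Percentage of (question, expected_answer) pairs the scorer gets right."""
    if not examples:
        raise ValueError("examples must not be empty")
    correct = 0
    for question, expected in examples:
        # Assumed prompt layout: instruction first, then the question.
        prompt = f"{instruction}\n{question}".strip()
        if scorer(prompt).strip() == expected:
            correct += 1
    return 100.0 * correct / len(examples)


def stub_scorer(prompt):
    """Placeholder for a scorer LLM call; always answers "4"."""
    return "4"


toy_examples = [("What is 2 + 2?", "4"), ("What is 3 + 3?", "6")]
accuracy = instruction_accuracy("Solve the problem.", toy_examples, stub_scorer)  # 50.0
```

With a real scorer, the answer would typically need to be extracted from a longer generated response before it is compared with the reference.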

GSM8K results

Figure 1(a) shows the prompt optimization curve with the pre-trained PaLM 2-L as the scorer and PaLM 2-L-IT as the optimizer. The curve shows an overall upward trend, with several leaps occurring throughout the optimization process.

Next, the paper shows the results of using the text-bison scorer and the PaLM 2-L-IT optimizer to generate Q_begin instructions. The optimization starts from an empty instruction, whose training accuracy is 57.1, and the training accuracy then starts to increase. The optimization curve in Figure 4(a) shows a similar upward trend, with several leaps in training accuracy along the way.

BBH results

Figure 5 shows, for each of the 23 BBH tasks, the difference in accuracy between the instructions found by OPRO and the "let's think step by step" instruction. The instructions found by OPRO have a large advantage on almost all tasks: they outperform "let's think step by step" by more than 5% on 19/23 tasks with the PaLM 2-L scorer and on 15/23 tasks with the text-bison scorer.

Similar to GSM8K, this paper observes that the optimization curves of almost all BBH tasks show an upward trend, as shown in Figure 6.
