
Can't wait for OpenAI's Q*, Huawei Noah's secret weapon MindStar to explore LLM reasoning is here first

Jul 02, 2024, 05:01 AM

AIxiv is the column where this site publishes academic and technical content. Over the past few years, it has carried more than 2,000 reports covering top laboratories at major universities and companies around the world, effectively promoting academic exchange and dissemination. If you have excellent work to share, please feel free to contribute or contact us for coverage. Submission email: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com

The authors of this paper are from Huawei Noah's Ark Lab in Montreal: Jikun Kang, Xinze Li, Xi Chen, Amirreza Kazemi, and Boxing Chen.

Artificial intelligence (AI) has made great progress over the past decade, especially in natural language processing and computer vision. However, improving AI's cognitive and reasoning capabilities remains a major challenge.

Recently, the paper "MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time" proposed MindStar, a tree-search-based method that improves reasoning at inference time. With it, the open-source models LLaMA-2-13B and Mistral-7B approach the mathematical reasoning performance of the closed-source models GPT-3.5 and Grok-1.


  • Paper title: MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time
  • Paper address: https://arxiv.org/abs/2405.16265v2

MindStar's effect on mathematical problems:


Figure 1: Mathematical accuracy of different large language models. LLaMA-2-13B performs similarly to GPT-3.5 (4-shot) on math while using roughly 200 times less computation.

1. Introduction

Large language models have demonstrated impressive results in areas such as instruction following [1, 2], code generation [3, 4], and creative writing [5]. However, unlocking their ability to solve complex reasoning tasks remains a challenge. Some recent studies [6, 7] attack the problem with supervised fine-tuning (SFT): by mixing new reasoning data samples into the original dataset, the LLM learns the underlying distribution of these samples and tries to imitate their logic when solving unseen reasoning tasks. Although this approach yields performance gains, it relies heavily on extensive training and additional data preparation [8, 9].

The Llama-3 report [10] highlights an important observation: when faced with a challenging reasoning problem, a model sometimes generates the correct reasoning trajectory among its outputs. This suggests the model knows how to produce the right answer but struggles to select it. Based on this finding, we asked a simple question: can we enhance the reasoning capabilities of LLMs by helping them choose the correct output? To explore this, we ran an experiment using different reward models for output selection. The results show that step-level selection significantly outperforms traditional CoT methods.
2. MindStar Method

Figure 2: Algorithm architecture diagram of MindStar

We introduce a new inference-time search framework, MindStar (M*). By treating the reasoning task as a search problem and leveraging a process-supervised reward model (PRM), M* navigates the reasoning tree efficiently and identifies near-optimal paths. Combining the ideas of beam search (BS) and Levin tree search (LevinTS) further improves search efficiency and guarantees that an optimal reasoning path is found within bounded computational complexity.

2.1 Process Supervised Reward Model

The process-supervised reward model (PRM) is designed to evaluate the intermediate steps that a large language model (LLM) generates, helping select the correct reasoning path. The approach builds on the success of PRMs in other applications. Specifically, the PRM takes the current reasoning path p_{1:t} and a candidate next step s_{t+1} as input, and returns a reward value r.

The PRM evaluates each new step in the context of the entire current reasoning trajectory, encouraging consistency with and fidelity to the overall path. A high reward value indicates that the new step s_{t+1} is likely correct for the given reasoning path p_{1:t}, making the extended path worth exploring further. Conversely, a low reward value indicates that the new step is probably incorrect, and so is any solution that follows this path.
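To make the interface concrete, here is a minimal, runnable Python sketch of the PRM signature described above. The scoring body is a deterministic placeholder, since in the paper a trained process-supervised reward model produces this scalar; none of the names below come from the authors' code.

```python
from typing import List


class PlaceholderPRM:
    """Sketch of the PRM interface: score(path p_{1:t}, step s_{t+1}) -> reward r."""

    def score(self, question: str, path: List[str], next_step: str) -> float:
        # A real PRM would run a fine-tuned LM with a scalar reward head
        # over the concatenated trajectory. This hash-based stand-in just
        # returns a deterministic value in [0, 1] so the sketch runs.
        text = question + "\n" + "\n".join(path) + "\n" + next_step
        return (hash(text) % 1000) / 999.0


prm = PlaceholderPRM()
print(prm.score("What is 2 + 3?", ["2 + 3 means adding 2 and 3."], "So the answer is 5."))
```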

The M* algorithm consists of two main steps, iterated until the correct solution is found (a minimal sketch of the loop follows the list):

1. Inference path expansion: in each iteration, the base LLM generates candidate next steps for the current reasoning path.
2. Evaluation and selection: the PRM evaluates the generated steps, and these evaluations determine which reasoning path to expand in the next iteration.
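The loop below is an illustrative sketch of these two steps under assumed component names (generate_steps for the base LLM, score_step for the PRM, select_path for the search policy); it is not the authors' released code.

```python
from typing import Callable, List, Tuple


def m_star(
    question: str,
    generate_steps: Callable[[str, List[str], int], List[str]],  # base LLM proposals
    score_step: Callable[[str, List[str], str], float],          # PRM reward
    select_path: Callable[[List[str], List[Tuple[str, float]]], List[str]],
    n_candidates: int = 8,
    max_iters: int = 50,
) -> List[str]:
    path: List[str] = []  # the root node contains only the question
    for _ in range(max_iters):
        # 1. Inference path expansion: the LLM proposes N candidate next steps.
        candidates = generate_steps(question, path, n_candidates)
        # 2. Evaluation and selection: PRM scores guide the search policy
        #    (beam search or Levin tree search) to the path to expand next.
        scored = [(step, score_step(question, path, step)) for step in candidates]
        path = select_path(path, scored)
        if path and "answer is" in path[-1].lower():  # crude termination check
            break
    return path
```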

2.2 Inference path expansion


After selecting a reasoning path p_{1:t} to extend, we use a prompt template (Example 3.1) to collect next steps from the LLM. As the example shows, the LLM receives the original question as {question} and the current reasoning path as {answer}. Note that in the first iteration of the algorithm the selected node is the root, which contains only the question, so {answer} is empty. For a reasoning path p_{1:t}, the LLM generates N intermediate steps, which are appended as children of the current node. In the next step of the algorithm, these newly generated child nodes are evaluated and a new node is selected for further expansion. An alternative way to generate steps would be to fine-tune the LLM with step tokens, but this could degrade the LLM's reasoning ability and, more importantly, would contradict the focus of this work: enhancing reasoning without modifying the weights.
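A hedged reconstruction of the kind of template Example 3.1 describes is sketched below; only the {question}/{answer} slot structure comes from the text, and the wording is an assumption, not a quote from the paper.

```python
# Assumed wording; only the {question} and {answer} slots come from the text.
PROMPT_TEMPLATE = (
    "You are solving a math problem step by step.\n"
    "Question: {question}\n"
    "Steps so far:\n{answer}\n"
    "Next step:"
)


def build_expansion_prompt(question: str, path: list) -> str:
    # In the first iteration the selected node is the root, so {answer} is empty.
    return PROMPT_TEMPLATE.format(question=question, answer="\n".join(path))


print(build_expansion_prompt("What is 2 + 3?", []))
```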

2.3 Inference path selection

After expanding the reasoning tree, we use a pre-trained process-supervised reward model (PRM) to evaluate each newly generated step. As mentioned above, the PRM takes a path and a step and returns the corresponding reward value. After evaluation, a tree-search algorithm selects the next node to expand. Our framework does not depend on a specific search algorithm; in this work we instantiate two best-first search methods, beam search and Levin tree search.
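As a sketch of how the two instantiations differ: beam search keeps the k highest-reward partial paths at each depth, while Levin tree search orders nodes by a cost that grows with depth and shrinks with the path's probability, which bounds the total search effort. The snippet below is a simplification under those assumptions, not the paper's implementation.

```python
import heapq
from typing import List, Tuple

Path = List[str]


def beam_select(scored: List[Tuple[Path, float]], width: int) -> List[Tuple[Path, float]]:
    """Beam search step: keep the `width` highest-reward partial paths.
    `scored` pairs each expanded path with its PRM-derived value."""
    return heapq.nlargest(width, scored, key=lambda item: item[1])


def levin_priority(depth: int, path_prob: float) -> float:
    """Levin tree search expands nodes in increasing order of this cost,
    which bounds the number of node expansions before reaching a goal."""
    return depth / max(path_prob, 1e-9)  # lower cost is expanded first
```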

3. Results and Discussion

Extensive evaluation on the GSM8K and MATH datasets shows that M* significantly improves the reasoning capabilities of open-source models (such as LLaMA-2), achieving performance comparable to much larger closed-source models (such as GPT-3.5 and Grok-1) while significantly reducing model size and computational cost. These findings highlight the potential of shifting computational resources from fine-tuning to inference-time search, opening new avenues for research into efficient inference-enhancement techniques.


Table 1 compares the various methods on the GSM8K and MATH reasoning benchmarks; each entry reports the percentage of problems solved. SC@32 denotes self-consistency over 32 candidate results, and n-shot denotes few-shot results. CoT-SC@16 refers to self-consistency over 16 chain-of-thought (CoT) candidates. BS@16 denotes beam search with 16 candidates per step level, and LevinTS@16 denotes Levin tree search with the same number of candidates. Note that the most recent GPT-4 result on the MATH dataset is for GPT-4-turbo-0409, which we highlight because it is the best performer in the GPT-4 family.


Figure 3: How M* performance changes with the number of step-level candidates, using Llama-2-13B as the base model and beam search (BS) as the search algorithm.


Figure 4: Scaling laws of the Llama-2 and Llama-3 model families on the MATH dataset. All results are taken from their original sources. Fitted curves are computed with SciPy using a logarithmic function.
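For reference, a logarithmic fit of the kind the caption mentions can be computed with scipy.optimize.curve_fit; the (size, accuracy) points below are made-up placeholders, not the paper's data.

```python
import numpy as np
from scipy.optimize import curve_fit


def log_curve(x, a, b):
    # Accuracy modeled as a * ln(model size) + b.
    return a * np.log(x) + b


sizes = np.array([7.0, 13.0, 70.0])  # parameters in billions (placeholder)
accs = np.array([5.0, 10.0, 20.0])   # MATH accuracy in % (placeholder)

(a, b), _ = curve_fit(log_curve, sizes, accs)
print(f"fit: accuracy ≈ {a:.2f} * ln(size) + {b:.2f}")
```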


Table 2: Average number of tokens produced by different methods when answering questions.

4. Conclusion

This paper introduces MindStar (M*), a novel search-based reasoning framework for enhancing the reasoning capabilities of pre-trained large language models. By treating reasoning as a search problem and leveraging a process-supervised reward model, M* efficiently navigates the reasoning tree and identifies near-optimal paths. Incorporating the ideas of beam search and Levin tree search further improves search efficiency and ensures that the best reasoning path can be found within bounded computational complexity. Extensive experiments show that M* significantly improves the reasoning capabilities of open-source models, with performance comparable to much larger closed-source models at a fraction of the model size and computational cost.

These results suggest that shifting computational resources from fine-tuning to inference-time search has great potential, opening new avenues for future research on efficient inference-enhancement techniques.

References:
[1] Nisan Stiennon, Long Ouyang, Jeffrey Wu, Daniel Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, and Paul F Christiano. Learning to summarize with human feedback. Advances in Neural Information Processing Systems, 33:3008–3021, 2020.
[2] Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback. Advances in neural information processing systems, 35:27730–27744, 2022.
[3] Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, and Daxin Jiang. Wizardcoder: Empowering code large language models with evol-instruct. arXiv preprint arXiv:2306.08568, 2023.
[4] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374, 2021.
[5] Carlos Gómez-Rodríguez and Paul Williams. A confederacy of models: A comprehensive evaluation of llms on creative writing. arXiv preprint arXiv:2310.08433, 2023.
[6] Longhui Yu, Weisen Jiang, Han Shi, Jincheng Yu, Zhengying Liu, Yu Zhang, James T Kwok, Zhenguo Li, Adrian Weller, and Weiyang Liu. Metamath: Bootstrap your own mathematical questions for large language models. arXiv preprint arXiv:2309.12284, 2023.
[7] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Mingchuan Zhang, YK Li, Y Wu, and Daya Guo. Deepseekmath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024.
[8] Keiran Paster, Marco Dos Santos, Zhangir Azerbayev, and Jimmy Ba. Openwebmath: An open dataset of high-quality mathematical web text. arXiv preprint arXiv:2310.06786, 2023.
[9] Peiyi Wang, Lei Li, Zhihong Shao, RX Xu, Damai Dai, Yifei Li, Deli Chen, Y Wu, and Zhifang Sui. Math-shepherd: Verify and reinforce llms step-by-step without human annotations. CoRR, abs/2312.08935, 2023.
[10] Meta AI. Introducing meta llama 3: The most capable openly available llm to date, April 2024. URL https://ai.meta.com/blog/meta-llama-3/. Accessed: 2024-04-30.
