Complex Reasoning in LLMs: Why do Smaller Models Struggle?

Release: 2025-03-20 10:51:12

The research paper "Not All LLM Reasoners Are Created Equal" explores the limitations of large language models (LLMs) in complex reasoning tasks, particularly those requiring multi-step problem-solving. While many LLMs do well on mathematical problems posed one at a time, their performance degrades significantly on interconnected questions, where the solution to one problem informs the next. Solving such chains is termed "compositional reasoning."

The study, conducted by researchers from Mila, Google DeepMind, and Microsoft Research, reveals a surprising weakness in smaller, more cost-efficient LLMs. These models, while proficient at simpler tasks, struggle with the "second-hop reasoning" needed to solve chained problems. Data leakage alone does not explain this; rather, the models have trouble maintaining context and logically connecting the parts of a problem. Instruction tuning, a common performance-enhancing technique, provides inconsistent benefits for smaller models, sometimes leading to overfitting.

Key Findings:

  • Smaller LLMs exhibit a significant "reasoning gap" when tackling compositional problems.
  • Performance drops dramatically when solving interconnected questions.
  • Instruction tuning yields inconsistent improvements in smaller models.
  • This reasoning limitation restricts the reliability of smaller LLMs in real-world applications.
  • Even specialized math models struggle with compositional reasoning.
  • More effective training methods are needed to enhance multi-step reasoning capabilities.

The paper uses a compositional Grade-School Math (GSM) test to illustrate this gap. The test involves two linked questions, where the answer to the first (Q1) becomes a variable (X) in the second (Q2). The results show that most models perform far worse on the compositional task than predicted by their performance on individual questions. Larger, more powerful models like GPT-4o demonstrate superior reasoning abilities, while smaller, cost-effective models, even those specialized in math, show a substantial performance decline.
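
The linked-question setup described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's actual harness: the `CompositionalItem` class, the prompt wording, the tolerance, and the pencil questions are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class CompositionalItem:
    """One hypothetical test item: two questions linked through X."""
    q1: str           # first question, answerable on its own
    q2_template: str  # second question, with "X" standing in for Q1's answer
    a1: float         # reference answer to Q1
    a2: float         # reference answer to Q2 once X = a1 is substituted


def build_prompt(item: CompositionalItem) -> str:
    """Join both questions into one prompt; resolving X is left to the model."""
    return (
        "Let X be the answer to Q1.\n"
        f"Q1: {item.q1}\n"
        f"Q2: {item.q2_template}\n"
        "Solve Q1 first, then use X to solve Q2."
    )


def is_correct(item: CompositionalItem, predicted_a2: float, tol: float = 1e-6) -> bool:
    """Grade only the final answer: a wrong X usually yields a wrong Q2 answer."""
    return abs(predicted_a2 - item.a2) <= tol


# Made-up example item (not taken from the paper's dataset).
item = CompositionalItem(
    q1="A box holds 4 rows of 6 pencils. How many pencils are in the box?",
    q2_template="Sam has X pencils and gives away 9. How many pencils are left?",
    a1=24,
    a2=15,
)

print(build_prompt(item))
```

Grading only the final Q2 answer is a design choice of this sketch: it means an error in either hop counts as a failure of the whole chain.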

A graph comparing open-source and closed-source LLMs highlights this reasoning gap. Smaller, cost-effective models consistently exhibit larger negative reasoning gaps, meaning their accuracy on compositional tasks falls further below what their accuracy on the individual questions would predict. GPT-4o, for example, shows a minimal gap, while others like Phi 3-mini-4k-IT demonstrate significant shortcomings.
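
As a rough illustration of how such a gap could be computed, the sketch below assumes that the expected compositional accuracy is the product of the accuracies on the two individual questions (that is, that the two questions are solved independently). The article does not spell out the formula, so treat this baseline, the function name, and the numbers as assumptions.

```python
def reasoning_gap(acc_q1: float, acc_q2: float, acc_comp: float) -> float:
    """Measured compositional accuracy minus the independence baseline.

    Assumption: if Q1 and Q2 were solved independently, the chance of
    getting both right would be acc_q1 * acc_q2. A negative result means
    the model does worse on the chained task than that baseline predicts.
    """
    expected = acc_q1 * acc_q2
    return acc_comp - expected


# Made-up accuracies for two hypothetical models (not results from the paper).
small_model_gap = reasoning_gap(acc_q1=0.9, acc_q2=0.8, acc_comp=0.5)
large_model_gap = reasoning_gap(acc_q1=0.9, acc_q2=0.8, acc_comp=0.7)

print(f"small model gap: {small_model_gap:+.2f}")  # -0.22
print(f"large model gap: {large_model_gap:+.2f}")  # -0.02
```

Under this definition a gap near zero means chaining the questions costs the model almost nothing, while a large negative gap points to a failure specific to the composition itself.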

Further analysis reveals that the reasoning gap is not solely due to benchmark leakage. Instead, it is attributed to overfitting to benchmarks, distraction by the additional context of the first question, and a failure to carry information from one subtask to the next.

The study concludes that improving compositional reasoning requires innovative training approaches. While techniques like instruction tuning and math specialization offer some benefits, they are insufficient to bridge the reasoning gap. Exploring alternative methods, such as code-based reasoning, may be necessary before smaller, more cost-effective LLMs can reliably handle complex, multi-step reasoning tasks.
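
To make "code-based reasoning" concrete, here is a minimal sketch of the idea: instead of answering in free text, the model would write a small program, and the link between the two questions becomes an explicit variable. The functions below are hand-written stand-ins for model-generated code, and the pencil questions are made up for this example; nothing here is the paper's method.

```python
def solve_q1() -> int:
    """Q1 (made up): a box holds 4 rows of 6 pencils; how many pencils?"""
    rows, per_row = 4, 6
    return rows * per_row


def solve_q2(x: int) -> int:
    """Q2 (made up): Sam has X pencils and gives away 9; how many are left?"""
    given_away = 9
    return x - given_away


def solve_chain() -> int:
    """The second hop is a plain function call, so X cannot be dropped or misread."""
    x = solve_q1()
    return solve_q2(x)


print(solve_chain())  # 15
```

The appeal of this style is that carrying the intermediate answer forward is handled by the program rather than by the model's attention over a long prompt; whether it actually closes the gap for a given model is an empirical question.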
