Mathematics, as the cornerstone of science, has always been a key area of research and innovation.
Recently, seven institutions, including Princeton University, jointly released LLEMMA, a large language model built specifically for mathematics. Its performance is comparable to Google's Minerva 62B, and its models, dataset and code are all public, giving mathematical research a substantial new set of open resources.
Paper address: https://arxiv.org/abs/2310.10631
Dataset address: https://huggingface.co/datasets/EleutherAI/proof-pile-2
Project address: https://github.com/EleutherAI/math-lm
LLEMMA is initialized from Code Llama and further pre-trained on Proof-Pile-2.
Proof-Pile-2 is a large mixed dataset of 55 billion tokens, made up of scientific papers, web data rich in mathematical content, and mathematical code.
One part of this dataset, AlgebraicStack, gathers 11B tokens of source code in 17 languages, covering numerical, symbolic and formal mathematics.
Released in 7-billion- and 34-billion-parameter versions, LLEMMA performs extremely well on the MATH benchmark, surpassing all known open-source base models.
Compared with Minerva, the closed model developed by Google Research specifically for mathematics, Llemma 34B achieves almost the same performance as Minerva 62B with about half the number of parameters.
At equal parameter counts, Llemma surpasses Minerva's problem-solving performance. It can also draw on computational tools and formal theorem proving to solve mathematical problems.
It can readily use a Python interpreter and a formal prover, which further extends its ability to solve mathematical problems.
Because of the special emphasis on formal proof data in AlgebraicStack, Llemma is the first open base model to demonstrate few-shot theorem proving.
The researchers also openly shared all of LLEMMA's training data and code. Unlike previous mathematical models, LLEMMA is fully open, which makes it available to the entire research community.
The researchers also tried to quantify the effect of memorization and, surprisingly, found that Llemma was not more accurate on problems that appeared in the training set. Because the code and data are publicly available, they encourage others to replicate and extend the analysis.
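One way to run this kind of check can be sketched as follows, assuming that a test problem counts as "seen" when it shares a long word n-gram with some training document. The whitespace tokenization, the window size `n`, and the function names are illustrative assumptions, not the researchers' actual tooling.

```python
def ngrams(text, n):
    """Return the set of word n-grams in text (whitespace tokenization)."""
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def seen_in_training(problem, documents, n):
    """True if the problem shares at least one n-gram with any document."""
    target = ngrams(problem, n)
    return any(target & ngrams(doc, n) for doc in documents)


def accuracy_by_overlap(results, documents, n):
    """Split accuracy by whether each problem overlaps the training documents.

    results is a list of (problem_text, is_correct) pairs.
    Returns (accuracy_on_seen, accuracy_on_unseen); a side is None if empty.
    """
    seen, unseen = [], []
    for problem, is_correct in results:
        bucket = seen if seen_in_training(problem, documents, n) else unseen
        bucket.append(is_correct)

    def mean(flags):
        return sum(flags) / len(flags) if flags else None

    return mean(seen), mean(unseen)
```

Comparing the two returned accuracies gives the comparison described above: if memorization helped, the first number would be noticeably higher than the second.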
LLEMMA is a large language model dedicated to mathematics, obtained by continuing the pre-training of Code Llama on Proof-Pile-2. Proof-Pile-2 is a mixed dataset of 55 billion tokens containing scientific papers, web data with mathematical content, and mathematical code.
Its code portion, AlgebraicStack, is an 11B-token dataset of source code in 17 languages, covering numerical, symbolic and formal mathematics, and has been publicly released.
Every LLEMMA model is initialized from Code Llama, a decoder-only language model that is itself initialized from Llama 2.
The authors further trained the Code Llama models on Proof-Pile-2 using a standard autoregressive language modeling objective. The 7B model was trained on 200B tokens and the 34B model on 50B tokens.
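The autoregressive objective mentioned here is next-token prediction: each token is scored by how much probability the model assigned to it given the tokens before it. A minimal sketch in plain Python, assuming the model's outputs are available as lists of unnormalized scores (`logits`); the function name and the toy three-token vocabulary are illustrative, not taken from the LLEMMA code.

```python
import math


def next_token_loss(logits, token_ids):
    """Mean negative log-likelihood of each token given the preceding tokens.

    logits[t] holds unnormalized scores over the vocabulary, produced after
    the model has read token_ids[:t + 1]; it is scored against token_ids[t + 1].
    """
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to predict a next token")
    total = 0.0
    for t in range(len(token_ids) - 1):
        scores = logits[t]
        target = token_ids[t + 1]
        # Stable log-softmax: subtract the max before exponentiating
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[target]
    return total / (len(token_ids) - 1)


# Toy check: a model that is uniform over 3 tokens has loss log(3)
token_ids = [0, 2, 1]
uniform_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
loss = next_token_loss(uniform_logits, token_ids)
```

Training minimizes this quantity averaged over the corpus; continued pre-training keeps the same objective and changes only the data.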
After continuing the pre-training of Code Llama on Proof-Pile-2, the authors ran few-shot evaluations of LLEMMA on several mathematical problem-solving tasks, such as MATH and GSM8k.
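In a few-shot evaluation, the base model is not fine-tuned on the task. It is shown a handful of solved examples in the prompt and asked to continue with the solution to a new problem. A minimal sketch of building such a prompt, assuming a simple "Problem / Solution" layout; the layout, the function name and the demonstrations are made up for illustration.

```python
def build_few_shot_prompt(examples, question):
    """Concatenate solved examples, then the unsolved question.

    examples is a list of (problem, solution) pairs used as demonstrations;
    the model is expected to continue after the final "Solution:" line.
    """
    parts = []
    for problem, solution in examples:
        parts.append(f"Problem:\n{problem}\n\nSolution:\n{solution}\n")
    parts.append(f"Problem:\n{question}\n\nSolution:\n")
    return "\n".join(parts)


# Made-up demonstrations, not taken from MATH or GSM8k
demos = [
    ("What is 2 + 3?", "2 + 3 = 5. The answer is 5."),
    ("What is 4 * 6?", "4 * 6 = 24. The answer is 24."),
]
prompt = build_few_shot_prompt(demos, "What is 7 + 8?")
```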
The researchers found that LLEMMA improved significantly on these tasks and handled a range of problem types and difficulty levels.
On extremely difficult mathematical problems, LLEMMA 34B demonstrates stronger mathematical capabilities than other open base models.
Continued pre-training on Proof-Pile-2 improves LLEMMA's few-shot performance on five math benchmarks.
LLEMMA 34B scores 20 percentage points higher than Code Llama on GSM8k and 13 percentage points higher on MATH. LLEMMA 7B also outperforms the proprietary Minerva model of similar size, which indicates that pre-training on Proof-Pile-2 effectively improves the mathematical problem-solving ability of large models.
When solving mathematical problems with computational tools such as Python, LLEMMA outperforms Code Llama on both the MATH+Python and GSM8k+Python tasks.
On both the MATH and GSM8k datasets, LLEMMA also performs better with the tool than without it.
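Tool use of this kind usually means the model writes a short program, the program is executed, and its printed output is taken as the answer. A minimal sketch of the execution step, assuming the generated program prints its answer on the last line of standard output; the helper name, the timeout and the sample program are hypothetical, and a separate process is not a security sandbox.

```python
import subprocess
import sys
from typing import Optional


def run_generated_program(program: str, timeout_s: float = 10.0) -> Optional[str]:
    """Run a model-written Python program in a separate process.

    Returns the last line printed to stdout, or None if the program
    crashes, times out, or prints nothing.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    lines = result.stdout.strip().splitlines()
    return lines[-1] if lines else None


# A made-up program of the kind a model might write for a word problem
generated = "apples = 3 * 4\neaten = 5\nprint(apples - eaten)\n"
answer = run_generated_program(generated)
```

Delegating the arithmetic to the interpreter removes one source of error, which is one plausible reason the tool-assisted setting scores higher.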
LLEMMA also performs well on mathematical proof tasks.
The goal of the informal-to-formal proof task is to generate a formal proof, given a formal statement, an informal LaTeX statement and an informal LaTeX proof, and then to verify it with the proof assistant.
Formal-to-formal proving means proving a formal statement by generating a sequence of proof steps (tactics). The results show that LLEMMA's continued pre-training on Proof-Pile-2 improves few-shot performance on both formal theorem proving tasks.
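To make "a sequence of proof steps (tactics)" concrete, here is a small Lean 4 illustration. The statement is a toy example chosen for this sketch, not one from the benchmarks LLEMMA was evaluated on; it relies only on the core lemmas `Nat.add_zero` and `Nat.add_comm`.

```lean
-- A formal statement, followed by a proof written as a sequence of tactics.
example (a b : Nat) : a + b + 0 = b + a := by
  rw [Nat.add_zero]   -- goal becomes: a + b = b + a
  rw [Nat.add_comm]   -- goal becomes: b + a = b + a, closed by reflexivity
```

In the formal-to-formal setting the model is given the statement and must produce tactic lines like the two above; the proof assistant then accepts or rejects the result, so a generated proof cannot pass unless it is actually correct.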
LLEMMA not only performs impressively, it also releases its datasets openly and demonstrates strong problem-solving capabilities.
This spirit of open sharing marks a new stage for mathematics research, one that mathematics enthusiasts, researchers and educators can all benefit from.
LLEMMA gives us new tools that can make solving mathematical problems more efficient.
In addition, open sharing should encourage deeper cooperation across the global research community and help advance science.