At the 2023 Hangzhou Computing Conference, Alibaba Cloud Chief Technology Officer Zhou Jingren released Tongyi Qianwen 2.0, a large language model with hundreds of billions of parameters. Across 10 authoritative benchmark evaluations, the overall performance of Tongyi Qianwen 2.0 exceeded that of GPT-3.5, and the model is rapidly closing the gap with GPT-4.
According to the announcement, Tongyi Qianwen 2.0 has made substantial progress over the past six months. Compared with version 1.0, released in April, it shows significantly improved complex-instruction understanding, literary creation, general mathematics, knowledge recall, and hallucination resistance.
On 10 mainstream benchmark suites, including MMLU, C-Eval, GSM8K, HumanEval, and MATH, Tongyi Qianwen 2.0's overall scores surpass those of Meta's Llama-2-70B. Its win rate against OpenAI's GPT-3.5 is 91%, and against GPT-4 it is 46%, further narrowing the gap with GPT-4.
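A "win rate" of the kind quoted above is typically the fraction of prompts on which one model's answer is judged better than the other's in pairwise comparison. A minimal sketch of that calculation, using made-up illustrative judgments rather than any real evaluation data:

```python
# Pairwise-comparison win rate: share of prompts where model A's answer
# is preferred over model B's. Judgments below are purely illustrative.
judgments = ["win", "win", "loss", "win", "tie",
             "win", "win", "loss", "win", "win"]

wins = judgments.count("win")
win_rate = wins / len(judgments)
print(f"win rate: {win_rate:.0%}")  # 7 wins over 10 prompts -> 70%
```

Under this convention, a 46% win rate against GPT-4 means the model's answers were preferred on just under half of the compared prompts.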
Understanding Chinese and English is a foundational skill for large language models. On English tasks, Tongyi Qianwen 2.0 scored 82.5 on the MMLU benchmark, second only to GPT-4; by significantly increasing the parameter count, the model can better understand and process complex language structures and concepts. On Chinese tasks, Tongyi Qianwen 2.0 achieved the top score on the C-Eval benchmark by a clear margin, because the model was trained on a larger Chinese corpus, further strengthening its Chinese comprehension and expression.
Tongyi Qianwen 2.0 has also made significant progress in areas such as mathematical reasoning and code understanding. On the reasoning benchmark GSM8K it ranked second, demonstrating strong calculation and logical-reasoning capabilities. On HumanEval, its score closely trailed GPT-4 and GPT-3.5; this benchmark mainly measures a large model's ability to understand and generate code, which is the foundation for applying large models to scenarios such as programming assistance and automatic code repair.
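HumanEval-style evaluation works by giving the model a function signature plus docstring and scoring the completion by executing hidden unit tests against it. A minimal sketch of that loop, with an illustrative problem and tests rather than actual HumanEval items:

```python
# HumanEval-style scoring sketch: the benchmark supplies a prompt (signature
# + docstring), the model supplies a completion, and the harness runs unit
# tests on the assembled program. Problem and tests here are illustrative.

PROMPT = '''
def has_close_elements(numbers, threshold):
    """Return True if any two numbers are closer than threshold."""
'''

# A hypothetical model completion, appended verbatim to the prompt.
COMPLETION = '''
    for i, a in enumerate(numbers):
        for b in numbers[i + 1:]:
            if abs(a - b) < threshold:
                return True
    return False
'''

def check(candidate):
    # Hidden tests the harness would run against the completed function.
    assert candidate([1.0, 2.0, 3.9, 4.0], 0.3) is True
    assert candidate([1.0, 2.0, 3.0], 0.5) is False

namespace = {}
exec(PROMPT + COMPLETION, namespace)    # assemble and load the program
check(namespace["has_close_elements"])  # passing counts toward pass@1
print("completion passed all tests")
```

A model's HumanEval score is then the share of problems whose generated completions pass all hidden tests (the pass@k metric).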