
Tied for first place with GPT-4, the LMSYS benchmark shows that the Claude-3 model performs well


News, March 28: according to the latest benchmark results released by LMSYS Org, Claude 3's score narrowly surpassed GPT-4's, making it the platform's top-rated large language model.

First, some background on LMSYS Org: it is a research organization jointly founded by the University of California, Berkeley, the University of California, San Diego, and Carnegie Mellon University.

The organization runs Chatbot Arena, a benchmarking platform for large language models (LLMs) that uses crowdsourcing to test models anonymously and at random. Its ratings are based on the Elo rating system, which is widely used in competitive games such as chess.

Ratings are generated from user votes: each round, the system randomly pairs two different model chatbots to converse with the user, who then anonymously chooses which one performed better. Overall, the process is relatively fair.
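
To make the mechanism concrete, here is a minimal sketch of how such pairwise votes can be turned into Elo-style ratings. This is an illustration only, not LMSYS's actual pipeline (the organization publishes its own methodology and confidence intervals); the k-factor and the starting scores used below are assumptions.

```python
# Minimal sketch of a classic Elo update driven by pairwise votes.
# Illustrative only; LMSYS's real pipeline is more involved.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a: float, rating_b: float, winner: str, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one user vote.

    winner: "a", "b", or "tie" (a tie counts as half a win for each side).
    """
    score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
    exp_a = expected_score(rating_a, rating_b)
    rating_a += k * (score_a - exp_a)
    rating_b += k * ((1.0 - score_a) - (1.0 - exp_a))
    return rating_a, rating_b

# Hypothetical example: two models starting near the reported scores.
model_a, model_b = 1253.0, 1251.0
model_a, model_b = elo_update(model_a, model_b, winner="a")
print(round(model_a, 1), round(model_b, 1))
```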

Since Chatbot Arena launched last year, GPT-4 had held the top spot firmly and had even become the gold standard for evaluating large models.

But yesterday, Anthropic's Claude 3 Opus edged out GPT-4 by a narrow margin, 1253 to 1251, pushing OpenAI's LLM out of sole possession of the top spot. Because the scores are so close and fall within the margin of error, the organization lists Claude 3 Opus and GPT-4 as tied for first place; another preview version of GPT-4 is also tied for first.
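
The tie follows from the uncertainty around each rating: if the gap between two scores is smaller than their combined uncertainty, neither model can be declared statistically ahead. The interval used below is purely illustrative; LMSYS derives its own confidence intervals from the number of votes.

```python
# Hypothetical illustration of why 1253 vs. 1251 can still be reported as a tie.

def is_statistical_tie(rating_a: float, rating_b: float, ci: float = 5.0) -> bool:
    """Treat two models as tied when their rating gap is within the
    combined uncertainty (ci is an assumed +/- interval per model)."""
    return abs(rating_a - rating_b) <= 2 * ci

print(is_statistical_tie(1253, 1251))  # True  -> ranked as tied for first
print(is_statistical_tie(1253, 1180))  # False -> a clear gap
```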

Even more impressive, Claude 3 Haiku made it into the top ten. Haiku is Anthropic's smallest, lightweight model, positioned similarly to Google's Gemini Nano.

It is far smaller than Opus, which reportedly has trillions of parameters, and is correspondingly much faster. According to LMSYS data, Haiku ranks seventh on the list, with performance comparable to GPT-4.
