


Wenxin 4.0 performed well in the SuperBench evaluation, leading in many indicators
In March 2024, the Foundation Model Research Center of Tsinghua University released the "SuperBench Large Model Comprehensive Capability Evaluation Report", which evaluated 14 influential models from China and abroad.
Wenxin 4.0's results in the report drew wide attention. Its overall performance is close to that of the top international models, and it is steadily narrowing the remaining gap, which makes it the leading domestic model.
In the human alignment evaluation, Wenxin 4.0 ranked first among domestic models. It also took first place in the Chinese reasoning and Chinese language evaluations, with a clear advantage over the other models. In Chinese understanding in particular, its score was 0.41 points higher than that of second-placed GLM-4.
In the mathematics portion of the semantic understanding evaluation, Wenxin 4.0 and Claude-3 tied for first place worldwide, while the GPT-4 series models ranked fourth and fifth. Most other models scored around 55 points, well behind the leading group.
In the reading comprehension evaluation, Wenxin 4.0 also achieved the highest score, surpassing GPT-4 Turbo, Claude-3, and GLM-4.
In the security evaluation, the area enterprises care about most, Wenxin 4.0 ranked first with 89.1 points, ahead of the GPT-4 series models and Claude-3. Claude-3 ranked only fourth in this evaluation.
The report also noted that since Wenxin Yiyan made its public debut on March 16, 2023, its user base has grown quickly and now exceeds 200 million users. Its API is also heavily used, with more than 200 million calls per day.




