Produced by 51CTO Technology Stack (WeChat ID: blog51cto)
Late at night, Anthropic, OpenAI’s strongest rival, released three new state-of-the-art models at once: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus. The industry reaction was that Claude 3 sets a series of new benchmarks.
The names come from literature and music: an opus is a large-scale work, a sonnet is a fourteen-line poem, and a haiku is a three-line short poem.
Haiku is reported (Figure 9) to be the fastest and most cost-effective model in the AI market. It can read an information- and data-dense arXiv research paper (~10k tokens), including its charts and graphs, in less than three seconds.
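To put that figure in perspective, the reading rate it implies can be worked out directly. The snippet below is only a back-of-the-envelope sketch: the function name `implied_throughput` is hypothetical, and it treats "less than three seconds" as exactly three seconds, so the result is a lower bound rather than a measured speed.

```python
def implied_throughput(tokens: int, seconds: float) -> float:
    """Return the number of tokens processed per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return tokens / seconds


# Figures quoted in the article; three seconds is used as an upper bound
# on the time, so the computed rate is a lower bound (assumption).
paper_tokens = 10_000
read_seconds = 3.0

rate = implied_throughput(paper_tokens, read_seconds)
print(f"at least ~{rate:,.0f} tokens per second")
```

Under these assumptions, the claim corresponds to reading input at roughly 3,300 tokens per second or more.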
According to Figure 8, for the vast majority of workloads Sonnet is twice as fast as Claude 2 and Claude 2.1 while offering a higher level of intelligence. It excels at tasks that require fast responses, such as knowledge retrieval or sales automation. Opus is similar in speed to Claude 2 and 2.1, but has a much higher level of intelligence.
Opus is currently Anthropic’s most intelligent model, outperforming its peers on most common evaluation benchmarks for AI systems, including undergraduate-level expert knowledge (MMLU), graduate-level expert reasoning (GPQA), and basic mathematics (GSM8K). It demonstrates near-human levels of comprehension and fluency on complex tasks, leading the frontier of general intelligence.
All three Claude 3 models show stronger capabilities in analysis and forecasting, detailed content creation, code generation, and conversation in non-English languages such as Spanish, Japanese, and French.
In addition, the Claude 3 series offers near-perfect recall and a very long context window. The models provide a 200K-token context window and can accept inputs of more than 1 million tokens, a capability that will be rolled out gradually according to customer needs.
Overall, Claude 3 has three striking features:
1. Domain-expert benchmarks. Three expert fields (finance, medicine, and philosophy) were selected as test benchmarks. NVIDIA research manager Jim Fan said that "it is recommended that all LLMs follow this so that different downstream applications know what will happen."
2. Refusal-rate analysis. Overly cautious answers to many harmless questions have become a widespread problem among LLMs. Anthropic has long been committed to AI safety research and has made efforts in this area.
3. Sophisticated vision capabilities comparable to other leading models. The models work with a variety of visual formats, including photos, charts, graphics, technical diagrams, PDFs, flowcharts, and presentation slides.
How does it perform in practice?
One user asked GPT-4 and Claude 3 each to write the code for a login screen. The results of actually running the code (Figure 13) show that Claude 3 produced the better app interface design.