


The newly revealed Claude 3 strikes directly at OpenAI's biggest weakness
An enterprise-grade SOTA large model: what signals does Anthropic's Claude 3 release send?
Author | Wan Chen
Editor | Jingyu
Founded by former leaders of OpenAI's GPT-3 R&D, Anthropic is widely seen as the startup best positioned to compete with OpenAI.
On Monday local time, Anthropic released the Claude 3 family of large models, claiming that its most powerful model outperforms OpenAI's GPT-4 and Google's Gemini 1.0 Ultra across a range of benchmark tests.
However, the ability to handle more complex reasoning tasks, greater intelligence, and faster responses, the all-round capabilities that rank it among the top three large models, are only Claude 3's baseline.
Anthropic's larger commitment is to become the best partner for enterprise customers.
This is first reflected in the fact that Claude 3 is a set of three models: Haiku, Sonnet, and Opus, letting enterprise customers choose the version whose performance and cost fit their own scenarios.
Secondly, Anthropic emphasizes that its models are the safest. Anthropic President Daniela Amodei explained that a technique called "Constitutional Artificial Intelligence" was used in Claude 3's training to strengthen its safety, trustworthiness, and reliability.
Fu Yao, a doctoral student at the University of Edinburgh working on large models and reasoning, noted after reading Claude 3's technical report that Claude 3 performs well on benchmarks of complex reasoning, especially in the financial and medical fields. As a ToB company, Anthropic has chosen to focus its optimization on the areas with the most profit potential.
Two of the Claude 3 models (Haiku and Sonnet) are now available in 159 countries, and the most powerful version, Opus, will follow shortly. Anthropic also offers the models through Amazon's and Google's cloud platforms; the two companies have invested US$4 billion and US$2 billion in Anthropic, respectively.

01, Claude 3 Family: Opus, Sonnet and Haiku
According to Anthropic's official website, Claude 3 is a family of three state-of-the-art models, Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus, letting users choose the best balance of intelligence, speed, and cost for their specific application. On general capability, Anthropic says the Claude 3 series "sets a new industry benchmark for a wide range of cognitive tasks": it is stronger at analysis and forecasting, detailed content generation, code generation, and conversation in non-English languages such as Spanish, Japanese, and French, and it responds to tasks more promptly.
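As a rough illustration of picking among the three tiers by workload, the sketch below uses Anthropic's Python SDK. The model ID strings are the ones Anthropic published at launch; treat both the IDs and the SDK call as assumptions to verify against current documentation.

```python
# Sketch: mapping a coarse workload label to a Claude 3 model tier.
# Model IDs are the launch-era identifiers; confirm against Anthropic's docs.
CLAUDE3_MODELS = {
    "fast": "claude-3-haiku-20240307",       # lowest cost, quickest replies
    "balanced": "claude-3-sonnet-20240229",  # middle tier
    "complex": "claude-3-opus-20240229",     # strongest reasoning
}

def pick_model(workload: str) -> str:
    """Return the Claude 3 model ID for a given workload label."""
    return CLAUDE3_MODELS[workload]

if __name__ == "__main__":
    # The actual API call requires ANTHROPIC_API_KEY in the environment:
    # import anthropic
    # client = anthropic.Anthropic()
    # reply = client.messages.create(
    #     model=pick_model("complex"),
    #     max_tokens=1024,
    #     messages=[{"role": "user", "content": "Summarize this contract."}],
    # )
    # print(reply.content[0].text)
    print(pick_model("complex"))
```

The point is simply that the three tiers share one API surface, so switching between cost and capability is a one-string change.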
Among them, Claude 3 Opus is the most intelligent model of the group, especially at handling highly complex tasks. Opus outperforms its peers on most common benchmarks, including undergraduate-level expert knowledge (MMLU), graduate-level expert reasoning (GPQA), basic mathematics (GSM8K), and more. It shows near-human-level understanding and fluency on complex tasks, and represents Anthropic's most advanced exploration of general intelligence to date, "demonstrating the outer limits of generative artificial intelligence."

02, Iteration targeting enterprise customers
Co-founder Daniela Amodei explained that, beyond advances in general intelligence, Anthropic pays special attention to enterprise customers and the many challenges they face when integrating generative AI into their businesses. For these customers, the Claude 3 family improves visual capabilities, accuracy, long-text input, and security. Many corporate customers keep knowledge bases in multiple formats, including PDFs, flowcharts, and presentation slides; the Claude 3 models can now handle a variety of visual formats, including photos, charts, graphs, and technical diagrams. Claude 3 has also been optimized for accuracy and for long context windows.
For accuracy, Anthropic uses a large set of complex factual questions that target known weaknesses of current models, classifying answers as correct, incorrect (hallucinations), or admissions of uncertainty. Accordingly, a Claude 3 model says it does not know the answer rather than provide incorrect information. The strongest version, Claude 3 Opus, doubled the rate of correct answers on challenging open-ended questions compared with Claude 2.1, while also reducing the rate of incorrect answers.

At the same time, thanks to improved contextual understanding, the Claude 3 family refuses user tasks less often than previous versions.
Beyond more accurate replies, Anthropic said it will bring a "citation" feature to Claude 3 that can point to the precise sentences in reference material that verify its answers.
Currently, the Claude 3 models offer a context window of 200K tokens. Later, all three models will accept inputs of more than 1 million tokens, a capability that will be offered to select customers who need enhanced processing power. In its technical report, Anthropic briefly covers Claude 3's long-context capabilities, including its ability to handle longer contextual prompts effectively and its recall performance.
03, "Constitutional Artificial Intelligence", Dealing with "Inexact Science"
It is worth noting that, as a multimodal model, Claude 3 can take images as input but cannot output images. Co-founder Daniela Amodei said this is because "we found that businesses have much less need for images."
Claude 3's release came after the controversy over images generated by Google's Gemini. Aimed at enterprise customers, Claude is also bound to manage issues such as value bias introduced by AI.
In this regard, Dario Amodei emphasized the difficulty of controlling artificial intelligence models, calling it an "inexact science." He said the company has a dedicated team assessing and mitigating the various risks posed by the model.
Another co-founder, Daniela Amodei, also conceded that a completely unbiased AI may not be achievable with current methods. "Creating a completely neutral generative AI tool is nearly impossible, not only technically, but also because not everyone agrees on what neutrality is," she said.

This article comes from the WeChat public account: Geek Park (ID: geekpark), author: Wan Chen
