
Can the model be directly connected to AGI as long as it 'gets bigger'? Marcus bombarded again: Three crises have emerged!

Apr 13, 2023, 02:58 PM

In May of this year, DeepMind released Gato, a multimodal AI system that can perform more than 600 different tasks with a single set of model parameters, briefly sparking heated discussion in the industry about artificial general intelligence (AGI).


Nando de Freitas, a research director at DeepMind, tweeted at the time that AGI can be reached simply by continuing to scale up!


What we have to do is make the models bigger, safer, more computationally efficient, faster at sampling, smarter at memory, support more modalities, innovate on data, online/offline, and so on.

Solving these scaling problems is what will deliver AGI; these are the issues the industry should focus on!

Recently, Gary Marcus, the well-known AI scholar, founder and CEO of Robust.AI, and professor emeritus at New York University, published another blog post arguing that this claim is premature and that crises have already begun to emerge.

Marcus follows the development of the AI industry closely but is critical of its hype; he has voiced objections such as "deep learning has hit a wall" and "GPT-3 is utterly meaningless."

What if the scaling game can't be played?

Nando believes that artificial intelligence does not require a paradigm shift; it only needs more data, higher efficiency, and larger servers.

Marcus paraphrases this hypothesis as: without fundamental new innovation, AGI may emerge from larger-scale models. This assumption can also be called scaling-über-alles.

This hypothesis, now often referred to as scaling maximalism, remains very popular, largely because ever-larger models have indeed proven very powerful on tasks such as image generation, which genuinely require large models.

But only so far.


The problem is that some of these techniques, refined over months and years, are nowhere near delivering the scale we would actually need.

The Ponzi-like dynamic keeps growing: the performance advantages brought by scale are only empirical observations, with no guarantee they will continue to hold.

Marcus points to three recent signs that may signal the end of the scaling-maximalism hypothesis.

1. There may not be enough data in the world to support maximal scaling.

Many people have begun to worry about this.

Researchers William Merrill, Alex Warstadt, and Tal Linzen of New York University and ETH Zurich recently presented a proof that "current neural language models are not well suited to extracting the semantics of natural language without massive amounts of data."


Paper link: https://arxiv.org/pdf/2209.12407.pdf

Although the proof rests on too many assumptions to count as a definitive rebuttal, if its hypothesis is even close to correct, scaling could run into real trouble very soon.

2. There may not be enough available computing resources in the world to support maximal scaling.

Miguel Solano recently sent Marcus a co-authored manuscript arguing that saturating current super-benchmarks such as BIG-bench would require more than a quarter of 2022 U.S. electricity consumption.


Repository link:

https://www.php.cn/link/e21bd8ab999859f3642d2227e682e66f

BIG-bench is a crowdsourced benchmark designed to probe large language models and extrapolate their future capabilities; it contains more than 200 tasks.
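As a rough illustration, a BIG-bench task boils down to a named list of input/target examples plus metadata. The Python sketch below builds a minimal task definition in that spirit; the task name, example questions, and exact field names are illustrative assumptions, not taken from the benchmark itself.

```python
import json

# Minimal sketch of a BIG-bench-style task definition. Field names and
# examples are illustrative; real tasks are JSON files in the benchmark
# repository with a richer schema.
task = {
    "name": "toy_arithmetic",  # hypothetical task name
    "description": "Answer simple addition questions.",
    "keywords": ["arithmetic", "zero-shot"],
    "metrics": ["exact_str_match"],
    "examples": [
        {"input": "Q: What is 7 plus 5? A:", "target": "12"},
        {"input": "Q: What is 30 plus 12? A:", "target": "42"},
    ],
}

print(json.dumps(task, indent=2))
```

Crowdsourcing tasks in a uniform format like this is what lets the benchmark score many models on hundreds of heterogeneous tasks at once.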

3. Some important tasks may simply not yield to scaling.

The most obvious example is a recent linguistic study by Ruis, Khan, Biderman, Hooker, Rocktäschel, and Grefenstette, who examined the pragmatic meaning of language.

For example, the question "Did you leave fingerprints?" may receive the answer "I wore gloves," whose implied meaning is "no."

As Marcus has long argued, getting a model to grasp this without cognitive models and common sense is genuinely difficult.

Scale plays little role in this type of task: even the best model reaches only 80.6% accuracy, and for most models the effect of scale is negligible at best.

And you can easily imagine a more complex version of this task on which model performance would drop further.
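To make the setup concrete, here is a toy Python sketch of a binary implicature task: a handful of (question, indirect answer, implied yes/no) triples and a naive literal-matching baseline. The examples, labels, and baseline are invented for illustration and are not from the paper.

```python
# Toy sketch of the binary implicature task described above. Examples,
# labels, and the baseline are invented for illustration.
examples = [
    ("Did you leave fingerprints?", "I wore gloves.", "no"),
    ("Are you coming to the party?", "I have to work late.", "no"),
    ("Is the report finished?", "I just sent it to you.", "yes"),
]

def literal_baseline(answer: str) -> str:
    """Naive baseline: predict 'yes' only if the reply literally says so."""
    return "yes" if "yes" in answer.lower() else "no"

correct = sum(literal_baseline(a) == label for _, a, label in examples)
accuracy = correct / len(examples)
print(f"literal-matching baseline accuracy: {accuracy:.1%}")
```

The baseline misses every implied "yes" that is never stated outright, which is exactly the gap the paper attributes to missing pragmatic and commonsense reasoning rather than insufficient scale.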

What strikes Marcus even more is that even on a single important task like this, roughly 80% performance may mean the scaling game cannot go on.

If a model learns only syntax and semantics but fails at pragmatic or commonsense reasoning, you may not get trustworthy AGI at all.

"Moore's Law" ultimately did not take us as far or as fast as initially expected, because it is not a causal law of the universe that holds forever.

Scaling maximalism is just an interesting hypothesis; on its own it will not get us to artificial general intelligence. Solving the three problems above, for example, will force us into a paradigm shift.

Netizen Frank van der Velde noted that followers of scaling maximalism tend to use vague terms such as "big" and "more."

The training data used by deep learning models is vast compared with the data humans use to learn language.

Yet compared with the full body of human linguistic output, this so-called massive data is still insignificant: it would take roughly 10 billion people producing one sentence per second for 300 years to generate a training set of that scale.
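The netizen's back-of-the-envelope figure can be checked directly; a sketch of the arithmetic exactly as stated (leap days ignored):

```python
# Back-of-the-envelope check of the claim above: 10 billion people,
# one sentence per second, for 300 years.
people = 10_000_000_000             # 1e10 speakers
seconds = 300 * 365 * 24 * 3600     # seconds in ~300 years, no leap days
sentences = people * seconds
print(f"{sentences:.2e} sentences")  # roughly 9.5e19
```

That is on the order of 10^19 to 10^20 sentences, orders of magnitude beyond today's training corpora.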

Netizen Rebel Science even bluntly said that scaling maximalism is not an interesting hypothesis but a stupid one: it will not only lose on the AI track, it will die an ugly death.

Maximizing scale is too extreme

Raphaël Millière, a lecturer in Columbia University's philosophy department with a Ph.D. from Oxford, also weighed in when the battle over scaling maximalism was at its fiercest.

Scaling maximalism, once seen as little more than a foil invoked by critics of deep learning (such as Gary Marcus), is now openly defended within the industry, with insiders like Nando de Freitas and Alex Dimakis joining the debate.

Practitioners' responses have been mixed but not overwhelmingly negative. Meanwhile, the predicted date for AGI on the forecasting platform Metaculus has moved forward to a record low (May 2028), which may also lend scaling maximalism more credibility.

People's growing trust in "scale" likely stems from recent model releases: the success of PaLM, DALL-E 2, Flamingo, and Gato has added fuel to the fire of scaling maximalism.

Sutton's "Bitter Lesson" shares many points with the arguments for scaling maximalism, but the two are not equivalent. Sutton holds that building human knowledge into AI models (for example, via feature engineering) is less efficient than learning from data and compute.


Article link: http://www.incompleteideas.net/IncIdeas/BitterLesson.html

While not uncontroversial, Sutton's position is clearly less radical than scaling maximalism.

It emphasizes the importance of scale, but it does not reduce every problem in AI research to a mere challenge of scale.

In fact, it is hard to pin down exactly what scaling maximalism means. Taken literally, "scaling is all you need" implies that we need no algorithmic innovation or architectural change to reach AGI: we can simply enlarge existing models and force-feed them more data.

That literal reading seems absurd: even models like PaLM, DALL-E 2, Flamingo, or Gato still required architectural changes over previous approaches.

It would be truly surprising if anyone really believed we could scale an off-the-shelf autoregressive Transformer to AGI.

It is unclear how much algorithmic innovation believers in scaling maximalism think AGI requires, which makes it hard to derive falsifiable predictions from this position.

Scaling may be a necessary condition for building any system deserving the label "artificial general intelligence," but we should not mistake a necessary condition for a sufficient one.
