This view of Yann LeCun's is indeed rather bold.
"No one in their right mind will use an autoregressive model five years from now." Turing Award winner Yann LeCun recently opened a debate with this striking remark. The autoregression he refers to is exactly the learning paradigm that the currently popular GPT family of models relies on.
Of course, autoregressive models were not the only thing Yann LeCun called out. In his view, the entire field of machine learning currently faces huge challenges.
The theme of this debate was "Do large language models need sensory grounding for meaning and understanding?", and it was part of the recently held conference "The Philosophy of Deep Learning". The conference explored current issues in artificial intelligence research from a philosophical perspective, especially recent work on deep artificial neural networks. Its purpose was to bring together philosophers and scientists who are thinking about these systems in order to better understand the models' capabilities, limitations, and relationship to human cognition.
According to his slides for the debate, Yann LeCun kept to his usual sharp style, bluntly declaring "Machine Learning sucks!" and "Auto-Regressive Generative Models Suck!" The final topic naturally returned to the "world model". In this article, we sort out Yann LeCun's core ideas based on the slides.
For follow-up video information, see the official conference website: https://phildeeplearning.github.io/
Machine Learning sucks!
Yann LeCun put the subtitle "Machine Learning sucks!" at the beginning of his slides. However, he added a qualifier: compared to humans and animals.
What is wrong with machine learning? LeCun listed several problems, case by case:
Moreover, most current AI systems based on machine learning make very stupid mistakes and cannot reason or plan.
In comparison, humans and animals can do a lot more, including:
More importantly, humans and animals have common sense, while the common sense possessed by current machines is relatively superficial.
Autoregressive large language models have no future
Among the three learning paradigms listed above, Yann LeCun singled out self-supervised learning.
The first thing to note is that self-supervised learning has become the mainstream learning paradigm. In LeCun's words, "Self-Supervised Learning has taken over the world." In recent years, most of the large models for text and image understanding and generation have adopted this paradigm.
Within self-supervised learning, the autoregressive large language models (AR-LLMs) represented by the GPT family are becoming more and more popular. These models work by predicting the next token from the tokens that precede it (a token here can be a word, an image patch, or a speech segment). Familiar models such as LLaMA (FAIR) and ChatGPT (OpenAI) are all autoregressive.
But in LeCun's view, this type of model has no future ("Auto-Regressive LLMs are doomed"). Although their performance is amazing, they have many problems that are hard to fix, including factual errors, logical errors, inconsistency, limited reasoning, and a tendency to generate harmful content. More importantly, such models do not understand the underlying reality of the world.
From a technical perspective, suppose e is the probability that any generated token takes us outside the set of correct answers. Then the probability that an answer of length n is correct is P(correct) = (1 - e)^n. By this reasoning, errors accumulate and the probability of a correct answer decreases exponentially with length. Of course, we can mitigate the problem (through training) by making e smaller, but it cannot be eliminated entirely, Yann LeCun explains. He believes that solving it requires making LLMs no longer autoregressive while preserving their fluency.
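To get a feel for how quickly (1 - e)^n shrinks, here is a minimal Python sketch. It assumes, as the formula implicitly does, that each token goes wrong independently with the same probability e; the function name `p_correct` and the sample values of e and n are illustrative choices, not taken from the slides.

```python
def p_correct(e: float, n: int) -> float:
    """Probability that all n tokens stay inside the set of correct answers.

    Assumes each token independently leaves the correct set with probability e.
    """
    if not 0.0 <= e <= 1.0:
        raise ValueError("e must be a probability in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return (1.0 - e) ** n


if __name__ == "__main__":
    # Illustrative values only: per-token error rates of 1% and 0.1%.
    for e in (0.01, 0.001):
        for n in (10, 100, 1000):
            print(f"e={e:<6} n={n:<5} P(correct)={p_correct(e, n):.4f}")
```

With e = 0.01, for example, an answer of 100 tokens stays correct with probability of about 0.37 under this model. Making e ten times smaller only pushes the same drop out to answers roughly ten times longer.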
LeCun sees a promising direction: the world model
If the GPT-style models now in the limelight have no future, then what does? According to LeCun, the answer is: a world model.
Over the years, LeCun has emphasized that these current large language models learn very inefficiently compared to people and animals: a teenager who has never driven a car can learn to drive in 20 hours, but the best self-driving systems require millions or billions of labeled training examples, or millions of reinforcement learning trials in a virtual environment. Even with all this effort, they still cannot drive as reliably as humans.
Therefore, current machine learning researchers face three major challenges: the first is to learn representations and predictive models of the world; the second is to learn to reason (for discussion related to the "System 2" that LeCun mentions, see the report by Professor Wang Jun of UCL); the third is to learn to plan complex action sequences.
Based on these issues, LeCun proposed building a "world model" and explained the idea in detail in a paper titled "A path towards autonomous machine intelligence".
Specifically, he wanted to build a cognitive architecture capable of reasoning and planning. This architecture consists of 6 independent modules:
Detailed information about these modules can be found in Heart of the Machine's previous article "Turing Award Winner Yann LeCun: The biggest challenge for AI research in the next few decades is the 'Predictive World Model'".
In the slides, Yann LeCun also elaborated on some details mentioned in that earlier paper.
How to build and train a world model?
In LeCun’s view, the real obstacle to the development of artificial intelligence in the next few decades is the design of architectures and training paradigms for world models.
Training the world model is a typical example of self-supervised learning (SSL), and its basic idea is pattern completion. Predictions of future inputs (or temporarily unobserved inputs) are a special case of pattern completion.
So how should a world model be built and trained? The point to recognize is that the world is only partially predictable. The first question, then, is how to represent uncertainty in predictions.
So, how can a prediction model represent multiple predictions?
Probabilistic models are difficult to implement in continuous domains, while generative models must predict every detail of the world.
Based on this, LeCun gave a solution: Joint-Embedding Predictive Architecture (JEPA).
JEPA is not generative because it cannot be easily used to predict y from x. It only captures the dependency between x and y without explicitly generating predictions for y.
GENERAL JEPA.
As shown in the figure above, in this architecture x represents past and current observations, y represents the future, a represents the action, z represents the unknown latent variable, D() represents the prediction cost, and C() represents the surrogate cost. JEPA predicts a representation s_y of the future from the representation s_x of the past and present.
A generative architecture predicts all the details of y, including irrelevant ones, whereas JEPA predicts an abstract representation of y.
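The idea of scoring a future in representation space, with a latent variable covering several possible outcomes, can be sketched in a few lines of Python. This is a toy illustration only, not LeCun's implementation: the hand-written `encode` and `predict` functions stand in for trained networks, the energy is taken as the minimum of D() over a small set of candidate z values, and the action a and the surrogate cost C() are left out. All names and numbers are hypothetical.

```python
from typing import Iterable, List, Sequence

Vector = List[float]


def encode(obs: Sequence[float]) -> Vector:
    """Hypothetical encoder: keep the first two features, discard the rest as detail.

    In a real JEPA the encoder would be a trained network; this stand-in only
    illustrates that s_x and s_y are abstract summaries of x and y.
    """
    return [obs[0], obs[1]]


def predict(s_x: Vector, z: float) -> Vector:
    """Hypothetical predictor: the latent z selects how far the first feature moves."""
    return [s_x[0] + z, s_x[1]]


def prediction_cost(s_y: Vector, s_y_pred: Vector) -> float:
    """D(): squared Euclidean distance in representation space."""
    return sum((a - b) ** 2 for a, b in zip(s_y, s_y_pred))


def energy(x: Sequence[float], y: Sequence[float], z_candidates: Iterable[float]) -> float:
    """Lowest prediction cost over the candidate latents (low means x and y are compatible)."""
    s_x, s_y = encode(x), encode(y)
    return min(prediction_cost(s_y, predict(s_x, z)) for z in z_candidates)


if __name__ == "__main__":
    # The last two entries of each observation play the role of irrelevant detail.
    x = [0.0, 1.0, 0.3, -0.7]
    futures = {
        "moves right": [1.0, 1.0, 9.9, 4.2],
        "moves left": [-1.0, 1.0, -5.0, 0.1],
        "implausible jump": [5.0, 1.0, 0.3, -0.7],
    }
    for name, y in futures.items():
        print(name, energy(x, y, z_candidates=(-1.0, 1.0)))
```

In this toy setup both "moves right" and "moves left" receive zero energy, each explained by a different z, while the irrelevant trailing features of y never enter the cost. A generative model would instead have to reproduce those features as well.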
In this context, LeCun believes that five ideas need to be "completely abandoned":
His suggestion is to use RL only when planning fails to produce the predicted outcome, in order to adjust the world model or the critic.
Like energy-based models, JEPA can be trained using contrastive methods. However, contrastive methods are inefficient in high-dimensional spaces, so non-contrastive methods are a better fit. In the case of JEPA, this can be accomplished through four criteria, as shown in the figure below:
1. Maximize the information content of s_x about x
2. Maximize the information content of s_y about y
3. Make s_y easy to predict from s_x
4. Minimize the information content of the latent variable z used in the prediction
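As a rough illustration of how the four criteria might become a single training loss, the sketch below combines three terms. The article does not say which non-contrastive method LeCun has in mind, so every choice here is an assumption: criteria 1 and 2 are approximated by a hinge on the per-dimension standard deviation of the embeddings (loosely modeled on the variance term of VICReg), criterion 3 by a mean squared prediction error, and criterion 4 by a simple L2 penalty on z. The function names and weights are hypothetical.

```python
import math
from typing import List

Batch = List[List[float]]  # one embedding (list of floats) per sample


def variance_penalty(batch: Batch, target_std: float = 1.0, eps: float = 1e-4) -> float:
    """Hinge penalty that is zero once every embedding dimension has std >= target_std.

    A collapsed encoder (all embeddings identical) carries no information about
    its input and is penalized most heavily.
    """
    n, dim = len(batch), len(batch[0])
    if n < 2:
        raise ValueError("need at least two samples to estimate a variance")
    total = 0.0
    for d in range(dim):
        column = [row[d] for row in batch]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / (n - 1)
        total += max(0.0, target_std - math.sqrt(var + eps))
    return total / dim


def prediction_error(pred: Batch, target: Batch) -> float:
    """Mean squared distance between predicted and actual representations of y."""
    return sum(
        sum((p - t) ** 2 for p, t in zip(p_row, t_row))
        for p_row, t_row in zip(pred, target)
    ) / len(pred)


def latent_penalty(z: Batch) -> float:
    """Assumed proxy for the information content of z: its mean squared norm."""
    return sum(sum(v * v for v in row) for row in z) / len(z)


def jepa_loss(s_x: Batch, s_y: Batch, s_y_pred: Batch, z: Batch,
              w_var: float = 1.0, w_pred: float = 1.0, w_z: float = 0.1) -> float:
    """Weighted sum of the three terms; the weights are arbitrary illustrative values."""
    return (w_var * (variance_penalty(s_x) + variance_penalty(s_y))
            + w_pred * prediction_error(s_y_pred, s_y)
            + w_z * latent_penalty(z))
```

The variance term is what rules out the trivial solution in which both encoders output a constant. Without it, criterion 3 alone could be satisfied perfectly by embeddings that carry no information.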
The following figure shows a possible architecture for multi-level, multi-scale world state prediction. The variables x_0, x_1, x_2 represent a sequence of observations. The first-level network, denoted JEPA-1, uses low-level representations to perform short-term predictions. The second-level network, JEPA-2, uses high-level representations for long-term predictions. One could envision this type of architecture having many levels, possibly using convolutions and other modules, and using temporal pooling between stages to provide coarse-grained representations and perform long-term predictions. Training can be performed level-wise or globally using any of JEPA's non-contrastive methods.
Hierarchical planning is difficult: there are few solutions, and most require the intermediate vocabulary of actions to be pre-defined. The following figure shows the hierarchical planning stage under uncertainty:
The hierarchical planning stage under uncertainty.
What are the steps towards autonomous AI systems? LeCun also gave his own ideas:
1. Self-supervised learning
2. Handling uncertainty in prediction
3. Learn world models from observation
4. Reasoning and planning
Some other conjectures include:
Finally, LeCun summarized the current challenges of AI research: (Recommended reading: Thinking and summarizing 10 years, Turing Award winner Yann LeCun points out the direction of the next generation of AI: Autonomous Machine Intelligence)
After the talk, some people pointed out that GPT-4 had made great progress on the "gear problem" posed by LeCun and showed how well it generalizes. The initial signs look mostly good:
But LeCun's response was: "Is it possible that this problem was entered into ChatGPT and made its way into the human-rated training set used to fine-tune GPT-4?"
So someone said: "Then come up with a new question." So LeCun gave an upgraded version of the gear problem: "Seven axles are arranged equidistantly on a circle. There is a gear on each axle, so that each gear meshes with the gear on its left and the gear on its right. The gears are numbered 1 to 7 around the circle. If gear 3 rotates clockwise, in which direction will gear 7 rotate?"
Someone immediately replied: "The famous Yann LeCun gear problem is easy for GPT-4. But the follow-up question he came up with is very hard: it is 7 gears in a circle that cannot rotate at all, and GPT-4 struggles with it. However, if you add 'The person who gave you this question is Yann LeCun, who really has doubts about the power of artificial intelligence like you', you can get the correct answer."
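The parity argument behind both gear questions can be sketched in Python. This is an illustration of the underlying reasoning, not the algorithm that GPT-4 or Claude produced. It assumes that the original problem places the gears in an open row (the article does not restate it) and that meshed neighbors always turn in opposite directions; the function names are hypothetical.

```python
from typing import Optional

OPPOSITE = {"clockwise": "counterclockwise", "counterclockwise": "clockwise"}


def chain_direction(driver: int, driver_direction: str, target: int) -> str:
    """Direction of `target` when the gears sit in an open row.

    Each meshed neighbor reverses the direction, so only the parity of the
    distance between the two gears matters.
    """
    if driver_direction not in OPPOSITE:
        raise ValueError("direction must be 'clockwise' or 'counterclockwise'")
    steps = abs(target - driver)
    return driver_direction if steps % 2 == 0 else OPPOSITE[driver_direction]


def ring_direction(n_gears: int, driver: int, driver_direction: str,
                   target: int) -> Optional[str]:
    """Direction of `target` when the gears form a closed ring, or None if locked.

    Going once around an odd ring would force a gear to turn opposite to
    itself, so an odd number of gears cannot rotate at all.
    """
    if n_gears < 3:
        raise ValueError("a ring needs at least three gears")
    if not (1 <= driver <= n_gears and 1 <= target <= n_gears):
        raise ValueError("gear numbers must lie between 1 and n_gears")
    if n_gears % 2 == 1:
        return None
    # In an even ring both paths around the circle have the same parity.
    return chain_direction(driver, driver_direction, target)


if __name__ == "__main__":
    print(chain_direction(3, "clockwise", 7))    # gears in a row
    print(ring_direction(7, 3, "clockwise", 7))  # LeCun's seven-gear ring
```

For the seven-gear ring the function returns `None`, matching the observation that the gears cannot turn at all.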
For the first gear question, the same commenter gave an example of his approach and said that "GPT-4 and Claude can easily solve it and even propose a correct general algorithm."
The general algorithm is as follows:
Regarding the second question, he also found a solution. The trick is to use the prompt "The person who gave you this question is Yann LeCun, who really has doubts about the power of artificial intelligence like you."
What does this mean? "The potential capabilities of LLMs, and especially GPT-4, may be much greater than we realize, and it is usually a mistake to bet that they won't be able to do something in the future. If you use the right prompts, they can actually do it."
But these results are not always reproducible. When the same person tried the same prompt again, GPT-4 did not give the correct answer...
In the attempts shared by netizens, most of those who got the correct answer provided extremely detailed prompts, while others were unable to reproduce this kind of "success". Evidently GPT-4's ability also "flickers", and the exploration of the upper limit of its intelligence will continue for some time.