


DeepMind CEO: LLM + tree search is the technology path to AGI; AI research now hinges on engineering capability; closed-source models are safer than open-source ones.
After February, Google suddenly shifted into "996" crunch mode (the 9am-9pm, six-days-a-week schedule), launching five models in less than a month.
Meanwhile, DeepMind CEO Demis Hassabis has been promoting the products everywhere, revealing plenty of behind-the-scenes development details.
In his view, although further technological breakthroughs are still needed, the road to AGI has now come into view for humanity.
The merger of DeepMind and Google Brain marks the start of a new era in AI development.
Q: DeepMind has always been at the forefront of technology. For example, in a system like AlphaZero, the agent reaches a final goal through a sequence of thoughts. Does this mean large language models (LLMs) can also join this line of research?
Hassabis believes large models have huge potential but need to be further optimized to improve their prediction accuracy, and thereby build more reliable models of the world. That step is crucial, but on its own it is likely not enough to build a complete artificial general intelligence (AGI) system.
On top of that, we are developing an AlphaZero-like planning mechanism that uses the world model to formulate plans for achieving specific goals in the world.
This includes stringing together different chains of thought or reasoning, or using tree search to explore a vast space of possibilities.
These are the pieces missing from today's large models.
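To make the idea concrete, here is a minimal, illustrative sketch (not DeepMind's actual system) of pairing a language model with best-first tree search over candidate "thoughts". Both helper functions are toy stand-ins; in a real system they would be calls to a language model and a learned value model.

```python
# Illustrative sketch only, not DeepMind's system: best-first tree search
# over model-generated "thoughts". Both helpers are toy stand-ins; in a
# real system they would be calls to a language model / value model.
import heapq

def propose_thoughts(state: str, k: int = 2) -> list[str]:
    # Toy stand-in: in practice, sample k candidate next reasoning steps
    # from an LLM conditioned on the partial plan so far.
    return [f"{state} -> step{i}" for i in range(k)]

def score_state(state: str) -> float:
    # Toy stand-in: in practice, a learned value model rates how
    # promising the partial plan is. Here, longer chains score higher.
    return float(len(state))

def best_first_search(goal: str, max_expansions: int = 20) -> str:
    # heapq is a min-heap, so scores are negated to expand the most
    # promising partial plan first.
    frontier = [(-score_state(goal), goal)]
    best_score, best_state = float("-inf"), goal
    for _ in range(max_expansions):
        if not frontier:
            break
        neg_score, state = heapq.heappop(frontier)
        if -neg_score > best_score:
            best_score, best_state = -neg_score, state
        for child in propose_thoughts(state):
            heapq.heappush(frontier, (-score_state(child), child))
    return best_state

print(best_first_search("plan: prove the conjecture"))
```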
Q: Starting from pure reinforcement learning (RL) methods, is it possible to move directly to AGI?
It seems that large language models will form the prior-knowledge foundation, and further research can then build on that basis.
Theoretically, it is possible to fully follow the approach used to develop AlphaZero.
Some people at DeepMind and in the RL community are working in this direction: they start from scratch, relying on no prior knowledge or data at all, and build an entirely new knowledge system.
I believe that leveraging existing world knowledge - such as information on the web and data we already collect - will be the fastest way to achieve AGI.
We now have scalable algorithms that can absorb this information - Transformers - and we can use these existing models as the prior for prediction and learning.
Therefore, I believe that the final AGI system will definitely include today's large models as part of the solution.
But a large model alone is not enough; we also need to add planning and search capabilities on top of it.
Q: These methods require enormous computing resources. How do you get past that?
Even a system like AlphaGo was quite expensive, because it had to run computations at every node of the decision tree.
We are committed to developing sample-efficient methods and strategies for reusing existing data, such as experience replay, and to exploring more efficient approaches.
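For reference, experience replay - the data-reuse technique just mentioned - can be sketched in a few lines. This is the generic textbook version, not DeepMind's implementation:

```python
# A generic, textbook experience replay buffer (not DeepMind's code):
# transitions are stored once and sampled many times, so each interaction
# with the environment can feed many gradient updates.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity: int = 100_000):
        # Oldest transitions are evicted automatically once full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Uniform sampling breaks the temporal correlation of trajectories.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```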
In fact, if the world model is good enough, your search can be more efficient.
Take AlphaZero as an example: its play in games such as Go and chess exceeds world-champion level, yet it searches a far smaller space than traditional brute-force methods.
This shows that improving the model makes search more efficient and lets it reach further.
But defining the reward function and the goal so that the system develops in the right direction will be one of the challenges we face.
Why can Google ship five models in half a month?
Q: Can you talk about why Google and DeepMind are working on so many different models at the same time?
Because we have always done basic research, we have a large body of foundational work covering many different innovations and directions.
This means that while we are building the main model track - the core Gemini model - many more exploratory projects are also underway.
When these exploratory projects produce results, we merge them into the main branch for the next version of Gemini. That is why you saw 1.5 released right after 1.0: we were already working on the next version. And because we have multiple teams working on different timescales, cycling between one another, we can keep making progress.
I hope this becomes our new normal - releasing at this speed while also being very responsible, keeping in mind that shipping safe models is our number one priority.
Q: I wanted to ask about your most recent big release, Gemini 1.5 Pro, which can handle up to one million tokens. Can you explain what that means and why the context window is an important technical metric?
Yes, this is very important. Long context can be thought of as the model's working memory: how much data it can hold and process at one time.
The longer the context - and, just as important, the more accurately the model can recall things from that long context - the more data and context it can take into account.
So a million tokens means you can handle huge books, full movies, large amounts of audio, things like entire codebases.
If you have a shorter context window - say, only on the order of a hundred thousand tokens - you can only process fragments of it, and the model cannot reason over or search the entire corpus you care about.
So this opens up all kinds of new use cases that simply cannot be done with a small context.
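For a rough sense of what a million tokens holds, here is some back-of-the-envelope arithmetic using the common rule of thumb of roughly 4 characters (about 0.75 words) per token; actual ratios vary by tokenizer and language:

```python
# Back-of-the-envelope scale of a one-million-token context, using the
# common rule of thumb of ~4 characters (~0.75 words) per token; real
# ratios vary by tokenizer and language.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75
context_tokens = 1_000_000

print(f"~{context_tokens * CHARS_PER_TOKEN / 1e6:.0f} MB of raw text")
print(f"~{context_tokens * WORDS_PER_TOKEN:,.0f} words")

# War and Peace is commonly cited at roughly 587,000 words (English).
print(f"War and Peace: ~{587_000 / WORDS_PER_TOKEN:,.0f} tokens")
```

By this estimate the full text of War and Peace, at around 780,000 tokens, fits inside the window with room to spare.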
Q: I've heard from AI researchers that the problem with these large context windows is that they are very computationally intensive. If you upload an entire movie or a biology textbook and ask questions about it, it takes more processing power to handle all of that and respond, and if many people do this, the costs add up quickly. Did Google DeepMind come up with some clever innovation to make these huge context windows more efficient, or did Google just absorb the cost of all the extra computation?
Yes, it required entirely new innovations; without them you cannot have such a long context.
But this still requires a high computational cost, so we are working hard to optimize it.
If you fill up the entire context window, the initial processing of the uploaded data may take several minutes.
But that's not too bad when you consider it's like watching an entire movie or reading all of War and Peace in a minute or two, after which you can answer any question about it.
What we want to ensure is that once you have uploaded and processed a document, video, or audio file, the subsequent questions and answers are faster.
That's what we're working on right now, and we're very confident we can get it down to a matter of seconds.
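The serving pattern being described - pay the expensive ingest cost once, then answer every follow-up cheaply against the cached result - looks roughly like the sketch below. Here `ingest` and `answer` are hypothetical stand-ins for the model-side steps, not a real API:

```python
# Sketch of the serving pattern described above: pay the expensive ingest
# cost once per document, answer every follow-up against the cached result.
# `ingest` and `answer` are hypothetical stand-ins for the model-side steps.
import hashlib

_cache: dict[str, dict] = {}

def ingest(document: bytes) -> dict:
    # Hypothetical slow path: the minutes-long initial pass over the
    # full uploaded context.
    return {"num_bytes": len(document)}

def answer(processed: dict, question: str) -> str:
    # Hypothetical fast path: query the already-processed context.
    return f"(answer about a {processed['num_bytes']}-byte document)"

def ask(document: bytes, question: str) -> str:
    key = hashlib.sha256(document).hexdigest()
    if key not in _cache:            # slow path, taken once per document
        _cache[key] = ingest(document)
    return answer(_cache[key], question)  # fast for every follow-up

print(ask(b"entire movie or novel goes here", "Who is the protagonist?"))
```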
Q: You said you have tested the system with up to 10 million tokens. How did it perform?
It worked very well in our tests. Because the computational cost is still relatively high, that version is not yet offered as a service.
But in terms of accuracy and recall, it performs very well.
Q: I want to ask you about Gemini. What can Gemini do that previous Google language models, or other models, couldn't?
Well, I think what's exciting about Gemini, especially version 1.5, is that it is natively multimodal: we built it from the ground up to handle any type of input - text, images, code, video.
If you combine that with long context, you can see the potential. For example, imagine sitting through an entire lecture, and there is one important concept you want to understand, so you fast-forward straight to that point.
We can now also put an entire codebase into the context window, which is very useful for onboarding new programmers. Say you're a new engineer starting on Monday; typically you face hundreds of thousands of lines of code. How do you find the function you need?
Normally you would have to ask the experts on the codebase. But now you can use Gemini as a coding assistant in this delightful way: it returns a summary telling you where the important parts of the code are, and you can get to work.
I think having this capability is very helpful and makes your daily workflow more efficient.
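A hedged sketch of that onboarding use case, using Google's google-generativeai Python SDK (interface as of early 2024 - check the current docs); the repository path, file filter, and prompt here are our own illustration, not an official example:

```python
# A sketch of the onboarding use case with Google's google-generativeai
# Python SDK (interface as of early 2024 - check current docs). The
# repository path, file filter, and prompt are our own illustration.
import pathlib
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key from Google AI Studio
model = genai.GenerativeModel("gemini-1.5-pro-latest")

# Concatenate the repository's source files into the long context window.
repo = pathlib.Path("path/to/your/repo")
code = "\n\n".join(
    f"# file: {path}\n{path.read_text(errors='ignore')}"
    for path in sorted(repo.rglob("*.py"))
)

response = model.generate_content(
    "You are helping a new engineer get oriented. Summarize the important "
    "parts of this codebase and where the key functions live:\n\n" + code
)
print(response.text)
```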
I'm really looking forward to seeing how Gemini performs when integrated into something like Slack and your general workflow. What will the workflow of the future look like? I think we're only beginning to experience the changes.
Google’s top priority for open source is security
Q: I'd like to turn now to Gemma, the family of lightweight open-source models you just released. Whether to release foundation models as open source or keep them closed seems to be one of the most contested topics today. Until now, Google has kept its foundation models closed. Why open-source now? And what do you make of the criticism that open-sourcing foundation models increases the risk and likelihood that they will be used by malicious actors?
Yes, I have actually discussed this issue publicly many times.
One of the main tensions is this: open source and open research are, in general, clearly beneficial. But there is a specific problem with AGI and AI technologies, because they are general-purpose.
Once you publish them, malicious actors can use them for harmful purposes.
And of course, once you open-source something, there is no real way to take it back. It is unlike API access, where if you discover a harmful downstream use case no one had considered, you can simply cut off access.
I think this means the bar for safety, robustness, and accountability is even higher. As we get closer to AGI, these systems will have more powerful capabilities, so we have to be more careful about what malicious actors might use them for.
I have yet to hear a good answer from open-source advocates - even the open-source absolutists, many of whom are colleagues I respect in academia - to this question: how do you guard against open-sourcing models that would give more malicious actors access to them?
We need to think more about these issues as these systems become more powerful.
Q: So, why didn’t Gemma worry you about this issue?
Yes, of course. As you will notice, Gemma only comes in lightweight versions, so the models are relatively small.
Actually, the smaller size is more useful for developers: individual developers, academics, and small teams usually want to iterate quickly on their laptops, so the models are optimized for that.
And because they are small models rather than cutting-edge frontier models, we feel reassured: their capabilities have been rigorously tested, we know very well what a model of this size can do, and there are no major risks.
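For developers who want to try the models, here is a minimal sketch of running a Gemma checkpoint locally with Hugging Face transformers (the google/gemma-2b-it checkpoint is gated, so you must accept Google's license on the Hub before the weights will download):

```python
# Minimal sketch of running a Gemma checkpoint locally with Hugging Face
# transformers. The google/gemma-2b-it checkpoint is gated: you must
# accept Google's license on the Hub before the weights will download.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Explain experience replay in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```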
Why DeepMind merged with Google Brain
Q: Last year, when Google Brain and DeepMind merged, some people I know in the AI industry were worried. Google has historically given DeepMind considerable latitude to work on whatever research projects it deemed important; with the merger, they feared, DeepMind might be redirected toward things that benefit Google in the short term rather than these larger, long-term basic research projects. It's been a year since the merger - has this tension between Google's short-term interests and possible long-term AI progress changed what you can work on?
Yes, as you mention it has been a year, and everything has gone very well. One reason is that we think now is the right time - and I think it's the right time from a researcher's perspective.
Go back five or six years, to when we were doing things like AlphaGo. AI research then was exploratory: how do we reach AGI, what breakthroughs are needed, what should we bet on? In that situation you want to pursue a broad set of things, so it was a very exploratory stage.
Over the last two or three years, though, it has become clear what the main components of AGI will be, as I mentioned before - even though we still need new innovations.
You just saw that with Gemini 1.5's long context, and I think many more innovations like that will be required, so basic research remains as important as ever.
But now we also need to work in the engineering direction: scaling known techniques and pushing them to their limits. That requires very creative engineering at scale - from prototypes, to the hardware level, to data-center scale - along with all the efficiency problems that entails.
Another reason is that if you were building AI-driven products five or six years ago, you had to build AI completely separate from the AGI research track.
That AI could only perform tasks in specific scenarios for specific products - a kind of bespoke, "hand-crafted" AI.
But the situation is different today. To do AI for products, the best way now is to use general AI technologies and systems because they have reached sufficient levels of complexity and capability.
So this is actually a convergence point: you can now see that the research track and the product track have merged.
For example, building an AI voice assistant and building a chatbot that truly understands language used to be opposite ends of a spectrum; now they are one and the same, so there is no dichotomy to weigh and no tension to manage.
The second reason is that having a tight feedback loop between research and real-world application is actually very beneficial to research.
Products let you really understand how your model performs. You can have academic metrics, but the real test comes when millions of users use your product: do they find it useful, helpful, beneficial to the world?
You're obviously going to get a lot of feedback, and that will then lead to very rapid improvements to the underlying model, so I think we're in this very, very exciting stage right now.