


Li Feifei takes stock of the top ten AI highlights of the year: nuclear fusion, ChatGPT, and AlphaFold are on the list
The explosion of artificial intelligence is distorting our sense of time.
Can you believe that Stable Diffusion is only 4 months old and ChatGPT has been around for less than a month?
To put it vividly: blink, and you will miss a brand-new industry.
In the AI field in 2022, large-scale generative models have sprung up like mushrooms after a rain, changing the landscape of the entire AI industry.
Moreover, these models are rapidly moving out of the laboratory and into real-world applications.
For example, LLM technology has inspired two emerging fields: decision-making agents (games, robotics, etc.) and AI4Science.
Jim Fan, a student of Li Feifei, summarized the top ten AI highlights of 2022 for us. Let’s turn back the clock and see what amazing AI breakthroughs 2022 brought.
1. Text-image generation
DALLE-2 is the first large-scale diffusion model to generate realistic high-resolution images from arbitrary captions.
It launched an artistic revolution in AI, spawning many new applications, startups, and ways of thinking.
But DALLE-2 is protected behind the walls of OpenAI and is not open source.
After OpenAI, StabilityAI, LMU, and runwayml took a heroic step and trained their own Internet-scale text2image model based on the "latent diffusion" algorithm. They called the model "Stable Diffusion" and open-sourced the code and weights.
The openness of Stable Diffusion has proved to be a game changer.
Now, many startups and research labs are creating new applications based on Stable Diffusion, and Stable Diffusion itself is continuously improved by the open source community.
Recently, Stable Diffusion reached v2.1, and it can run on a single GPU.
In addition, GoogleAI presented two text2image models this year. GoogleAI has released neither the models nor an API, but the papers still offer many interesting insights.
Imagen
https://imagen.research.google
Parti
https://parti.research.google
It is a Transformer model that does not use diffusion.
2. Text-text generation
As everyone can guess, this one is about ChatGPT!
This is the only app in history to gain 1 million users in 5 days.
ChatGPT has also greatly inspired human creativity.
This list collects useful and imaginative ideas for using ChatGPT: https://github.com/f/awesome-chat
Both ChatGPT and GPT-3.5 use a new technology called RLHF ("Reinforcement Learning from Human Feedback").
This also means that prompt engineering may soon disappear.
The popularity of ChatGPT has spawned a wave of new startups and competitors, such as Jasper Chat, YouChat, Replit’s Ghostwriter chat, and perplexity_ai.
These competitors provide such intuitive search methods that even Google executives are starting to sweat!
3. Text-Robot Model
How can we give GPT arms and legs so that it can clean your messy kitchen?
Unlike NLP models, robot models need to interact with the physical world.
This year, large pre-trained Transformers finally began to tackle the most difficult problems in robotics!
VIMA
In October, my colleagues and I created a "robot GPT", a Transformer named VIMA.
It accepts any mix of text, images, and video as a prompt and outputs control actions for a robot arm.
Our model is called VIMA ("VisuoMotor Attention") and is completely open source.
A single agent can now solve visual goal reaching, one-shot imitation from video, novel concept grounding, visual constraints, and more, and it scales well with model capacity and data.
RT-1
Following a similar path to VIMA, researchers from GoogleAI released RT-1, a robot Transformer trained on 700 tasks and 130K human demonstrations.
This data was collected over 17 months by 13 robots, a literal army of steel!
4. Text-Video
Essentially, a video is a series of images strung together over time, creating the illusion of movement.
If we can do text2image, why not add a timeline to it for some extra fun?
Currently, there are three major works in the text-to-video field, but none of them are open source.
Make-A-Video
The first is Meta AI’s Make-A-Video, which generates video from text without needing paired text-video data.
You can sign up for trial access here: https://makeavevideo.studio
Paper link: https://arxiv.org/abs/2209.14792
Phenaki
Phenaki, from Google AI, generates variable-length videos from open-domain text descriptions. Demo: https://phenaki.video
DreamFusion
The first to appear was DreamFusion, jointly developed by the Google AI research team and UC Berkeley.
Magic3D
Next came two projects from the NVIDIA AI team, GET3D and Magic3D.
Point-E
After DALL-E 2, launched at the beginning of the year, surprised everyone with its artistry, OpenAI released its latest generative model, "POINT-E", on Tuesday. It can generate 3D models directly from text.
So, can AI use its imagination as much as humans can?
Jim Fan and colleagues developed "MineDojo", the first "Minecraft" AI that can solve many tasks given natural language prompts.
Paper link: https://arxiv.org/pdf/2206.08853.pdf
Fan’s ultimate goal is to build an "embodied ChatGPT". Currently, the MineDojo platform is completely open source.
At the same time, Jeff Clune’s team announced a model called Video Pre-Training (VPT), which directly outputs keyboard and mouse actions.
Paper link: https://arxiv.org/pdf/2206.11795.pdf
VPT has a broader perspective, but it is not language-conditioned. In this respect, MineDojo and VPT complement each other.
References:
https://twitter.com/drjimfan/status/1607746957753057280?s=46&t=OVM_4zdRW2rQwqLohMdPpw