
ChatGPT can choose models by itself! Microsoft Research Asia + Zhejiang University's hot new paper; the HuggingGPT project has been open-sourced

Apr 11, 2023, 08:41 AM

The AI craze triggered by ChatGPT has also spread to the financial world.

Recently, researchers at Bloomberg developed BloombergGPT, a 50-billion-parameter GPT for the financial domain.

The emergence of GPT-4 has given many people a taste of the powerful capabilities of large language models.

However, OpenAI is not open. Many in the industry have begun building GPT clones, and many ChatGPT alternatives are built on open-source models, especially Meta's open-source LLaMA model.

For example, Stanford's Alpaca; Vicuna, from UC Berkeley together with CMU, Stanford, and others; and Dolly, from the startup Databricks.


The many ChatGPT-like large language models built for different tasks and applications show the potential of "a hundred schools of thought contending" across the field.

So the question is, how do researchers choose an appropriate model, or even multiple models, to complete a complex task?

Recently, a research team from Microsoft Research Asia and Zhejiang University released HuggingGPT, a large-model collaboration system.


Paper address: https://arxiv.org/pdf/2303.17580.pdf

HuggingGPT uses ChatGPT as a controller that connects various AI models from the HuggingFace community to complete complex multi-modal tasks.

This means you gain a kind of superpower: through HuggingGPT, you get multi-modal capabilities covering images, video, and speech.

HuggingGPT Bridge

The researchers point out that solving the current problems of large language models (LLMs) may be a first, and critical, step toward AGI.

Because current LLM technology still has shortcomings, several pressing challenges stand in the way of building AGI systems:

- Limited to text as their input and output form, current LLMs cannot process complex information such as vision and speech;

- In real application scenarios, complex tasks often consist of multiple subtasks, requiring the scheduling and collaboration of multiple models, which is beyond the capabilities of a single language model;

- On some challenging tasks, LLMs deliver excellent results in zero-shot or few-shot settings, yet they remain weaker than specialized experts (such as fine-tuned models).

To handle complex AI tasks, LLMs should be able to coordinate with external models and leverage their capabilities. The key question, then, is how to choose suitable middleware to bridge LLMs and AI models.


The researchers observed that every AI model can be expressed in language form by summarizing its function.

This leads to a central concept: "Language is a generic interface for LLMs (i.e., ChatGPT) to connect AI models."

By incorporating AI model descriptions into its prompts, ChatGPT can be treated as the brain that manages these models. This approach lets ChatGPT call external models to solve real-world tasks.

To put it simply, HuggingGPT is a collaboration system, not a large model.

Its job is to connect ChatGPT and HuggingFace so as to process inputs in different modalities and solve many complex AI tasks.

Each AI model in the HuggingFace community therefore has a corresponding model description in HuggingGPT, which is integrated into the prompt to establish the connection with ChatGPT.

HuggingGPT then uses ChatGPT as the brain to determine the answer to the question.
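To make the idea concrete, here is a minimal, hypothetical sketch of how model descriptions might be folded into a selection prompt. The model ids are real HuggingFace models, but the short descriptions and the prompt wording are invented for illustration; this is not the paper's actual template.

```python
# Hypothetical model catalog: real HuggingFace model ids, with
# invented one-line descriptions of what each model does.
model_descriptions = {
    "facebook/detr-resnet-50": "object detection on images",
    "openai/whisper-base": "speech recognition (audio to text)",
    "runwayml/stable-diffusion-v1-5": "text-to-image generation",
}

def build_selection_prompt(task: str) -> str:
    # Fold every model description into the prompt so the LLM can choose.
    catalog = "\n".join(f"- {name}: {desc}" for name, desc in model_descriptions.items())
    return (
        f"Available models:\n{catalog}\n\n"
        f"Task: {task}\n"
        "Reply with only the id of the model best suited to this task."
    )

print(build_selection_prompt("find all the dogs in photo.jpg"))
```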

So far, HuggingGPT has integrated hundreds of HuggingFace models around ChatGPT, covering 24 tasks including text classification, object detection, semantic segmentation, image generation, question answering, text-to-speech, and text-to-video.

Experimental results show that HuggingGPT can handle multi-modal information and complex AI tasks.

Four-step workflow

The entire HuggingGPT workflow can be divided into the following four stages (a minimal code sketch follows the list):

- Task planning: ChatGPT parses the user request, breaks it into multiple tasks, and plans the task order based on its knowledge and the dependencies among tasks

- Model selection: the LLM assigns each parsed task to an expert model based on the model descriptions from HuggingFace

- Task execution: each expert model executes its assigned task on an inference endpoint and returns the execution information and inference results to the LLM

- Response generation: the LLM summarizes the execution logs and inference results and returns the summary to the user
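Here is a minimal sketch of that pipeline. The function names and data shapes are invented; in the real system each stage is driven by carefully engineered ChatGPT prompts and HuggingFace inference endpoints, so the stub bodies below only stand in for those calls.

```python
from typing import Dict, List

def plan_tasks(user_request: str) -> List[Dict]:
    # Stage 1 (task planning): the LLM would parse the request into subtasks
    # with dependencies. Stub: map everything to one image-captioning task.
    return [{"task": "image-to-text", "args": {"image": "example.jpg"}, "dep": []}]

def select_model(task: Dict) -> str:
    # Stage 2 (model selection): the LLM would pick an expert model from
    # HuggingFace model descriptions. Stub: return a fixed model id.
    return "nlpconnect/vit-gpt2-image-captioning"

def execute_task(model_id: str, task: Dict) -> str:
    # Stage 3 (task execution): run the model on an inference endpoint and
    # record the result. Stub: return a placeholder log entry.
    return f"{model_id} ran {task['task']} on {task['args']}"

def generate_response(user_request: str, logs: List[str]) -> str:
    # Stage 4 (response generation): the LLM would summarize logs and results.
    return f"Request: {user_request}\nExecution log:\n" + "\n".join(logs)

def hugginggpt(user_request: str) -> str:
    tasks = plan_tasks(user_request)
    logs = [execute_task(select_model(t), t) for t in tasks]
    return generate_response(user_request, logs)

print(hugginggpt("Describe the image example.jpg"))
```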

Multi-modal capabilities

Experimental settings

In the experiments, the researchers used the gpt-3.5-turbo and text-davinci-003 variants of the GPT models as the large language models (LLMs); both are publicly accessible through the OpenAI API.

To make the LLM's output more stable, they set the decoding temperature to 0.

At the same time, to nudge the LLM's output toward the expected format, they set a logit_bias of 0.1 on the format-constraint tokens.
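As a sketch of these settings using the 2023-era openai Python library (the message content and the logit_bias token IDs below are placeholders; the real IDs depend on the tokenizer and on which format tokens are being constrained):

```python
import openai

openai.api_key = "YOUR_OPENAI_API_KEY"  # placeholder

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "<task-planning prompt here>"}],
    temperature=0,  # deterministic decoding for more stable outputs
    # Nudge format-constraint tokens upward; these token IDs are placeholders.
    logit_bias={"90": 0.1, "1131": 0.1},
)
print(response["choices"][0]["message"]["content"])
```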

The researchers provide the detailed prompts designed for the task planning, model selection, and response generation stages in the following table, where each {{variable}} field must be filled in with the corresponding text before the prompt is fed to the LLM.
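Filling those templates is straightforward string substitution; here is a minimal sketch (the template text is invented, not one of the paper's actual prompts):

```python
def fill_prompt(template: str, **fields: str) -> str:
    # Replace each {{variable}} placeholder with the corresponding text.
    for name, value in fields.items():
        template = template.replace("{{" + name + "}}", value)
    return template

template = "The user request is: {{input}}. Parse it into a list of tasks."
print(fill_prompt(template, input="generate an image of a cat and describe it"))
```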


Researchers tested HuggingGPT on a wide range of multi-modal tasks.

Through the cooperation of ChatGPT and expert models, HuggingGPT can solve tasks across modalities such as language, image, audio, and video, covering detection, generation, classification, and question answering.

Although these tasks may seem simple, mastering them is a prerequisite for HuggingGPT to solve complex tasks.

For example, a visual question-answering task:


Text generation:


Text-to-image:


HuggingGPT can also integrate multiple input items and perform simple reasoning over them. Even when a request involves multiple resources, HuggingGPT can decompose the main task into basic subtasks and then combine the inference results of multiple models to reach the correct answer.


In addition, the researchers ran tests to evaluate HuggingGPT's effectiveness on complex tasks, demonstrating its ability to handle multiple complex tasks at once.

A user request may contain multiple implicit tasks or require several kinds of information; in such cases, relying on a single expert model is not enough.

HuggingGPT can orchestrate the collaboration of multiple models through task planning.

A user request may explicitly contain multiple tasks:


The figure below shows HuggingGPT’s ability to handle complex tasks in multi-turn dialogue scenarios.

A user can split a complex request into several steps and reach the final goal through multiple rounds of requests. The tests found that, through dialogue-context management in the task planning stage, HuggingGPT can track the state of user requests and correctly resolve both the resources a user refers to and the resulting task plan.


"Jarvis" open source

The project has now been open-sourced on GitHub, although the code has not been fully released yet.


Interestingly, the researchers named the project Jarvis, after the all-capable AI in "Iron Man". Here it comes.

JARVIS: A system connecting LLMs and the ML community


Note that HuggingGPT requires the OpenAI API to run.
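A minimal sketch of supplying the key with the 2023-era openai Python library, assuming the common convention of reading it from an environment variable (the project itself may use its own config file):

```python
import os
import openai

# Read the API key from the environment instead of hard-coding it.
openai.api_key = os.environ["OPENAI_API_KEY"]
```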


Netizens: The future of research

JARVIS/HuggingGPT is much like Toolformer, proposed earlier by Meta: both act as connectors.

The same goes for ChatGPT plugins.

One netizen said, "I strongly suspect that the first artificial general intelligence (AGI) will arrive sooner than expected. It will rely on 'glue' AI that can intelligently stitch together a series of narrow AIs and practical tools.

"Given access to plugins, it went from a math noob to a math genius overnight. Of course, this is only a small step, but it is a sign of the trend ahead.


"I predict that within the next year or so, we will see AI assistants connected to dozens of large language models (LLMs) and similar tools; end users will simply give their assistant instructions, and it will complete tasks for them. That sci-fi moment is coming."


Other netizens said this is how research will be done in the future.


Put a pile of tools in front of GPT, and it knows how to use them.

