


ChatGPT can choose models by itself! A hot new paper from Microsoft Research Asia and Zhejiang University, and the HuggingGPT project is now open source
The AI craze triggered by ChatGPT has spread to the financial world as well.
Recently, researchers at Bloomberg built a GPT for the financial domain, BloombergGPT, with 50 billion parameters.
The emergence of GPT-4 has given many people a taste of the powerful capabilities of large language models.
However, OpenAI itself is not open. Many in the industry have begun to clone GPT, and many ChatGPT alternatives are built on open-source models, especially Meta's open-source LLaMA model.
Examples include Stanford's Alpaca, Vicuna from UC Berkeley together with CMU, Stanford, and others, and Dolly from the startup Databricks.
ChatGPT-like large language models built for different tasks and applications have the whole field flourishing with contending approaches.
So the question is, how do researchers choose an appropriate model, or even multiple models, to complete a complex task?
Recently, the research team from Microsoft Research Asia and Zhejiang University released HuggingGPT, a large model collaboration system.
Paper address: https://arxiv.org/pdf/2303.17580.pdf
HuggingGPT uses ChatGPT as a controller to connect various AI models in the HuggingFace community to complete multi-modal complex tasks.
This means you gain a kind of super power: through HuggingGPT, you get multi-modal capabilities covering images, video, and speech.
HuggingGPT: the bridge
The researchers point out that solving the current problems of large language models (LLMs) may be the first, and a critical, step towards AGI.
Current large language model technology still has shortcomings, and there are pressing challenges on the road to building AGI systems:
- Limited to text as input and output, current LLMs lack the ability to process complex information such as vision and speech;
- In actual application scenarios, some complex tasks usually consist of multiple subtasks, so the scheduling and collaboration of multiple models are required, which is also beyond the capabilities of the language model;
- For some challenging tasks, LLMs show excellent results in zero-shot or few-shot settings, but they are still weaker than some expert models (such as fine-tuned models).
To handle complex AI tasks, LLMs should be able to coordinate with external models to leverage their capabilities. Therefore, the key point is how to choose the appropriate middleware to bridge LLMs and AI models.
The researchers observed that every AI model can be described in language by summarizing its function.
This introduces the concept of language as a universal interface for LLMs, namely ChatGPT, to connect AI models.
By incorporating AI model descriptions into its prompts, ChatGPT can be regarded as the brain that manages those models, which allows it to call external models to solve practical tasks.
To put it simply, HuggingGPT is a collaboration system, not a large model.
Its function is to connect ChatGPT and HuggingFace to process input in different modalities and solve many complex artificial intelligence tasks.
Each AI model in the HuggingFace community has a corresponding model description in HuggingGPT, which is integrated into the prompt to establish the connection with ChatGPT.
HuggingGPT then uses ChatGPT as the brain to determine the answer to the question.
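As a rough illustration of this idea, the controller prompt might embed model descriptions along the lines of the sketch below. The model IDs and description strings are examples chosen for illustration, not the actual metadata HuggingGPT pulls from HuggingFace.

```python
# Illustrative sketch: injecting model descriptions into the controller prompt.
# The model ids and descriptions below are examples, not HuggingGPT's actual metadata.
model_descriptions = [
    {"id": "facebook/detr-resnet-101", "task": "object-detection",
     "description": "Detects objects in an image and returns labeled bounding boxes."},
    {"id": "nlpconnect/vit-gpt2-image-captioning", "task": "image-to-text",
     "description": "Generates a natural-language caption for an input image."},
]

selection_prompt = (
    "You are a controller that assigns each task to one expert model.\n"
    "Available models:\n"
    + "\n".join(f"- {m['id']} ({m['task']}): {m['description']}" for m in model_descriptions)
    + "\nTask: caption the user's photo. Reply with the chosen model id only."
)
print(selection_prompt)
```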
So far, HuggingGPT has integrated hundreds of HuggingFace models around ChatGPT, covering 24 tasks including text classification, object detection, semantic segmentation, image generation, question answering, text-to-speech, and text-to-video.
Experimental results prove that HuggingGPT has the ability to handle multi-modal information and complex artificial intelligence tasks.
Four-step workflow
The entire HuggingGPT workflow can be divided into the following four stages (sketched in code after the list):
- Task planning: ChatGPT parses the user request, breaks it into multiple tasks, and plans the task order based on its knowledge and the dependencies between tasks
- Model selection: the LLM assigns the parsed tasks to expert models based on the model descriptions on HuggingFace
- Task execution: the expert models execute the assigned tasks on inference endpoints and record the execution information and inference results back to the LLM
- Response generation: the LLM summarizes the execution logs and inference results and returns the summary to the user
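A minimal sketch of how these four stages could fit together is shown below. The function and method names (plan, select_model, run, summarize) are hypothetical placeholders for illustration and do not come from the JARVIS codebase.

```python
# Hypothetical sketch of HuggingGPT's four-stage loop; names are illustrative, not the project's API.

def hugging_gpt(user_request, llm, expert_models):
    # 1. Task planning: the LLM parses the request into an ordered list of subtasks,
    #    each with dependencies on the outputs of earlier subtasks.
    tasks = llm.plan(user_request)

    results = {}
    for task in tasks:
        # 2. Model selection: the LLM picks an expert model by matching the task
        #    against the textual model descriptions.
        model_id = llm.select_model(task, candidates=expert_models)

        # 3. Task execution: the chosen expert model runs on an inference endpoint;
        #    its result is recorded so later tasks (and the LLM) can use it.
        results[task["id"]] = expert_models[model_id].run(task, context=results)

    # 4. Response generation: the LLM summarizes the execution log and results for the user.
    return llm.summarize(user_request, tasks, results)
```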
Multi-modal capabilities
Experimental settings
In the experiments, the researchers used the gpt-3.5-turbo and text-davinci-003 variants of the GPT models as the large language models (LLMs); both are publicly accessible through the OpenAI API.
To make the LLM output more stable, they set the decoding temperature to 0.
At the same time, to make the LLM output conform to the expected format, they set a logit_bias of 0.1 on the format constraints.
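As a hedged sketch, those decoding settings might be passed to the OpenAI API roughly as follows. The token IDs in logit_bias are placeholders, since the paper does not list the exact format tokens, and the client shown is the current openai Python SDK rather than the one the authors used at the time.

```python
# Sketch of the decoding settings (temperature=0, small logit_bias on format tokens).
# Token ids in logit_bias are placeholders, not the actual tokens used in the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Parse the user request into a task list."}],
    temperature=0,                          # deterministic decoding for stable output
    logit_bias={"123": 0.1, "456": 0.1},    # slight positive bias on format-constraint tokens
)
print(response.choices[0].message.content)
```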
The researchers provide the detailed prompts designed for the task planning, model selection, and response generation stages in the following table, where {{variable}} indicates a field that must be filled in with the corresponding text before the prompt is fed into the LLM.
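For illustration, filling those {{variable}} placeholders might look like the following; the template wording is a stand-in, not the actual prompt from the paper.

```python
# Illustrative only: filling {{variable}} placeholders before sending a stage prompt to the LLM.
# The template wording is a stand-in, not the paper's actual task-planning prompt.
task_planning_template = (
    "The chat context is {{context}}. "
    'Parse the user request "{{user_request}}" into a list of subtasks with dependencies.'
)

def fill_prompt(template, **fields):
    # Replace each {{name}} placeholder with the corresponding text.
    for name, value in fields.items():
        template = template.replace("{{" + name + "}}", value)
    return template

prompt = fill_prompt(
    task_planning_template,
    context="(empty)",
    user_request="Generate a picture of a cat and read its description aloud.",
)
print(prompt)
```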
Researchers tested HuggingGPT on a wide range of multi-modal tasks.
With ChatGPT and the expert models cooperating, HuggingGPT can solve tasks across modalities such as language, image, audio, and video, covering detection, generation, classification, and question answering.
Although these tasks may seem simple, mastering these basic capabilities is a prerequisite for HuggingGPT to solve complex tasks.
For example, a visual question answering task:
Text generation:
Text-to-image:
HuggingGPT can integrate multiple input contents to perform simple reasoning. Even when a request involves multiple task resources, HuggingGPT can decompose the main task into several basic tasks and finally integrate the inference results of multiple models to arrive at the correct answer.
In addition, the researchers evaluated HuggingGPT on complex task scenarios, demonstrating its ability to handle multiple complex tasks.
A request may contain multiple implicit tasks or require several kinds of information, in which case relying on a single expert model is not enough.
HuggingGPT can organize the collaboration of multiple models through task planning.
A user request may explicitly contain multiple tasks:
The figure below shows HuggingGPT’s ability to handle complex tasks in multi-turn dialogue scenarios.
Users can split a complex request into several steps and reach the final goal through multiple rounds of requests. HuggingGPT tracks the state of user requests via dialogue-context management in the task planning stage, resolving both the resources the user refers to and the task planning itself.
The project has now been open-sourced on GitHub, although the code has not been fully released yet.
Interestingly, the researchers named the project JARVIS, after the all-powerful AI assistant in "Iron Man".
JARVIS: a system connecting LLMs with the ML community
Note that HuggingGPT requires the OpenAI API to run.
Netizens: this is the future of research
JARVIS / HuggingGPT is much like the Toolformer Meta proposed earlier: both act as connectors.
The same goes for ChatGPT plugins.
One netizen said, "I strongly suspect the first artificial general intelligence (AGI) will arrive earlier than expected. It will rely on 'glue' AI that can intelligently glue together a series of narrow AIs and practical tools.
I was given access to the plugins, which transformed ChatGPT from a math noob into a math genius overnight. Of course this is only a small step, but it is a sign of the trend to come.
I predict that within the next year or so we will see AI assistants that connect dozens of large language models (LLMs) and similar tools, with end users simply giving instructions to their assistants to complete tasks for them. This sci-fi moment is arriving."
Some netizens said that this is what future research will look like.
Put GPT in front of a pile of tools, and it knows how to use them.