Interpreting Toolformer
Large language models (LLMs) have shown remarkable strengths in solving new tasks from limited textual data. Despite this, they have limitations in other respects, such as:
- lack of access to up-to-date information
- a tendency to hallucinate facts
- difficulty with low-resource languages
- a lack of mathematical skills for precise calculations
- no awareness of the passage of time
So how can large models be used to solve more kinds of problems? As described in the article "Interpretation of TaskMatrix.AI", TaskMatrix.AI is a combination of Toolformer and ChatGPT that connects a foundation model with millions of APIs to complete tasks. So, what is Toolformer?
Toolformer is a new open-source model from Meta that can solve problems requiring the use of APIs, such as a calculator, Wikipedia search, or dictionary lookup. Toolformer recognizes that it needs to use a tool, determines which tool to use, and works out how to use it. The use cases for Toolformer could be endless, from providing instant search results for any question to contextual information such as the best restaurants in town.
1. What is Toolformer?
What is Toolformer? In short, Toolformer is a language model that can teach itself how to use tools.
Toolformer is based on a pre-trained GPT-J model with 6.7 billion parameters, trained using self-supervised learning methods. This approach involves sampling and filtering API calls to augment existing text datasets.
Toolformer aims to let an LLM teach itself how to use tools under the following two requirements:
- Tool use should be learned in a self-supervised way, without requiring large amounts of human annotation.
- The LM should not lose its generality and should be able to decide for itself when and how to use which tool.
The following image shows Toolformer's predictions (i.e., the API calls embedded in the data samples):
2. Toolformer's architecture and implementation
One of the core features of ChatGPT is in-context learning, a machine learning approach in which the model learns from examples presented within a specific context or environment. The goal of in-context learning is to improve the model's ability to understand and generate language appropriate to a given context or situation. In natural language processing (NLP) tasks, a language model can thus be prompted to generate responses to specific prompts or questions. So, how does Toolformer take advantage of in-context learning?
Toolformer is a large language model that enables the use of different tools through API calls. The input and output of each API call need to be formatted as a text sequence so that the calls flow naturally within the conversation.
As the image above shows, Toolformer first leverages the model's in-context learning capability to sample a large number of potential API calls.
These API calls are then executed, and the responses are checked to see whether they help the model predict future tokens; this serves as the filtering criterion. After filtering, the API calls to the different tools are embedded into the raw data samples, resulting in an augmented dataset on which the model is fine-tuned.
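To make the pipeline concrete, here is a minimal structural sketch in Python. Every function name in it (`sample_api_calls`, `execute_call`, `loss_reduction`, `insert_call`, `fine_tune`) is a hypothetical placeholder for the steps described above and detailed in Sections 2.1 to 2.4, not the paper's actual code:

```python
def build_augmented_dataset(texts, model, tools, tau_f):
    """Hypothetical skeleton of Toolformer's self-supervised pipeline."""
    augmented = []
    for text in texts:
        # 1. Sample candidate API calls via in-context learning (Section 2.1).
        for position, call in sample_api_calls(model, text, tools):
            # 2. Execute the call to obtain a text response (Section 2.2).
            result = execute_call(tools, call)
            # 3. Keep the call only if it lowers the LM's loss on the
            #    tokens that follow it by at least tau_f (Section 2.3).
            if loss_reduction(model, text, position, call, result) >= tau_f:
                augmented.append(insert_call(text, position, call, result))
    # 4. Fine-tune on the original plus the augmented text (Section 2.4).
    fine_tune(model, texts + augmented)
    return model
```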
Specifically, the figure above shows a model that accomplishes this task using a question answering tool:
- The LM dataset contains the sample text "Pittsburgh is also known as the Steel City", for which the model is prompted with "Pittsburgh is also known as".
- To find the correct continuation, the model needs to make an API call, and make it correctly.
- A few API calls were sampled, specifically "What other name is Pittsburgh known by?" and "Which country is Pittsburgh in?".
- The corresponding answers are "Steel City" and "United States". Because the first answer is better, it is included in a new LM dataset together with the API call: "Pittsburgh is also known as [QA("What other name is Pittsburgh known by?") -> Steel City] the Steel City".
- The new sample thus contains the expected API call and its response. This step is repeated to generate new LM datasets using the various tools (i.e., API calls).
In this way, the LM annotates large amounts of data with API calls embedded in the text, and these annotations are then used to fine-tune the LM to make useful API calls. This is how the self-supervised training works, and the benefits of this approach include:
- Less need for manual annotation.
- Embedding API calls into the text allows the LM to use multiple external tools to add more content.
Toolformer then learns to predict which tool will be used for each task.
2.1 Sampling of API calls
The figure below shows how Toolformer uses the special tokens "<API>" and "</API>" to mark the beginning and end of an API call in a given input. Writing a prompt for each API encourages Toolformer to annotate examples with the relevant API calls.
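For the question answering tool, for example, the annotation prompt has roughly the following shape (paraphrased from the paper; exact wording and examples may differ):

```
Your task is to add calls to a Question Answering API to a piece of text.
The questions should help you get information required to complete the
text. You can call the API by writing "[QA(question)]", where "question"
is the question you want to ask. Here are some examples of API calls:

Input: Joe Biden was born in Scranton, Pennsylvania.
Output: Joe Biden was born in [QA("Where was Joe Biden born?")] Scranton,
[QA("In which state is Scranton?")] Pennsylvania.

Input: x
Output:
```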
Toolformer assigns a probability to each token as a possible continuation of the given sequence. The method samples up to k candidate positions for an API call by computing the probability that Toolformer assigns to starting an API call at each position in the sequence. Positions with a probability greater than a given threshold are kept, and for each such position, up to m API calls are obtained by sampling from Toolformer with the sequence prefixed by the start-of-call token and suffixed by the end-of-call marker.
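A minimal sketch of this sampling step, assuming a Hugging Face-style causal LM; the threshold value, the choice of "[" as the start-of-call token, and the helper's name are illustrative rather than the paper's reference implementation:

```python
import torch

def sample_call_positions(model, tokenizer, input_ids, tau_s=0.05, k=5):
    """Return up to k positions where starting an API call is most likely.

    input_ids has shape (1, seq_len); tau_s plays the role of the sampling
    threshold described above (the value here is illustrative).
    """
    # In the fine-tuning data, '<API>' is represented by the token '['.
    api_start_id = tokenizer.convert_tokens_to_ids("[")
    with torch.no_grad():
        logits = model(input_ids).logits              # (1, seq_len, vocab)
    probs = torch.softmax(logits, dim=-1)[0, :, api_start_id].tolist()
    # Keep positions whose start-of-call probability exceeds the threshold,
    # then take the k most likely ones.
    candidates = [i for i, p in enumerate(probs) if p > tau_s]
    candidates.sort(key=lambda i: probs[i], reverse=True)
    return candidates[:k]
```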
2.2 Execution of API calls
The execution of API calls depends entirely on the client performing the call. The client can be any kind of application, from another neural network to a Python script to a retrieval system that searches a large corpus. Importantly, when the client makes a call, the API returns a single text-sequence response. This response contains details about the call, including its success or failure status, execution time, and more.
Therefore, to obtain accurate results, the client should ensure that the correct input parameters are provided; if they are incorrect, the API may return wrong results, which may be unacceptable to the user. The client should also keep the connection to the API stable to avoid interruptions or other network issues during calls.
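As an illustration, a client-side dispatcher for the calculator tool might look like the sketch below. The `Calculator(400 / 1400)` call syntax follows the paper's examples; the parsing and validation logic is an assumption:

```python
import re

def calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression, rounded to two decimals."""
    # Allow only digits and the four basic operations, mirroring the tool's
    # restrictions; this also keeps the eval() below safe.
    if not re.fullmatch(r"[\d\s.+\-*/()]+", expression):
        return "error: unsupported expression"
    return str(round(eval(expression), 2))

TOOLS = {"Calculator": calculator}

def execute_call(call: str) -> str:
    """Parse a call like 'Calculator(400 / 1400)' and return its response."""
    name, _, arg = call.partition("(")
    return TOOLS[name](arg.rstrip(")"))

print(execute_call("Calculator(400 / 1400)"))  # -> 0.29
```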
2.3 Filtering API calls
During the filtering process, Toolformer computes a weighted cross-entropy loss over the tokens that follow the API call.
Then, two different loss computations are compared:
(i) one where the API call and its result are given to Toolformer as input;
(ii) one with no API call at all, or with the API call but no returned result.
An API call is considered useful if providing its input and output makes it easier for Toolformer to predict future tokens. A filtering threshold is applied to retain only those API calls for which the difference between the two losses is greater than or equal to the threshold.
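In the notation of the Toolformer paper, with c_i an API call at position i, r_i its result, e(·) the textual encoding of a call, and ε the empty sequence, the filtering criterion can be written as:

```latex
% Weighted cross-entropy over the tokens following position i, given prefix z:
L_i(z) = -\sum_{j=i}^{n} w_{j-i} \cdot \log p_M(x_j \mid z, x_{1:j-1})

% Loss with the call and its result, vs. the best of "no call at all"
% and "call without a result":
L_i^{+} = L_i\bigl(e(c_i, r_i)\bigr), \qquad
L_i^{-} = \min\bigl(L_i(\varepsilon),\; L_i(e(c_i, \varepsilon))\bigr)

% Keep the API call c_i only if the loss difference reaches the threshold:
L_i^{-} - L_i^{+} \ge \tau_f
```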
2.4 Model fine-tuning
Finally, Toolformer merges the retained API calls with the original inputs to create a new, augmented dataset. In other words, the augmented dataset contains the same text as the original dataset, with only the API calls inserted.
Then, the new dataset is used to fine-tune Toolformer with the standard language modeling objective. This ensures that fine-tuning on the augmented dataset exposes the model to the same content as fine-tuning on the original dataset. Because API calls are inserted exactly at the positions, and with the inputs, that help the model predict future tokens, fine-tuning on the augmented data enables the language model to learn when and how to use API calls based on its own feedback.
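A minimal fine-tuning sketch using the Hugging Face transformers library; the GPT-J checkpoint name, the hyperparameters, and the single example sample are illustrative, not the paper's released training code:

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
tokenizer.pad_token = tokenizer.eos_token  # GPT-J defines no pad token
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6b")

# Augmented samples: the original text with filtered API calls inserted.
augmented_texts = [
    'Pittsburgh is also known as [QA("What other name is Pittsburgh '
    'known by?") -> Steel City] the Steel City.',
]
train_dataset = [tokenizer(t, truncation=True, max_length=1024)
                 for t in augmented_texts]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="toolformer-ft", num_train_epochs=1),
    train_dataset=train_dataset,
    # Standard causal language modeling objective (mlm=False).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```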
2.5 Inference
During inference, the decoding process is interrupted when the language model produces the "→" token, which indicates that it expects the response to an API call next. The appropriate API is then called to obtain the response, and decoding continues after the response and the end-of-call token have been inserted.
At this point, we need to ensure that the obtained response matches what the preceding tokens expect. If it does not match, the API call needs to be adjusted to obtain the correct response. Before decoding continues, some data processing is also needed to prepare the next step of inference, including parsing the response, understanding the context, and selecting the inference path. During inference, therefore, it is not enough to simply call the API for a response; a series of processing and analysis steps is needed to ensure the correctness and consistency of the inference process.
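A sketch of this modified decoding loop; `generate_until` is a hypothetical helper that decodes until a stop string or the end of generation, and `execute_call` is the dispatcher sketched in Section 2.2:

```python
def decode_with_tools(model, tokenizer, prompt, max_calls=10):
    """Hypothetical decoding loop that pauses at '->' to run API calls."""
    text = prompt
    for _ in range(max_calls):
        # Decode until the model emits the result marker '->' or finishes.
        text, paused = generate_until(model, tokenizer, text, stop="->")
        if not paused:
            return text  # normal end of generation, no pending API call
        # Recover the pending call, e.g. 'Calculator(400 / 1400)'.
        call = text[text.rfind("[") + 1:].removesuffix("->").strip()
        result = execute_call(call)
        # Insert the response plus the end-of-call token, then resume.
        text += f" {result}]"
    return text
```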
2.6 API Tool
Every API tool that can be used in Toolformer must meet the following two conditions:
- Its input and output need to be representable as text sequences.
- There are available demonstrations showing how the tool is used.
The initial implementation of Toolformer supports five API tools:
- Q&A: another LM that answers simple factual questions.
- Calculator: currently supports only the four basic arithmetic operations, with results rounded to two decimal places.
- Wiki Search: a search engine that returns short text snippets from Wikipedia.
- Machine translation system: an LM that can translate a phrase from any language into English.
- Calendar: an API call to a calendar that returns the current date without taking any input.
The following figure shows input and output examples for all the APIs used:
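Since the original figure is not reproduced here, the following are representative examples of the call-and-response format for each tool, paraphrased from the paper (exact wording may differ):

```
QA:          The New England Journal of Medicine is a registered trademark of
             [QA("Who is the publisher of The New England Journal of
             Medicine?") -> Massachusetts Medical Society] the MMS.
Calculator:  Out of 1400 participants, 400 (or
             [Calculator(400 / 1400) -> 0.29] 29%) passed the test.
WikiSearch:  The Brown Act is California's law [WikiSearch("Brown Act") ->
             The Ralph M. Brown Act is an act of the California State
             Legislature...] that requires legislative bodies to hold their
             meetings open to the public.
MT:          The name derives from "la tortuga", the Spanish word for
             [MT("tortuga") -> turtle] turtle.
Calendar:    Today is the first [Calendar() -> Today is Friday, May 29, 2020]
             Friday of the month.
```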
3. Application Example
Toolformer outperforms baseline models and GPT-3 on tasks such as LAMA, math datasets, question answering, and temporal datasets, but performs worse than other models on multilingual question answering. Toolformer completes these tasks with API calls, such as the question answering API (used for LAMA), the calculator API, and the Wikipedia search API.
3.1 LAMA
The task is to complete a statement that is missing a fact. Toolformer outperforms the baseline models and even larger models such as GPT-3. The following table shows results on LAMA obtained through question answering API calls:
3.2 Mathematical Dataset
The task evaluates Toolformer's mathematical reasoning ability against various baseline models. Toolformer performs better than the other models, probably because of its fine-tuning on API-call examples. Allowing the model to make API calls significantly improves performance on all tasks, outperforming larger models such as OPT and GPT-3. In almost all cases, the model decided to ask the calculator tool for help.
The following table shows the results obtained through the Calculator API call:
3.3 Question Answering
The task is to answer questions. Toolformer outperforms baseline models of the same size but falls short of GPT-3 (175B). Toolformer relies on Wikipedia's search tool for most of the examples in this task. The following table shows the results obtained through the Wikipedia search tool API calls:
3.4 Multilingual Question Answering
The question answering setup is also evaluated on the multilingual question answering benchmark MLQA, which contains context passages in English and questions in Arabic, German, Spanish, Hindi, Vietnamese, or Simplified Chinese. Toolformer is not the strongest performer here, probably because fine-tuning on CCNet deteriorated its abilities in some languages.
The following table shows the results obtained through the Wikipedia search tool API call:
3.5 Time Dataset
The task consists of questions for which knowing the current date is crucial to the answer. Toolformer was able to outperform the baseline; however, it clearly did not use the calendar tool 100% of the time, relying instead on Wikipedia search. The following table shows the results obtained through the Wikipedia search tool API calls:
4. Limitations of Toolformer
Toolformer still has some limitations, such as the inability to use multiple tools at once, the inability to handle tools that return too many results, sensitivity to the exact wording of inputs (which makes the approach inefficient), and a failure to consider usage cost, which may lead to high computational costs. The details are as follows:
- Because each tool's API calls are generated independently, Toolformer cannot chain multiple tools within one process.
- Toolformer cannot be used interactively, especially with tools that may return hundreds of different results (such as search engines).
- Models trained with Toolformer are very sensitive to the exact wording of the input; for some tools, the approach is inefficient and requires many documents to generate only a small number of useful API calls.
- When deciding to use a tool, its cost is not taken into account, which may result in high computational costs.
5. Summary
Toolformer is a large language model that uses in-context learning to improve its ability to understand and generate language appropriate for a given context or situation. It annotates large amounts of data with API calls and then uses these annotations to fine-tune the model to make useful API calls. Toolformer learns to predict which tool to use for each task. However, Toolformer still has limitations, such as the inability to use multiple tools in one process and the inability to use interactive tools that may return hundreds of different results.
[Reference materials and related reading]
- Toolformer: Language Models Can Teach Themselves to Use Tools, https://arxiv.org/pdf/2302.04761.pdf
- Meta's Toolformer Uses APIs to Outperform GPT-3 on Zero-Shot NLP Tasks, https://www.infoq.com/news/2023/04/meta-toolformer/
- Toolformer: Language Models Can Teach Themselves to Use Tools (2023), https://kikaben.com/toolformer-2023/
- Breaking Down Toolformer, https://www.shaped.ai/blog/breaking-down-toolformer
- Toolformer: Meta Re-enters the ChatGPT Race With a New Model Using Wikipedia, https://thechainsaw.com/business/meta-toolformer-ai/
- Toolformer language model uses external tools on its own, https://the-decoder.com/toolformer-language-model-uses-external-tools-on-its-own/