Large language models (LLMs) have shown a remarkable ability to solve new tasks from just a few examples or textual instructions. Despite this, they still have clear limitations in other respects.
How can large models be used to solve more problems? The article "Interpretation of TaskMatrix.AI" describes TaskMatrix.AI as a combination of Toolformer and ChatGPT that connects a foundation model with millions of APIs to complete tasks. So, what is Toolformer?
Toolformer is a new model from Meta AI that can solve problems requiring the use of APIs, such as calculators, Wikipedia search, and dictionary lookups. Toolformer recognizes when it needs a tool, decides which tool to use, and works out how to use it. Its potential use cases are broad, from providing instant search results for any question to contextual information such as the best restaurants in town.
What is Toolformer? In short, Toolformer is a language model that can teach itself how to use tools.
Toolformer is based on a pre-trained GPT-J model with 6.7 billion parameters, trained using self-supervised learning methods. This approach involves sampling and filtering API calls to augment existing text datasets.
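Toolformer aims to let an LLM teach itself how to use tools while meeting two requirements: tool use should be learned in a self-supervised way, without large amounts of human annotation, and the model should not lose any of its generality, deciding for itself when and how to use which tool.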
The following image shows Toolformer's predictions (i.e., API calls embedded in a data sample):
One of the core capabilities of ChatGPT is in-context learning, in which a model learns from examples presented in its input context (the prompt) rather than from updates to its weights. The goal of in-context learning is to improve the model's ability to understand and generate language that fits a given context or situation. In natural language processing (NLP) tasks, this lets a language model be prompted to generate responses to specific prompts or questions. So, how does Toolformer take advantage of in-context learning?
Toolformer is a large language model that can use different tools through API calls. The input and output of each API call need to be formatted as text sequences so that they flow naturally within the surrounding text.
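As a minimal sketch of this formatting, the helper below turns an API call into plain text, either on its own or together with its result. The marker strings `<API>`, `</API>` and `→` and the function name `linearize_call` are assumptions chosen for illustration.

```python
from typing import Optional

# Marker strings for the start of a call, the result separator and the end
# of a call (assumed textual forms for this sketch).
API_START = "<API>"
ARROW = "→"
API_END = "</API>"


def linearize_call(tool: str, arg: str, result: Optional[str] = None) -> str:
    """Render an API call as plain text so it can sit inside ordinary text.

    Without a result: <API> Tool(arg) </API>
    With a result:    <API> Tool(arg) → result </API>
    """
    call = f"{tool}({arg})"
    if result is None:
        return f"{API_START} {call} {API_END}"
    return f"{API_START} {call} {ARROW} {result} {API_END}"


print(linearize_call("Calculator", "400/1400"))          # <API> Calculator(400/1400) </API>
print(linearize_call("Calculator", "400/1400", "0.29"))  # <API> Calculator(400/1400) → 0.29 </API>
```

Because both forms are ordinary strings, they can be inserted into training text and produced by the model like any other tokens.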
As the image above shows, Toolformer first uses the model's in-context learning ability to sample a large number of potential API calls.
It then executes these API calls and uses, as the filtering criterion, whether the responses help the model predict future tokens. After filtering, the API calls to the different tools are embedded into the original data samples, producing an augmented dataset on which the model is fine-tuned.
Specifically, the figure above illustrates these steps using a question answering tool.
In short, the LM annotates a large amount of data with API calls embedded in the text and is then fine-tuned on this annotated data so that it learns to make useful API calls. This is how the self-supervised training works.
Toolformer then learns to predict which tool will be used for each task.
The following figure shows that Toolformer uses <API> and </API> to mark the beginning and end of an API call for a given input. A prompt written for each API encourages Toolformer to annotate examples with the relevant API calls.
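A minimal sketch of such an annotation prompt follows. The instruction wording is modeled loosely on a question answering prompt and is an assumption here, as is the function name `build_annotation_prompt`; the square brackets play the role of the <API> and </API> markers, which the Toolformer paper also writes as brackets in practice.

```python
from typing import List, Tuple

# Hypothetical instruction text for a question answering tool.
INSTRUCTION = (
    "Your task is to add calls to a Question Answering API to a piece of text. "
    "You can call the API by writing [QA(question)] where question is the "
    "question you want to ask. Here are some examples of API calls:"
)


def build_annotation_prompt(demos: List[Tuple[str, str]], text: str) -> str:
    """Few-shot prompt: instruction, demonstrations, then the text to annotate.

    The model is expected to continue after the final "Output:" by copying
    the text and inserting API calls where they would help.
    """
    parts = [INSTRUCTION]
    for raw, annotated in demos:
        parts.append(f"Input: {raw}\nOutput: {annotated}")
    parts.append(f"Input: {text}\nOutput:")
    return "\n\n".join(parts)


demos = [(
    "Joe Biden was born in Scranton, Pennsylvania.",
    'Joe Biden was born in [QA("Where was Joe Biden born?")] Scranton, Pennsylvania.',
)]
prompt = build_annotation_prompt(demos, "Coca-Cola is headquartered in Atlanta.")
print(prompt)
```

Keeping one prompt per tool means each tool only needs a handful of hand-written demonstrations; the model does the rest of the annotation.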
The language model assigns a probability to every token as a possible continuation of a given sequence. To find candidate positions for an API call, the method computes, at each position in the sequence, the probability that the model assigns to starting an API call there (i.e., to emitting the <API> token). Positions whose probability exceeds a sampling threshold are kept, up to a maximum of k positions. For each kept position, up to m API calls are then sampled from the model, using the prompt plus the text up to that position followed by <API> as the prefix, and </API> as the end-of-sequence token.
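The position-selection step can be sketched as follows, assuming the per-position probabilities of starting a call have already been computed by the model. The names `select_positions` and `tau_s` (the sampling threshold) are illustrative; sampling the m calls per position is not shown.

```python
from typing import List


def select_positions(p_api: List[float], tau_s: float, k: int) -> List[int]:
    """Return up to k positions whose probability of starting an API call exceeds tau_s.

    p_api[i] is assumed to be the model's probability of emitting <API>
    before token i, given the prompt and the preceding tokens.
    """
    candidates = [i for i, p in enumerate(p_api) if p > tau_s]
    # Keep only the k most likely positions, then restore text order.
    top = sorted(candidates, key=lambda i: p_api[i], reverse=True)[:k]
    return sorted(top)


probs = [0.01, 0.30, 0.02, 0.12, 0.50, 0.06]
print(select_positions(probs, tau_s=0.05, k=2))  # [1, 4]
```

The threshold discards positions where a call is unlikely to help, and the cap k bounds how many calls have to be sampled and executed per text.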
How an API call is executed depends entirely on the client that executes it. The client can be many kinds of application, from another neural network, to a Python script, to a retrieval system that searches a large corpus. The important constraint is that each call must return its response as a single text sequence.
To obtain accurate results, the client should make sure the correct input parameters are provided; if they are incorrect, the API may return wrong results that are unacceptable to the user. The client should also make sure the connection to the API is stable, to avoid interruptions or other network issues during calls.
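As an illustration of a client, here is a minimal sketch of a calculator tool and a dispatcher. The restriction to the four basic operations, rounding to two decimal places, the `TOOLS` registry and the policy of returning an empty response on invalid input are all assumptions for this sketch. Parsing with Python's `ast` module instead of calling `eval` keeps arbitrary code from being executed.

```python
import ast
import operator

# Only the four basic arithmetic operations are supported (an assumption for
# this sketch of a simple calculator tool).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node: ast.AST) -> float:
    """Recursively evaluate a parsed arithmetic expression, rejecting anything else."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    raise ValueError("unsupported expression")


def calculator(expr: str) -> str:
    """Evaluate the expression and return the result, rounded to 2 decimals, as text."""
    return str(round(_eval(ast.parse(expr, mode="eval")), 2))


# Hypothetical tool registry: each tool takes text and returns text.
TOOLS = {"Calculator": calculator}


def execute(tool: str, arg: str) -> str:
    """Run one API call and return its response as a single text sequence."""
    try:
        return TOOLS[tool](arg)
    except (KeyError, ValueError, SyntaxError, ZeroDivisionError):
        # Unknown tool or invalid input: return an empty response instead of crashing.
        return ""


print(execute("Calculator", "400/1400"))  # 0.29
```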
During filtering, Toolformer computes a weighted cross-entropy loss over the tokens that follow the API call.
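For reference, the weighted loss can be written as below. The notation is an assumption modeled on the Toolformer paper: x_1, ..., x_n is the text, i is the position where the API call is inserted, z is the prefix given to the model M (for example, a linearized API call with or without its result), and w_t are weights that decay with distance from the call. The specific weighting shown is as we recall it from the paper and should be checked there.

```latex
L_i(\mathbf{z}) = -\sum_{j=i}^{n} w_{j-i}\,\log p_M\!\left(x_j \mid \mathbf{z},\, x_{1:j-1}\right),
\qquad
w_t = \frac{\tilde{w}_t}{\sum_{s} \tilde{w}_s},
\qquad
\tilde{w}_t = \max(0,\, 1 - 0.2\,t)
```

With this weighting only the first few tokens after the call contribute, so a call is judged by how much it helps the text immediately following it.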
It then compares two versions of this loss:
(i) the loss when the API call and its result are given to Toolformer as a prefix;
(ii) the smaller of two losses: one with no API call at all, and one with the API call but without its result.
An API call is considered useful if providing both its input and its output makes it easier for Toolformer to predict the following tokens. A filtering threshold is therefore applied, and only API calls for which loss (ii) exceeds loss (i) by at least that threshold are kept.
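A minimal sketch of this filtering rule, assuming the model's per-token probabilities are already available for three prefixes: no call, call without result, and call with result. In the notation of the Toolformer paper as we understand it, L⁺ is the loss with the call and its result, L⁻ is the smaller of the other two losses, and a call is kept when L⁻ − L⁺ ≥ τ_f. The linear weighting max(0, 1 − 0.2·t), normalized, is an assumption based on our reading of the paper; the function names and toy probabilities are hypothetical.

```python
import math
from typing import List


def weights(n: int) -> List[float]:
    """Weights for the n tokens after the call: max(0, 1 - 0.2 t), normalised (assumed)."""
    raw = [max(0.0, 1.0 - 0.2 * t) for t in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_loss(token_probs: List[float]) -> float:
    """Weighted cross-entropy of the tokens that follow the API call position.

    token_probs[t] is the model's probability of the t-th following token
    given some prefix (no call, call without result, or call with result).
    """
    w = weights(len(token_probs))
    return -sum(wt * math.log(p) for wt, p in zip(w, token_probs))


def keep_call(p_no_call: List[float], p_call_no_result: List[float],
              p_call_with_result: List[float], tau_f: float) -> bool:
    """Keep the call if its result lowers the loss by at least tau_f."""
    loss_plus = weighted_loss(p_call_with_result)
    loss_minus = min(weighted_loss(p_no_call), weighted_loss(p_call_no_result))
    return loss_minus - loss_plus >= tau_f


# Toy probabilities for three following tokens under each prefix.
no_call = [0.10, 0.50, 0.90]
call_no_result = [0.12, 0.50, 0.90]
call_with_result = [0.80, 0.60, 0.90]
print(keep_call(no_call, call_no_result, call_with_result, tau_f=0.5))  # True
print(keep_call(no_call, call_no_result, call_with_result, tau_f=1.0))  # False
```

Taking the minimum for L⁻ means a call is not credited just for adding its own input text; the result itself has to help.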
Finally, Toolformer merges the remaining API calls into the original text, creating an augmented dataset. In other words, the augmented dataset contains the same text as the original dataset, with only the API calls inserted.
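The merging step can be sketched as follows, assuming the kept calls are already linearized as text and their insertion points are known as character offsets. The function name `merge_calls` and the bracketed call format are illustrative.

```python
from typing import List, Tuple


def merge_calls(text: str, calls: List[Tuple[int, str]]) -> str:
    """Insert linearised API calls into text at character offsets.

    calls holds (offset, call_text) pairs. Inserting from the rightmost offset
    first keeps earlier offsets valid; the original text is otherwise unchanged.
    """
    out = text
    for offset, call_text in sorted(calls, key=lambda c: c[0], reverse=True):
        out = out[:offset] + call_text + " " + out[offset:]
    return out


text = "Out of 1400 participants, 400 (or 29%) passed the test."
call = "[Calculator(400/1400) → 0.29]"
merged = merge_calls(text, [(text.index("29%"), call)])
print(merged)
# Out of 1400 participants, 400 (or [Calculator(400/1400) → 0.29] 29%) passed the test.
```

Removing the inserted call text gives back the original sentence exactly, which is the property the paragraph above relies on.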
The new dataset is then used to fine-tune Toolformer with a standard language modeling objective. Because the augmented dataset contains the same text as the original, the model is exposed to the same content as it would be when fine-tuning on the original dataset. And because API calls are inserted exactly at the positions, and with exactly the inputs, that help the model predict future tokens, fine-tuning on the augmented data teaches the language model when and how to use API calls based on its own feedback.
During inference, decoding proceeds normally until the language model produces the "→" token, which indicates that it next expects the response to an API call. At that point decoding is interrupted, the appropriate API is called to get the response, and decoding continues after the response and the </API> token have been inserted.
At this point, the response obtained should match what the preceding tokens lead the model to expect; if it does not, the API call needs to be adjusted to obtain the correct response. Before decoding resumes, some data processing may also be needed to prepare the next step of inference, such as analyzing the response, understanding the context, and choosing an inference path. Inference therefore involves not only calling the API to obtain a response, but also a series of processing and analysis steps that keep the process correct and consistent.
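The basic decoding loop, in which generation pauses at "→", calls the tool, inserts the response and </API>, and then resumes, can be sketched as below. This is a toy under stated assumptions: `next_token` is a hypothetical stand-in for the language model, a whole call such as `Calculator(400/1400)` is treated as a single token, and the scripted model simply replays a fixed sequence instead of conditioning on the inserted response.

```python
from typing import Callable, Dict, Iterator, List

API_START = "<API>"
ARROW = "→"
API_END = "</API>"


def generate(next_token: Callable[[List[str]], str],
             tools: Dict[str, Callable[[str], str]],
             prompt: List[str], max_tokens: int = 50) -> List[str]:
    """Decode token by token, pausing at the arrow token to run an API call.

    next_token(tokens) stands in for the language model. For simplicity a
    whole call such as "Calculator(400/1400)" is treated as one token.
    """
    tokens = list(prompt)
    for _ in range(max_tokens):
        tok = next_token(tokens)
        if tok == "<eos>":
            break
        tokens.append(tok)
        if tok == ARROW:
            # The token before the arrow holds the call, e.g. "Calculator(400/1400)".
            name, arg = tokens[-2][:-1].split("(", 1)
            # Insert the response and the end-of-call token, then keep decoding.
            tokens += [tools[name](arg), API_END]
    return tokens


def scripted_model(script: List[str]) -> Callable[[List[str]], str]:
    """Toy stand-in for a language model that ignores context and replays a script."""
    it: Iterator[str] = iter(script)
    return lambda tokens: next(it, "<eos>")


def divide(arg: str) -> str:
    """Toy calculator that only handles 'a/b'."""
    a, b = arg.split("/")
    return str(round(float(a) / float(b), 2))


model = scripted_model(["(or", API_START, "Calculator(400/1400)", ARROW, "29%)", "passed"])
print(" ".join(generate(model, {"Calculator": divide}, ["400"])))
# 400 (or <API> Calculator(400/1400) → 0.29 </API> 29%) passed
```

In a real system `next_token` would run the fine-tuned model on the full token sequence, so the inserted response would influence what it generates next.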
Every API tool used by Toolformer must meet two conditions: its inputs and outputs can be represented as text sequences, and a few demonstrations of its intended use are available.
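The initial implementation of Toolformer supports five API tools: a question answering system, a Wikipedia search engine, a calculator, a calendar, and a machine translation system.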
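To make the text-in, text-out requirement concrete, here is a minimal sketch of a calendar tool. Its output sentence format, the optional `today` parameter (added only so the output can be reproduced) and the `TOOLS` registry are assumptions for illustration.

```python
import datetime
from typing import Callable, Dict, Optional


def calendar(_query: str = "", today: Optional[datetime.date] = None) -> str:
    """Calendar tool: ignores its text input and returns the current date as text."""
    d = today or datetime.date.today()
    return f"Today is {d:%A}, {d:%B} {d.day}, {d.year}."


# Hypothetical registry: every tool maps one text input to one text output,
# which is what lets its calls and results be written inline in the text.
TOOLS: Dict[str, Callable[[str], str]] = {"Calendar": calendar}

print(calendar(today=datetime.date(2023, 1, 30)))  # Today is Monday, January 30, 2023.
```

Any tool wrapped this way can be linearized into the text and executed by the same client code, whatever it does internally.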
The following figure shows example inputs and outputs for all of the APIs used:
Toolformer outperforms the baseline models on LAMA, the math datasets, question answering, and the temporal datasets, and on several of these tasks it also beats the much larger GPT-3, but it performs worse than other models on multilingual question answering. Toolformer completes these tasks through API calls to tools such as the question answering system, the calculator, and Wikipedia search.
On LAMA, the task is to complete a short statement that is missing a fact. Toolformer outperforms the baseline models and even much larger models such as GPT-3. The following table shows the results obtained using question answering API calls:
On the math datasets, the task is to evaluate Toolformer's mathematical reasoning ability against various baseline models. Toolformer performs better than the other models, probably because it was fine-tuned on examples of API calls. Allowing the model to make API calls significantly improves performance on all tasks, to the point that it outperforms larger models such as OPT and GPT-3. In almost all cases, the model chose to ask the calculator tool for help.
The following table shows the results obtained using Calculator API calls:
On question answering, Toolformer outperforms baseline models of the same size but falls short of GPT-3 (175B). Toolformer uses the Wikipedia search tool for most of the examples in this task. The following table shows the results obtained using Wikipedia search API calls:
For multilingual question answering, the MLQA benchmark is used; it contains context passages in English and questions in Arabic, German, Spanish, Hindi, Vietnamese, or Simplified Chinese. Toolformer is not the strongest performer here, probably because fine-tuning on CCNet degrades performance for some languages.
The following table shows the results obtained using machine translation API calls:
On the temporal datasets, knowing the current date is crucial to answering the questions. Toolformer outperforms the baselines; however, it clearly does not use the calendar tool every time and often uses Wikipedia search instead. The following table shows the results obtained using Wikipedia search API calls:
Toolformer still has some limitations: it cannot use multiple tools together (for example, feeding one tool's output into another), it cannot handle tools that return too many results, it is sensitive to the exact wording of its input, which makes it inefficient, and it does not consider tool usage costs, which can lead to high computational costs.
Toolformer is a large language model that uses in-context learning to improve its ability to understand and generate language suited to a given context or situation. It uses API calls to annotate large amounts of data and is then fine-tuned on this data to make useful API calls, learning to predict which tool to use for each task. However, Toolformer still has limitations, such as being unable to use multiple tools in a single chain and being unable to use tools interactively when they may return hundreds of different results.
The above is the detailed content of Interpret Toolformer. For more information, please follow other related articles on the PHP Chinese website!