Overnight, Falcon 180B, the world's most powerful open-source large model, took the internet by storm!
With 180 billion parameters and training on 3.5 trillion tokens, Falcon shot straight to the top of the Hugging Face leaderboard.
In benchmark tests, Falcon 180B beat Llama 2 across tasks including reasoning, coding, proficiency, and knowledge tests.
Falcon 180B even rivals Google's PaLM 2, with performance approaching GPT-4.
However, NVIDIA senior scientist Jim Fan questioned this.
- In Falcon-180B's training data, code accounts for only 5%.
Code is by far the most useful data for improving reasoning, mastering tool use, and enhancing AI agents. In fact, GPT-3.5 was fine-tuned on top of Codex.
- No coding benchmark data was reported.
Without coding capability, you cannot claim to be "better than GPT-3.5" or "close to GPT-4". It should be an integral part of the pre-training recipe, not an afterthought.
- For language models with more than 30B parameters, it is time to adopt Mixture of Experts (MoE). So far, open-source MoE LLMs have been few and far between.
So, what exactly is Falcon 180B?
Previously, Falcon had been released in three sizes: 1.3B, 7.5B, and 40B.
According to the official introduction, Falcon 180B is an upgraded version of the 40B model. It was released by TII, the world-leading technology research center in Abu Dhabi, and is free for commercial use.
This time, the researchers made technical innovations to the base model, such as using Multi-Query Attention (MQA) to improve the model's scalability.
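Conceptually, multi-query attention lets all query heads share a single key/value head, which shrinks the KV cache and makes large-scale inference cheaper. Below is a minimal, simplified PyTorch sketch of the idea; it is for illustration only, not Falcon's actual implementation, and the dimensions are made up:

import torch
from torch import nn

class MultiQueryAttention(nn.Module):
    # Simplified multi-query attention: many query heads, one shared key/value head.
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)        # one projection per query head
        self.k_proj = nn.Linear(d_model, self.head_dim)  # single shared key head
        self.v_proj = nn.Linear(d_model, self.head_dim)  # single shared value head
        self.o_proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)  # (B, H, T, d)
        k = self.k_proj(x).unsqueeze(1)  # (B, 1, T, d): broadcast across all query heads
        v = self.v_proj(x).unsqueeze(1)  # (B, 1, T, d)
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5          # (B, H, T, T)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))                 # causal masking
        out = scores.softmax(dim=-1) @ v                                 # (B, H, T, d)
        out = out.transpose(1, 2).reshape(B, T, -1)
        return self.o_proj(out)

The key saving is that the KV cache only has to store one head's keys and values per layer instead of one set per query head.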
For training, Falcon 180B ran on Amazon SageMaker, Amazon's cloud machine-learning platform, and was trained on 3.5 trillion tokens using up to 4,096 GPUs.
The total GPU time was approximately 7,000,000 GPU hours.
Falcon 180B has 2.5 times as many parameters as Llama 2 (70B), and its training required about 4 times as much compute.
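As a rough sanity check on that 4x figure, one can use the common back-of-the-envelope estimate that training compute is about 6 × parameters × tokens (an approximation, not an official number; Llama 2 70B's 2 trillion training tokens are taken from Meta's paper):

# Rough training-compute comparison using the FLOPs ~ 6 * params * tokens rule of thumb
falcon_flops = 6 * 180e9 * 3.5e12   # Falcon 180B, 3.5T tokens
llama2_flops = 6 * 70e9 * 2.0e12    # Llama 2 70B, 2T tokens
print(f"Falcon 180B: {falcon_flops:.2e} FLOPs")       # ~3.8e+24
print(f"Llama 2 70B: {llama2_flops:.2e} FLOPs")       # ~8.4e+23
print(f"Ratio: {falcon_flops / llama2_flops:.1f}x")   # ~4.5x, consistent with "about 4 times"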
In terms of training data, Falcon 180B was trained mainly on the RefinedWeb dataset (about 85%).
Additionally, it was trained on a curated mix of conversations, technical papers, and a small portion of code.
The pre-training dataset is so large that even 3.5 trillion tokens amount to less than one epoch.
Officially, Falcon 180B is claimed to be the "best" open-source large model currently available. Its performance breaks down as follows:
On the MMLU benchmark, Falcon 180B outperforms Llama 2 70B and GPT-3.5.
It is on par with Google's PaLM 2-Large on HellaSwag, LAMBADA, WebQuestions, Winogrande, PIQA, ARC, BoolQ, CB, COPA, RTE, WiC, WSC, and ReCoRD.
In addition, it is currently the highest-scoring open model on the Hugging Face Open LLM Leaderboard (68.74 points), surpassing Llama 2 (67.35).
At the same time, the researchers also released the chat dialogue model Falcon-180B-Chat. The model is fine-tuned on conversation and instruction datasets covering Open-Platypus, UltraChat and Airoboros.
Now, anyone can try the demo.
Address: https://huggingface.co/tiiuae/falcon-180B-chat
The base model has no prompt format: it is not a conversational model and has not been instruction-tuned, so it does not respond in a conversational way.
The pre-trained model is a great starting point for further fine-tuning, but you probably should not use it directly as-is. Its chat counterpart, by contrast, uses a very simple conversation structure:
System: Add an optional system prompt here
User: This is the user input
Falcon: This is what the model generates
User: This might be a second-turn input
Falcon: and so on
Starting from Transformers 4.33, Falcon 180B can be used and downloaded in the Hugging Face ecosystem.
Make sure you are logged in to your Hugging Face account and have the latest version of transformers installed:
pip install --upgrade transformers
huggingface-cli login
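Alternatively, if you prefer to authenticate from Python rather than the CLI, the huggingface_hub library offers a login helper (the token below is a placeholder; use your own access token):

from huggingface_hub import login

# Log in programmatically instead of via `huggingface-cli login`.
# Replace the placeholder with your own token from the Hugging Face settings page.
login(token="hf_xxx")  # placeholder token, for illustration only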
bfloat16
Here's how to use the base model in bfloat16. Falcon 180B is a large model, so be aware of its hardware requirements.
The hardware requirements are roughly as follows: fully fine-tuning Falcon 180B requires at least 8x 8x A100 80GB GPUs, while even inference alone requires 8x A100 80GB GPUs.
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model_id = "tiiuae/falcon-180B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the model in bfloat16 and let it be sharded across the available GPUs
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "My name is Pedro, I live in"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

output = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    max_new_tokens=50,
)
output = output[0].to("cpu")
print(tokenizer.decode(output))
This may produce output like the following:
My name is Pedro, I live in Portugal and I am 25 years old. I am a graphic designer, but I am also passionate about photography and video. I love to travel and I am always looking for new adventures. I love to meet new people and explore new places.
8-bit and 4-bit with bitsandbytes
Additionally, the 8-bit and 4-bit quantized versions of Falcon 180B are virtually indistinguishable from bfloat16 in terms of evaluation!
This is good news for inference, as users can confidently use the quantized version to reduce hardware requirements.
Note that inference is much faster in the 8-bit version than in the 4-bit version. To use quantization, you need to install the "bitsandbytes" library and enable the corresponding flag when loading the model:
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    load_in_8bit=True,
    device_map="auto",
)
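The 4-bit variant is loaded the same way; a minimal sketch (assuming a recent transformers and bitsandbytes setup) simply swaps the flag:

# 4-bit loading: same pattern as above, using load_in_4bit instead of load_in_8bit
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,
    device_map="auto",
)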
Dialog Model
As mentioned above, the chat-tuned version of the model uses a very straightforward conversation template. We have to follow the same pattern to run chat-style inference.
For reference, you can take a look at the [format_prompt] function in the chat demo:
def format_prompt(message, history, system_prompt):
    prompt = ""
    if system_prompt:
        prompt += f"System: {system_prompt}\n"
    for user_prompt, bot_response in history:
        prompt += f"User: {user_prompt}\n"
        prompt += f"Falcon: {bot_response}\n"
    prompt += f"User: {message}\nFalcon:"
    return prompt
As you can see, each user turn and each model response is preceded by a User: or Falcon: delimiter, and they are concatenated into a single prompt containing the entire conversation history. An optional system prompt can be supplied this way to steer the generation style.
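For instance, calling format_prompt with a one-turn history and a system prompt (the values here are made up purely for illustration) yields a prompt string like this:

history = [("Hi, who are you?", "I am Falcon, a large language model.")]
prompt = format_prompt(
    message="What can you help me with?",
    history=history,
    system_prompt="You are a helpful assistant.",
)
print(prompt)
# System: You are a helpful assistant.
# User: Hi, who are you?
# Falcon: I am Falcon, a large language model.
# User: What can you help me with?
# Falcon: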
Netizens have been hotly debating Falcon 180B's true strength.
One commented: "Absolutely unbelievable. It beats GPT-3.5 and is on par with Google's PaLM-2 Large. This is a game changer!"
A startup CEO said he tested the Falcon-180B chat model and found it no better than Llama 2-70B Chat. The HF Open LLM Leaderboard likewise shows mixed results, which is surprising given its larger size and larger training set.
Here's an example: give the same prompts to Falcon-180B and Llama 2-70B and compare their answers.
Falcon-180B mistakenly counts a saddle as an animal. Llama2-70B answered concisely and gave the correct answer.