180 billion parameters: the world's top open-source large model Falcon is officially announced! It crushes LLaMA 2 and approaches GPT-4 in performance

PHPz
Release: 2023-09-13 16:13:01

Overnight, the world's most powerful open-source large model, Falcon 180B, set the entire internet abuzz!

With 180 billion parameters, Falcon was trained on 3.5 trillion tokens and shot straight to the top of the Hugging Face leaderboard.

In benchmark tests, Falcon 180B beat Llama 2 across tasks including reasoning, coding, proficiency, and knowledge evaluation.

It even rivals Google's PaLM 2, with performance approaching GPT-4.

However, NVIDIA senior scientist Jim Fan questioned this.

- Code accounts for only 5% of Falcon-180B's training data.

Code is by far the most useful data for improving reasoning, mastering tool use, and powering AI agents. In fact, GPT-3.5 was fine-tuned from Codex.

- No coding benchmark results were reported.

Without coding capability, you cannot claim to be "better than GPT-3.5" or "close to GPT-4". It should be an integral part of the pre-training recipe, not an afterthought.

- For language models larger than 30B parameters, it is time to adopt mixture-of-experts (MoE). So far, we have only seen OSS MoE LLMs...

So let's take a look: what exactly is Falcon 180B?

The world’s most powerful open source large model

Previously, Falcon had been released in three sizes: 1.3B, 7.5B, and 40B.

According to the official introduction, Falcon 180B is an upgraded version of Falcon 40B. It was launched by TII, Abu Dhabi's leading technology research center, and is free for commercial use.

This time, the researchers introduced technical innovations in the base model, such as Multi-Query Attention, to improve its scalability.
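
Multi-Query Attention is not specific to Falcon: the core idea is that all attention heads share a single key/value projection instead of one per head, which shrinks the key/value cache and speeds up inference. Below is a minimal PyTorch sketch of that general idea; the class, dimensions, and names are illustrative and do not reflect Falcon's actual implementation.

import math
import torch
import torch.nn as nn

class MultiQueryAttention(nn.Module):
    """Minimal multi-query attention: many query heads share one K/V head."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)             # n_heads query heads
        self.kv_proj = nn.Linear(d_model, 2 * self.head_dim)  # a single shared key/value head
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)  # (b, h, t, d)
        k, v = self.kv_proj(x).split(self.head_dim, dim=-1)                          # (b, t, d) each
        # The single K/V head is broadcast across all query heads: this is the memory saving.
        scores = q @ k.unsqueeze(1).transpose(-2, -1) / math.sqrt(self.head_dim)     # (b, h, t, t)
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), 1)
        scores = scores.masked_fill(causal, float("-inf"))
        out = scores.softmax(dim=-1) @ v.unsqueeze(1)                                # (b, h, t, d)
        return self.out_proj(out.transpose(1, 2).reshape(b, t, -1))

Because only one key/value head has to be cached per layer, the memory needed during generation grows with sequence length rather than with sequence length times the number of heads.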

As for the training process, Falcon 180B was trained on 3.5 trillion tokens using Amazon SageMaker, Amazon's cloud machine learning platform, on up to 4,096 GPUs.

The total GPU compute time was approximately 7,000,000 GPU hours.

Falcon 180B has 2.5 times as many parameters as Llama 2 (70B), and its training required 4 times as much compute.
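
As a rough sanity check of the "4 times the compute" figure, training FLOPs for a dense transformer scale approximately as 6 × parameters × tokens. Assuming Llama 2 70B's published 2 trillion training tokens, the ratio works out to roughly 4.5×; this is a back-of-the-envelope estimate, not an official number.

# Back-of-the-envelope training-compute comparison using the 6 * N * D approximation.
# Llama 2 70B's 2T-token figure is taken from its paper; treat the result as a rough estimate.
def train_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

falcon_180b = train_flops(180e9, 3.5e12)
llama2_70b = train_flops(70e9, 2.0e12)

print(f"Falcon 180B : {falcon_180b:.2e} FLOPs")
print(f"Llama 2 70B : {llama2_70b:.2e} FLOPs")
print(f"Ratio       : {falcon_180b / llama2_70b:.1f}x")  # ~4.5x, consistent with the '4x' claim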

As for the training data, Falcon 180B was trained mainly on the RefinedWeb dataset (about 85%).

Additionally, it was trained on a curated mix of conversations, technical papers, and a small portion of code.

The pre-training dataset is so large that even 3.5 trillion tokens amount to less than a single epoch.

Officially, Falcon 180B is claimed to be the "best" open-source large model currently available. Its performance is as follows:

On the MMLU benchmark, Falcon 180B outperforms Llama 2 70B and GPT-3.5.

It is on par with Google's PaLM 2-Large on HellaSwag, LAMBADA, WebQuestions, Winogrande, PIQA, ARC, BoolQ, CB, COPA, RTE, WiC, WSC, and ReCoRD.

In addition, it currently holds the highest score (68.74) on the Hugging Face Open LLM Leaderboard, surpassing LLaMA 2 (67.35).

Getting started with Falcon 180B

At the same time, the researchers also released the chat model Falcon-180B-Chat, fine-tuned on conversation and instruction datasets including Open-Platypus, UltraChat, and Airoboros.

Now, anyone can try the demo.

Address: https://huggingface.co/tiiuae/falcon-180B-chat

Prompt format

The base model has no prompt format: it is not a conversational model and has not been instruction-tuned, so it does not respond in a conversational manner.

The pre-trained model is a great foundation for fine-tuning, but you probably should not use it directly. The dialogue model, however, uses a simple conversation structure:

System: Add an optional system prompt here
User: This is the user input
Falcon: This is what the model generates
User: This might be a second turn input
Falcon: and so on

Transformers

Starting from Transformers 4.33, Falcon 180B can be used and downloaded in the Hugging Face ecosystem.

Make sure you are logged in to your Hugging Face account and have the latest version of transformers installed:

pip install --upgrade transformers
huggingface-cli login

bfloat16

Here's how to use the base model in bfloat16. Falcon 180B is a large model, so be aware of its hardware requirements.

To fully fine-tune Falcon 180B, you need at least 8 × 8 A100 80GB GPUs (64 in total); even inference alone requires 8 × A100 80GB GPUs.

from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model_id = "tiiuae/falcon-180B"

# Load the tokenizer and the model in bfloat16, sharded across the available GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "My name is Pedro, I live in"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Sample a short continuation of the prompt.
output = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    max_new_tokens=50,
)
output = output[0].to("cpu")
print(tokenizer.decode(output))

This may produce output like the following:

My name is Pedro, I live in Portugal and I am 25 years old. I am a graphic designer, but I am also passionate about photography and video.
I love to travel and I am always looking for new adventures. I love to meet new people and explore new places.

Using 8-bit and 4-bit quantization with bitsandbytes

Additionally, the 8-bit and 4-bit quantized versions of Falcon 180B are virtually indistinguishable from bfloat16 in terms of evaluation results!

This is good news for inference, as users can confidently use the quantized version to reduce hardware requirements.

Note that inference is much faster in the 8-bit version than in the 4-bit version. To use quantization, you need to install the "bitsandbytes" library and enable the corresponding flag when loading the model:

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    load_in_8bit=True,
    device_map="auto",
)
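
The article only shows the 8-bit flag. For completeness, here is an analogous 4-bit load using the BitsAndBytesConfig interface from transformers; this is a sketch along the same lines, not a snippet from the original post.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "tiiuae/falcon-180B"

# 4-bit loading via bitsandbytes; NF4 with bfloat16 compute is a common choice.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)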

Dialog Model

As mentioned above, the version of the model fine-tuned for conversations uses a very straightforward training template. To run chat-style inference, we have to follow the same pattern.

For reference, you can take a look at the format_prompt function from the chat demo:

def format_prompt(message, history, system_prompt):
    prompt = ""
    if system_prompt:
        prompt += f"System: {system_prompt}\n"
    for user_prompt, bot_response in history:
        prompt += f"User: {user_prompt}\n"
        prompt += f"Falcon: {bot_response}\n"
    prompt += f"User: {message}\nFalcon:"
    return prompt

As you can see, user turns and model responses are preceded by the User: and Falcon: delimiters, and they are concatenated into a single prompt containing the entire conversation history. An optional system prompt can be provided to steer the generation style.
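
For illustration, here is how the function above might be called with a short history; the example messages are made up, and the resulting string can then be passed to model.generate as in the bfloat16 example earlier.

# Hypothetical two-turn conversation used only to show the prompt layout.
history = [
    ("Hi, who are you?", "I am Falcon, an open-source language model."),
]
prompt = format_prompt(
    message="What can you help me with?",
    history=history,
    system_prompt="You are a helpful assistant.",
)
print(prompt)
# System: You are a helpful assistant.
# User: Hi, who are you?
# Falcon: I am Falcon, an open-source language model.
# User: What can you help me with?
# Falcon: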

Hot comments from netizens

Netizens are hotly debating the true strength of Falcon 180B.

Absolutely unbelievable. It beats GPT-3.5 and is on par with Google's PaLM-2 Large. This is a game changer!

A startup CEO said: "I tested the Falcon-180B chat model and it was no better than Llama 2-70B chat. The HF Open LLM leaderboard also shows mixed results. This is surprising considering its larger size and larger training set."

Here's an example:

Give the same prompts to Falcon-180B and Llama 2-70B, let each answer separately, and compare the results.

Falcon-180B mistakenly counted a saddle as an animal, while Llama 2-70B answered concisely and gave the correct answer.

Source: 51cto.com