How strong is Google Gemini? Carnegie Mellon University has conducted a professional, objective third-party comparison.
To ensure fairness, all models use the same prompts and generation parameters, and the team provides reproducible code and fully transparent results.
Unlike Google's official presentation, the study does not compare CoT@32 results against 5-shot results.
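The fairness setup described above can be sketched as follows. This is a minimal illustration, not the CMU team's code: `GenerationConfig`, `build_prompt`, and `evaluate` are hypothetical names, the parameter values are placeholders, and each model is assumed to be a plain callable that takes a prompt and the shared config.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class GenerationConfig:
    # Placeholder values; the article does not state the study's settings.
    temperature: float = 0.0
    max_tokens: int = 256
    num_shots: int = 5


# Hypothetical model interface: (prompt, shared config) -> generated text.
ModelFn = Callable[[str, GenerationConfig], str]


def build_prompt(question: str, shots: List[str], cfg: GenerationConfig) -> str:
    """Build one prompt; every model receives exactly the same string."""
    return "\n\n".join(shots[: cfg.num_shots] + [question])


def evaluate(
    models: Dict[str, ModelFn],
    dataset: List[Tuple[str, str]],
    shots: List[str],
    cfg: GenerationConfig,
) -> Dict[str, float]:
    """Return accuracy per model, using one prompt and one config for all."""
    scores: Dict[str, float] = {}
    for name, generate in models.items():
        correct = 0
        for question, gold in dataset:
            prompt = build_prompt(question, shots, cfg)
            answer = generate(prompt, cfg).strip()
            correct += int(answer == gold)
        scores[name] = correct / len(dataset)
    return scores
```

The point of the design is that the prompt and the config are built once, outside any model-specific code, so no model can be given a more favorable setting than another.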
The result in one sentence: Gemini Pro is close to but slightly behind GPT-3.5 Turbo, and GPT-4 is still far ahead.
In the in-depth analysis, the team also found some strange characteristics of Gemini, such as a tendency to choose D on multiple-choice questions...
Many researchers said that putting Gemini through such detailed testing just a few days after its release is a remarkable achievement.
The test compares the models on 6 different tasks, with a corresponding data set selected for each task.
The results show that using chain-of-thought prompts in this type of task does not necessarily improve performance.
In the MMLU data set, all questions are multiple-choice. Further analysis of the results revealed a strange phenomenon: Gemini prefers option D, while the answers of the GPT series are distributed much more evenly across the four options. The team suggested that this may be because Gemini has not undergone extensive instruction tuning on multiple-choice questions.
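A bias toward one option, such as the preference for D described above, can be checked by counting how often a model picks each letter. The sketch below is an illustration only; `answer_distribution` is a hypothetical helper, and it assumes the predictions have already been reduced to single option letters.

```python
from collections import Counter
from typing import Dict, Iterable


def answer_distribution(
    predictions: Iterable[str], options: str = "ABCD"
) -> Dict[str, float]:
    """Return the share of predictions that fall on each option letter."""
    counts = Counter(p.strip().upper() for p in predictions)
    # Only count predictions that are valid option letters.
    total = sum(counts[o] for o in options)
    if total == 0:
        return {o: 0.0 for o in options}
    return {o: counts[o] / total for o in options}
```

If the gold answers are spread roughly evenly over the four options, a model whose share for one letter is far above 0.25 is favoring that letter regardless of the question.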
In addition, Gemini's safety filtering is very strict. On questions about ethics it answered only 85% of the questions, and on questions related to human sexuality it answered only 28%.
On long problems in particular, GPT-4 Turbo shows almost no drop in performance, which indicates a strong ability to understand complex problems.
One type of problem involves people exchanging items, and ultimately requires the AI to determine which items each person owns.
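To make the item-exchange problems mentioned above concrete, the ground truth for such a problem can be computed by applying the swaps in order. This is a small sketch with invented names and items, not an example taken from the benchmark itself.

```python
from typing import Dict, List, Tuple


def track_objects(
    initial: Dict[str, str], swaps: List[Tuple[str, str]]
) -> Dict[str, str]:
    """Apply pairwise swaps in order and return who owns what at the end."""
    owners = dict(initial)  # copy so the caller's mapping is not modified
    for a, b in swaps:
        owners[a], owners[b] = owners[b], owners[a]
    return owners


# Hypothetical problem: three people, three swaps.
start = {"Alice": "red ball", "Bob": "blue ball", "Claire": "green ball"}
swaps = [("Alice", "Bob"), ("Bob", "Claire"), ("Alice", "Claire")]
final = track_objects(start, swaps)
# final == {"Alice": "red ball", "Bob": "green ball", "Claire": "blue ball"}
```

The program only has to keep one mapping up to date, whereas a language model has to carry the same state through a long natural-language description, which is why these problems test step-by-step tracking.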
Tasks Gemini excels at include understanding the world's sports knowledge, manipulating symbol stacks, sorting words alphabetically, and parsing tables.
The founder of Mistral AI has provided the team with access to the official version, which he believes will bring better results.
Although Gemini Pro is not as good as GPT-3.5, its advantage is that it can be used for free as long as usage does not exceed 60 calls per minute.
As a result, many individual developers have switched camps.
The most capable Gemini version, Ultra, has not yet been released, and the CMU team plans to continue this research once it is. Do you think Gemini Ultra can reach the level of GPT-4?
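For developers who want to stay within the free quota of 60 calls per minute mentioned above, one option is a client-side limiter. The sketch below is a generic sliding-window limiter, not part of any Gemini SDK; `RateLimiter` is a hypothetical class, and the clock and sleep functions are injectable only so the behavior can be tested without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class RateLimiter:
    """Allow at most `max_calls` calls within any window of `period` seconds."""

    def __init__(
        self,
        max_calls: int = 60,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls: Deque[float] = deque()  # timestamps of recent calls

    def acquire(self) -> None:
        """Block until one more call fits in the window, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps that have left the sliding window.
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                break
            # Wait until the oldest recorded call expires, then re-check.
            self.sleep(self.period - (now - self.calls[0]))
        self.calls.append(now)
```

Calling `limiter.acquire()` immediately before each API request keeps the client under the limit; the request itself is left out here because it depends on the SDK in use.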
The paper describes the study in detail: https://arxiv.org/abs/2312.11444
Reference link:
[1] https://twitter.com/gneubig/status/1737108977954251216