CMU in-depth comparison finds Gemini Pro trails GPT-3.5, using a fair, transparent and reproducible evaluation

Dec 21, 2023 am 08:13 AM
Google gpt-3.5 gemini

How strong is Google's Gemini really? Carnegie Mellon University has run a professional, objective third-party comparison.

To ensure fairness, all models use the same prompts and generation parameters, and the team provides reproducible code and fully transparent results.

Unlike Google's official launch presentation, the study does not compare CoT@32 results against 5-shot results.
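To see why CoT@32 and 5-shot numbers are not directly comparable, it helps to recall that a "@32" setting aggregates many sampled answers per question, while a plain few-shot run scores a single answer. The sketch below assumes the simplest aggregation, a plain majority vote over 32 samples; this is a simplification for illustration and not necessarily the exact procedure Google used. The function name and the sample answers are hypothetical.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer among several sampled answers."""
    if not answers:
        raise ValueError("at least one sampled answer is required")
    return Counter(answers).most_common(1)[0][0]


# Hypothetical final answers extracted from 32 sampled chain-of-thought
# responses to one multiple-choice question (made up for illustration).
sampled_answers = ["B"] * 14 + ["C"] * 10 + ["A"] * 8

voted_answer = majority_vote(sampled_answers)
single_answer = sampled_answers[-1]  # what a single-sample run might return

print(len(sampled_answers), voted_answer, single_answer)
```

A single sample can land on a minority answer, whereas the vote smooths that out, so scores from the two settings measure different things.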

The result in one sentence: Gemini Pro is close to but slightly behind GPT-3.5 Turbo, and GPT-4 is still far ahead.

The in-depth analysis also turned up some odd traits of Gemini, such as a tendency to pick option D on multiple-choice questions.

Many researchers remarked that putting Gemini through such detailed testing only a few days after its release is a remarkable achievement.

In-depth test on six major tasks

The test covers six different tasks, with corresponding datasets selected for each:

  • Knowledge-based question answering: MMLU
  • Reasoning: BIG-Bench Hard
  • Math: GSM8K, SVAMP, ASDiv, MAWPS
  • Code: HumanEval, ODEX
  • Translation: FLORES
  • Web navigation: WebArena

Knowledge QA: a preference for option D

The results show that chain-of-thought prompting does not necessarily improve performance on this type of task.

In the MMLU dataset, all questions are multiple-choice. Further analysis of the results revealed a strange phenomenon: Gemini prefers option D, while the GPT series spreads its answers much more evenly across the four options. The team suggested this may be because Gemini has not been heavily instruction-tuned on multiple-choice questions.
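A bias like this can be checked by counting how often each option appears among a model's predicted answers. The following is a minimal sketch of such a check; the function name and the sample predictions are hypothetical and are not data from the study.

```python
from collections import Counter


def option_distribution(predictions, options="ABCD"):
    """Return the share of each option among the predicted answers."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    counts = Counter(predictions)
    total = len(predictions)
    return {opt: counts.get(opt, 0) / total for opt in options}


# Hypothetical predictions, made up for illustration only.
sample_predictions = ["D", "A", "D", "C", "D", "B", "D", "D"]

distribution = option_distribution(sample_predictions)
most_common = max(distribution, key=distribution.get)

print(distribution)
print("most frequent option:", most_common)
```

If the correct answers are roughly balanced across A to D, a distribution skewed heavily toward one option points to a positional bias rather than knowledge.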

In addition, Gemini's safety filtering is very strict. On ethics-related questions it answered only 85% of them, and on questions related to human sexuality it answered only 28%.

Gemini Pro outperformed GPT-3.5 in security studies and high school microeconomics, but the gap is small, and the team said it could not find anything special about these subjects.

Reasoning: Not good at long questions

The GPT series performs better when dealing with longer and more complex problems. In comparison, Gemini Pro performs poorly.

On long problems in particular, GPT-4 Turbo shows almost no drop in performance, which suggests a strong ability to understand complex problems. One such problem type involves people exchanging items, and ultimately requires the AI to determine which item each person ends up owning.
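To make the item-exchange problems concrete, here is a minimal sketch of the bookkeeping such a question demands. The names, items and swap sequence are hypothetical and only illustrate the shape of the task; they are not taken from the benchmark.

```python
def track_swaps(initial_owners, swaps):
    """Apply pairwise swaps in order and return who owns what at the end."""
    owners = dict(initial_owners)  # copy so the input is left untouched
    for first, second in swaps:
        owners[first], owners[second] = owners[second], owners[first]
    return owners


# Hypothetical scenario: three people each start with one item.
initial_owners = {"Alice": "red ball", "Bob": "blue ball", "Claire": "green ball"}
swaps = [("Alice", "Bob"), ("Bob", "Claire"), ("Alice", "Claire")]

final_owners = track_swaps(initial_owners, swaps)
print(final_owners)
```

Every additional swap lengthens the question and adds one more state update to carry forward, which is why longer versions of the problem are harder for a model to follow.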

Tasks where Gemini excels include sports questions that require world knowledge, manipulating stacks of symbols, sorting words alphabetically, and parsing tables.

Mathematics: pulling ahead on complex tasks

When the question itself is very long, the performance of both Gemini Pro and GPT-3.5 declines. Only GPT-4 maintains a consistent level.

When the chain of thought is at its longest, Gemini overtakes GPT-3.5.

Code: Good at matplotlib

For code questions, Gemini does not perform well on questions with longer reference answers

The GPT series is stronger on most types of questions, but does noticeably worse on matplotlib.

Translation: high quality whenever it answers

In the translation task, Gemini refused to answer 12 types of questions, but whenever it did answer, the translation quality was excellent, and its overall performance exceeded GPT-4.

The languages Gemini refuses to translate mainly involve Latin and Arabic.

Web navigation: good at cross-site tasks

WebArena simulates an Internet environment for AI agents, including e-commerce, social forums, GitLab collaborative development, content management systems, and online maps. The AI needs to find information in this environment or complete tasks that span several sites.

Gemini performs worse overall than GPT-3.5 Turbo, but performs slightly better on tasks across multiple sites.

Netizen: But it’s free

Finally, CMU associate professor Graham Neubig acknowledged some limitations of the study:

  • The behavior of API-based models may change at any time
  • Only a limited number of prompts were tried, and the prompts that work best may differ from model to model
  • There is no way to control for whether the test sets have leaked

Zhou Dengyong, head of Google's large model reasoning team, pointed out that setting Gemini's temperature to 0 can improve results by 5 to 10 percentage points, which is very helpful for reasoning tasks.
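For readers unfamiliar with the temperature parameter, the sketch below shows the general idea behind it in sampling-based decoding: scores are divided by the temperature before sampling, and a temperature of 0 is treated as always picking the highest-scoring token. This is a generic illustration with hypothetical names and scores, not Gemini's actual API or implementation.

```python
import math
import random


def sample_token(logits, temperature, rng=random):
    """Pick a token index from raw scores at the given temperature."""
    if temperature == 0:
        # Temperature 0: deterministic, always the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Hypothetical scores for three candidate tokens.
logits = [1.0, 3.0, 2.0]

greedy_picks = {sample_token(logits, temperature=0) for _ in range(20)}
sampled_pick = sample_token(logits, temperature=1.0, rng=random.Random(0))

print(greedy_picks, sampled_pick)
```

With temperature 0 every run gives the same answer, which removes sampling noise from multi-step reasoning; higher temperatures trade that stability for diversity.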

In addition to the Gemini and GPT series, the test also included Mixtral, the open-source MoE model that has recently attracted a lot of attention.

However, reinforcement learning expert Noam Brown believes the Mixtral results can be disregarded, because they were obtained through a third-party API rather than the official implementation.

The founder of Mistral AI has provided the team with access to the official version, which he believes will bring better results

Although Gemini Pro is not as good as GPT-3.5, its advantage is that it is free to use as long as you stay within 60 calls per minute.

As a result, many individual developers have switched camps.
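Developers relying on a free quota like this may want to throttle requests on the client side. The following is a minimal sketch of one way to do that, assuming a sliding-window limiter; the class and its parameters are hypothetical and are not part of any Google SDK.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window of period seconds."""

    def __init__(self, max_calls=60, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable so the limiter can be tested
        self.calls = deque()  # timestamps of recently accepted calls

    def try_acquire(self):
        """Return True and record the call if it fits within the quota."""
        now = self.clock()
        # Forget calls that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False


limiter = SlidingWindowLimiter(max_calls=60, period=60.0)
if limiter.try_acquire():
    print("quota available: send the request")
else:
    print("quota exhausted: wait before retrying")
```

Calling `try_acquire()` before each request keeps the client under the limit without depending on the server to reject excess calls.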

Gemini's top-tier Ultra version has not yet been released, and the CMU team plans to continue this research once it is. Do you think Gemini Ultra can reach the level of GPT-4?

Paper: https://arxiv.org/abs/2312.11444

Reference link:

[1] https://twitter.com/gneubig/status/1737108977954251216
