GPT-4 accused of "cheating"! LeCun urges caution about testing on the training set; shuffling the order of "Chihuahua or muffin" leads to errors

PHPz

Nov 13, 2023 pm 08:17 PM

bard gpt-4v llava

GPT-4 once amazed countless people by solving the famous Internet meme "Chihuahua or blueberry muffin".

However, now it is accused of "cheating"!

The new test uses exactly the same pictures as the original puzzle, but with their order and arrangement shuffled.

The latest version of GPT-4 is famous for its all-in-one capability. Surprisingly, however, it miscounted the images, and it even misidentified Chihuahuas that it had originally recognized correctly.

Why does GPT-4 perform so well on the original image?

Xin Eric Wang, an assistant professor at UCSC, explains that the test was motivated by how popular the original image is on the Internet. He believes that GPT-4 encountered the original answers many times during training and memorized them.

Yann LeCun, one of the three winners of the 2018 Turing Award, also took note and commented:

Be careful about testing on the training set.

Can’t tell the difference between Teddy and fried chicken

How popular is the original picture? It is not only a famous Internet puzzle but has also become a classic problem in computer vision, appearing many times in related papers.

Setting aside the influence of the original image, many netizens proposed their own tests to find out exactly where GPT-4's capability falls short.

To rule out the possibility that an overly complicated arrangement was to blame, someone switched to a simple 3x3 grid, and GPT-4 still made many mistakes.
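The shuffled-grid idea can be sketched in a few lines of Python. This is only an illustration, not the procedure the netizens actually used: the helper names `shuffled_grid` and `grid_accuracy`, the five-to-four label split, and the fixed guess are all assumptions made for the example.

```python
import random

def shuffled_grid(labels, rows, cols, seed=0):
    """Return a rows x cols grid holding `labels` in a seeded random order."""
    if len(labels) != rows * cols:
        raise ValueError("need exactly rows * cols labels")
    order = list(labels)
    random.Random(seed).shuffle(order)  # seeded, so the layout is reproducible
    return [order[r * cols:(r + 1) * cols] for r in range(rows)]

def grid_accuracy(truth, predicted):
    """Fraction of cells where the predicted label matches the true label."""
    cells = [(t, p) for t_row, p_row in zip(truth, predicted)
             for t, p in zip(t_row, p_row)]
    return sum(t == p for t, p in cells) / len(cells)

# Hypothetical ground truth for a 3x3 puzzle: five Chihuahuas, four muffins.
labels = ["chihuahua"] * 5 + ["muffin"] * 4
grid = shuffled_grid(labels, rows=3, cols=3, seed=42)

# A hypothetical answer that ignores the layout and calls every cell a Chihuahua.
fixed_guess = [["chihuahua"] * 3 for _ in range(3)]
fixed_score = grid_accuracy(grid, fixed_guess)  # 5/9 for any shuffle of these labels
```

An answer that does not depend on the layout scores the same on every shuffle, whereas an answer memorized for one specific layout would score perfectly only on that layout. That gap is what the rearranged test is probing.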

Someone else cropped out some of the pictures and sent them to GPT-4 one at a time, and got 5/5 correct.

However, Xin Eric Wang believes that putting these easily confused images together is precisely the heart of this challenge.

In the end, someone combined two well-known prompting tricks, telling the AI to "take a deep breath" and to "think step by step", and got the correct result.
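A minimal sketch of how the two phrases might be combined into one prompt follows. The article does not give the exact wording that was used, so the helper `build_prompt` and the question text are hypothetical.

```python
def build_prompt(question: str) -> str:
    """Prepend the two prompting phrases to a visual question."""
    preamble = "Take a deep breath and think step by step."
    return f"{preamble}\n\n{question}"

# Hypothetical question text; the wording used in the actual test is not given.
prompt = build_prompt(
    "For each picture in the grid, say whether it shows a Chihuahua or a muffin."
)
print(prompt)
```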

However, GPT-4's answer included the phrase "This is an example of a visual pun or a famous meme", which also suggests that the original image may indeed be in the training data.

Finally, someone also tried the "Teddy or fried chicken" puzzle that often appears alongside it, and found that GPT-4 could not tell the two apart well either.

As for this "blueberry or chocolate bean" one, it is a bit too much...

Visual hallucination has become a popular direction

The "nonsense" that large models produce is known in academia as the hallucination problem, and visual hallucination in multimodal large models has recently become a hot research direction.

A study at EMNLP 2023 built the GVIL dataset, which contains 1,600 data points, and systematically evaluated the problem of visual illusions.

The study found that larger models are more susceptible to illusions, which brings them closer to human perception.

Another recent study focuses on evaluating two types of hallucination: bias and interference.

Bias refers to the model's tendency to produce certain types of responses, which may be caused by imbalances in the training data.
Interference refers to errors that may arise from the way the text prompt is worded or the way the input image is presented.

The study pointed out that GPT-4V often gets confused when interpreting multiple images together and performs better when the images are sent separately, which is consistent with the observations from the "Chihuahua or muffin" test.
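The comparison between joint and separate presentation can be sketched as a small evaluation harness. This is an illustration under stated assumptions, not the study's method: `classify_one` and `classify_grid` stand in for calls to a vision model, and the file names, labels, and stub behaviours below are made up for the example.

```python
from typing import Callable, Dict, Sequence

def accuracy(truth: Sequence[str], predicted: Sequence[str]) -> float:
    """Position-wise accuracy; missing answers count as wrong, extras are ignored."""
    hits = sum(t == p for t, p in zip(truth, predicted))
    return hits / len(truth)

def compare_presentations(
    images: Sequence[str],
    truth: Sequence[str],
    classify_one: Callable[[str], str],
    classify_grid: Callable[[Sequence[str]], Sequence[str]],
) -> Dict[str, float]:
    """Score the same images sent one by one versus all together."""
    separate = [classify_one(img) for img in images]
    joint = list(classify_grid(images))
    return {"separate": accuracy(truth, separate), "joint": accuracy(truth, joint)}

# Hypothetical data and stub "models", only to make the sketch runnable.
images = ["a.png", "b.png", "c.png", "d.png"]
truth = ["chihuahua", "muffin", "muffin", "chihuahua"]
lookup = dict(zip(images, truth))

def stub_one(img: str) -> str:
    return lookup[img]  # pretends single images are always recognized

def stub_grid(imgs: Sequence[str]) -> Sequence[str]:
    return ["chihuahua", "muffin"] * (len(imgs) // 2)  # ignores the actual content

result = compare_presentations(images, truth, stub_one, stub_grid)
print(result)  # {'separate': 1.0, 'joint': 0.5}
```

Dividing by the number of true labels means a model that reports too few images is penalized for the ones it skipped, which matters because miscounting was one of the errors observed in the shuffled test.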

Popular mitigation measures, such as self-correction and chain-of-thought prompting, do not effectively solve these problems, and tests show that other multimodal models such as LLaVA and Bard have similar problems.

In addition, the study found that GPT-4V is better at interpreting images with a Western cultural background or images containing English text.

For example, GPT-4V correctly counts the seven dwarfs in Snow White, but counts the seven Calabash Brothers as 10.

Reference links:
[1] https://twitter.com/xwang_lk/status/1723389615254774122
[2] https://arxiv.org/abs/2311.00047
[3] https://arxiv.org/abs/2311.03287
