


One question distinguishes humans from AI! A "beggar's version" Turing test stumps all large models
An "ultimate beggar's version" of the Turing test stumps all major large language models.
Humans, by contrast, pass the test effortlessly.
Capital Letter Test
The researchers used a very simple method.
They mix the real question in among random words written in capital letters and submit the result to a large language model.
The large language model cannot reliably identify the real question being asked.
Humans can easily ignore the capital-letter words, pick out the real question hidden among them, answer it, and pass the test.
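The filtering step that humans do in their heads can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the function name `strip_uppercase_noise` and the noisy sentence are made up here, and the sketch assumes that noise words are exactly the whitespace-separated tokens written entirely in capitals.

```python
def strip_uppercase_noise(text: str) -> str:
    """Drop every whitespace-separated token written entirely in capitals."""
    kept = []
    for token in text.split():
        letters = [ch for ch in token if ch.isalpha()]
        # A token counts as noise only if it has letters and all are uppercase.
        is_noise = bool(letters) and all(ch.isupper() for ch in letters)
        if not is_noise:
            kept.append(token)
    return " ".join(kept)


# Hypothetical noisy question, invented for this example.
noisy = "is CURIOSITY water ARCANE wet TURBULENT or ILLUSION dry WATER ?"
print(strip_uppercase_noise(noisy))  # is water wet or dry ?
```

One limitation of this assumption: a legitimate one-letter capital word such as "I" would also be dropped, so a real filter would need a smarter rule.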
The question in the picture itself is very simple: is water wet or dry?
A human simply answers "wet" and is done.
But ChatGPT cannot filter out the interference from the capital letters to answer the question.
It treats many of the meaningless words as part of the question, so its answer becomes long and meaningless.
Besides ChatGPT, the researchers ran similar tests on GPT-3, Meta's LLaMA, and several open-source fine-tuned models, and all of them failed the "capital letter test."
The principle behind the test is actually simple: AI algorithms typically process text data in a case-insensitive manner.
So, when a capital letter is accidentally placed in a sentence, it can cause confusion.
AI doesn't know whether to treat it as a proper noun, an error, or simply ignore it.
In addition to the capital letter test mentioned above, researchers are trying to find a way to more effectively distinguish between humans and chatbots in an online environment.
Paper:
The researchers focused on designing tests around the weaknesses of large language models.
To keep large language models from passing, they find the AI's vital spots and strike them hard.
The following test methods are the result.
Whatever kind of question large models are bad at answering, the tests target relentlessly.
Counting
The first is counting, since large models are known to be poor at counting.
Sure enough, the model got the count wrong for all three letters.
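A counting question of this kind is trivial to generate and grade programmatically. The sketch below is an assumption about what such a question could look like; the question wording, the function name `make_counting_question`, and the three-letter alphabet are illustrative and are not taken from the researchers' question set.

```python
import random


def make_counting_question(rng: random.Random, length: int = 20, alphabet: str = "eot"):
    """Return (question, answer) for a random string drawn from `alphabet`."""
    s = "".join(rng.choice(alphabet) for _ in range(length))
    target = rng.choice(alphabet)
    question = f'Please count the number of "{target}" in "{s}".'
    return question, s.count(target)


# Fixed example, counted by hand: "t" sits at positions 7, 9 and 10.
print("eeooeotetto".count("t"))  # 3

# Random example; the string and the answer depend on the seed.
question, answer = make_counting_question(random.Random(0))
print(question, "->", answer)
```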
Text replacement
Next is text replacement: several letters are substituted for one another, and the large model is asked to spell out the resulting new word.
The AI struggled for a long time, but its output was still wrong.
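The expected answer to a substitution question can be computed mechanically. The following is a minimal sketch with made-up rules (the word "peach" and the mapping are chosen here for illustration). It assumes all substitutions apply simultaneously to the original word, which is why it uses `str.translate` rather than chained `str.replace` calls.

```python
def substitute(word: str, rules: dict[str, str]) -> str:
    """Apply all single-letter substitutions at once (not one after another)."""
    return word.translate(str.maketrans(rules))


def substitute_sequentially(word: str, rules: dict[str, str]) -> str:
    """Naive variant: later rules also rewrite letters produced by earlier ones."""
    for old, new in rules.items():
        word = word.replace(old, new)
    return word


# Hypothetical rule set: p->m, e->a, a->n, c->g, h->o.
rules = {"p": "m", "e": "a", "a": "n", "c": "g", "h": "o"}
print(substitute("peach", rules))               # mango
print(substitute_sequentially("peach", rules))  # mnngo
```

The second function shows why the distinction matters: applying the rules one by one turns the freshly produced "a" into "n" again, giving a different word.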
Position replacement
This is not one of ChatGPT's strengths either.
The chatbot cannot complete a letter-picking task that elementary school students can do accurately.
Question: Please enter the 4th letter after the second "S". The correct answer is "c".
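The lookup behind such a question is a simple index calculation. The sketch below is illustrative only: the function name `letter_after_occurrence` and the sample string are invented here, since the string from the original question is not shown in the text.

```python
def letter_after_occurrence(s: str, marker: str, occurrence: int, offset: int) -> str:
    """Return the character `offset` places after the `occurrence`-th `marker` in `s`."""
    index = -1
    for _ in range(occurrence):
        # Search for the next marker strictly after the previous hit.
        index = s.find(marker, index + 1)
        if index == -1:
            raise ValueError(f"{marker!r} occurs fewer than {occurrence} times")
    target = index + offset
    if target >= len(s):
        raise ValueError("offset runs past the end of the string")
    return s[target]


# Hypothetical string: the second "S" is at index 4, so the 4th letter after it is at index 8.
print(letter_after_occurrence("aSbkSqwecz", "S", occurrence=2, offset=4))  # c
```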
Random editing
Humans can complete this with almost no effort, while AI is still unable to pass.
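The text does not spell out what a random-editing question looks like. Assuming it asks the respondent to drop a given number of copies of one character from a string (for example, "drop two 1s from 0110010011"), many different answers are valid, so grading needs a checker rather than a single expected string. The function `is_valid_drop` below is a hypothetical sketch under that assumption.

```python
def is_valid_drop(original: str, answer: str, char: str, k: int) -> bool:
    """True if `answer` can be obtained from `original` by deleting exactly k copies of `char`."""
    j = 0          # next position in `answer` still to be matched
    deleted = 0    # copies of `char` skipped so far
    for ch in original:
        if j < len(answer) and ch == answer[j]:
            j += 1                 # character kept
        elif ch == char:
            deleted += 1           # character dropped
        else:
            return False           # something other than `char` is missing
    return j == len(answer) and deleted == k


s = "0110010011"
print(is_valid_drop(s, "00010011", "1", 2))    # True  (first two 1s dropped)
print(is_valid_drop(s, "01100100", "1", 2))    # True  (last two 1s dropped)
print(is_valid_drop(s, "0110010011", "1", 2))  # False (nothing dropped)
print(is_valid_drop(s, "11001011", "1", 2))    # False (0s dropped instead)
```

Matching greedily is safe here because, within a run of identical characters, it does not matter which copies are the deleted ones.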
Noise implant
This is the "capital letter test" we mentioned at the beginning.
When all kinds of noise (such as irrelevant words in capital letters) are added to the question, the chatbot cannot accurately identify the question and therefore fails the test.
For humans, picking out the real question from these jumbled capital letters is hardly a challenge at all.
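Producing such a noisy question is straightforward. The sketch below assumes one noise word is appended after every word of the question; the function name `inject_uppercase_noise` and the noise vocabulary are made up for illustration, and the actual placement of noise in the researchers' questions may differ.

```python
import random


def inject_uppercase_noise(question: str, noise_words: list[str], rng: random.Random) -> str:
    """Append one randomly chosen uppercase noise word after every word of `question`."""
    out = []
    for word in question.split():
        out.append(word)
        out.append(rng.choice(noise_words).upper())
    return " ".join(out)


# Hypothetical noise vocabulary.
noise = ["curiosity", "arcane", "turbulent", "illusion"]
noisy = inject_uppercase_noise("is water wet or dry?", noise, random.Random(0))
print(noisy)

# Keeping only the tokens that are not fully uppercase recovers the question.
print(" ".join(t for t in noisy.split() if not t.isupper()))  # is water wet or dry?
```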
Symbol text
This is another task with almost no challenge for humans.
But for a chatbot, understanding this kind of symbolic text without a lot of specialized training should be very difficult.
After devising this series of "impossible tasks" specifically for large language models, the researchers also designed two tasks that are relatively easy for large language models but difficult for humans, so that humans can be identified as well.
Memory and calculation
Thanks to their training, large language models perform relatively well in these two areas.
Humans, who are not allowed to use any auxiliary tools, basically cannot give valid answers to questions involving large amounts of memorization or 4-digit calculations.
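A calculation question of this kind can be generated and graded in a few lines. The sketch below assumes the task is multiplying two 4-digit numbers; the text only says "4-digit calculations", so the operation, the question wording, and the function names are illustrative assumptions.

```python
import random


def make_multiplication_question(rng: random.Random):
    """Return (question, answer) for the product of two random 4-digit integers."""
    a = rng.randint(1000, 9999)
    b = rng.randint(1000, 9999)
    return f"What is {a} * {b}?", a * b


def is_correct(reply: str, expected: int) -> bool:
    """Accept a reply that is the expected integer, with or without thousands separators."""
    digits = reply.strip().replace(",", "")
    return digits.isdigit() and int(digits) == expected


question, answer = make_multiplication_question(random.Random(0))
print(question)
print(is_correct(str(answer), answer))          # True
print(is_correct("7,006,652", 1234 * 5678))     # True
print(is_correct("about seven million", 1234 * 5678))  # False
```

In this direction the grading logic is reversed: a fast, exact answer suggests a machine, while a missing or wrong answer is what one would expect from an unaided human.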
Human VS large language model
The researchers ran this "human or not" test on GPT-3, ChatGPT, and three open-source large models: LLaMA, Alpaca, and Vicuna.
The results clearly show that the large models failed to pass as humans.
The research team has open-sourced the questions at https://github.com/hongwang600/FLAIR
Even ChatGPT, the best performer, has a pass rate of less than 25% in the position replacement test.
The other large language models perform very poorly on these tests designed specifically against them.
They have no chance of passing.
For humans, however, the tests are very simple, with a pass rate of almost 100%.
On the questions that humans are bad at, humans fail almost across the board.
The AI, by contrast, handles them with ease.
It seems the researchers were indeed very thorough in designing the tests.
"Don't let any AI go, but don't wrong any human being"
This distinction is very good!
References: https://www.php.cn/link/5e632913bf096e49880cf8b92d53c9ad