


GPT-4 "gets dumber" and sparks a public outcry: the quality of its text and code has declined, and OpenAI has just responded to accusations of cutting costs and corners.
Has GPT-4, the ceiling of large models, really... gotten dumber?
First a few users raised doubts, then a large number of netizens said they had noticed it too and posted plenty of evidence.
Some reported burning through GPT-4's entire quota of 25 messages per 3 hours in one sitting without solving their code problem.
They had no choice but to switch to GPT-3.5, which then solved it.
Summarizing everyone's feedback, the main complaints are:
- Code that GPT-4 used to write correctly is now full of bugs
- Answers have less depth and analysis
- Responses come back faster than before
This has led many people to wonder: is OpenAI cutting corners to save costs?
Two months ago GPT-4 was the world's greatest writing assistant; a few weeks ago it started sliding into mediocrity. I suspect they cut the compute or made it less intelligent.
This inevitably reminds people of Microsoft's new Bing, which "peaked at launch" and then got a "frontal lobotomy" that changed it for the worse...
After netizens compared notes, "it started getting worse a few weeks ago" became the consensus.
A storm of public opinion also formed in technical communities such as Hacker News, Reddit and Twitter.
Now OpenAI can no longer sit still.
OpenAI developer advocate Logan Kilpatrick responded to a netizen's question:
The API doesn't change without us notifying you. The models there are sitting static.
Worried netizens kept asking for confirmation: "So GPT-4 has been static since its release on March 14, right?" Logan also answered this in the affirmative.
The question "I've noticed inconsistent performance on some prompts; is that just the inherent instability of the large model itself?" likewise got a "Yes."
But so far, questions about whether the web version of GPT-4 has been downgraded remain unanswered, and Logan has posted nothing further on the matter.
So what is actually going on? We decided to try it ourselves.
Since netizens mostly complained that GPT-4's coding ability had gotten worse, we ran a simple experiment.
Hands-on test: has GPT-4's "alchemy" (model-training) ability declined?
At the end of March, we experimented with having GPT-4 "brew an elixir": write a multi-layer perceptron in Python that implements an XOR gate.
△ ShareGPT screenshot; the interface differs slightly
After we asked GPT-4 to use plain numpy without a framework, its first attempt produced the wrong result.
It took two rounds of fixes to get the correct result: the first changed the number of hidden neurons, and the second changed the activation function from sigmoid to tanh.
On June 2, we had GPT-4 attempt the task again, this time with prompts in Chinese.
This time GPT-4 skipped the framework from the start, but the code it gave was still wrong.
Only one fix was needed to reach the correct result, and the approach was different: simply increase the number of training epochs and the learning rate. A sketch of the kind of working solution is shown below.
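For reference, here is a minimal sketch of a framework-free numpy MLP that solves XOR. It is our own reconstruction, not GPT-4's actual output, and it folds in the fixes that came up in both experiments (a tanh hidden layer, plus more epochs and a larger learning rate):

```python
# Minimal plain-numpy MLP for the XOR gate: 2 -> 4 -> 1 network,
# tanh hidden layer, sigmoid output, full-batch gradient descent.
import numpy as np

np.random.seed(0)  # a different seed may occasionally need more epochs

# XOR truth table
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = np.random.randn(2, 4)
b1 = np.zeros((1, 4))
W2 = np.random.randn(4, 1)
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5                      # larger learning rate, as in the June fix
for epoch in range(10000):    # more epochs, as in the June fix
    # forward pass
    h = np.tanh(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # backward pass for a squared-error loss
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * (1 - h ** 2)

    # gradient descent updates
    W2 -= lr * h.T @ d_out / len(X)
    b2 -= lr * d_out.mean(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h / len(X)
    b1 -= lr * d_h.mean(axis=0, keepdims=True)

print(np.round(out, 3))  # expect roughly [[0], [1], [1], [0]]
```

Even this toy network needs a few thousand updates to separate XOR cleanly, which is why GPT-4's June fix of "just train longer with a bigger learning rate" also works.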
We searched the official OpenAI Discord and found that, starting in late April, scattered users had reported GPT-4 getting worse.
The person who raised the issue earliest on Twitter was Matt Shumer, CEO of HyperWrite (a writing tool built on the GPT API).
Setting aside differences in subjective impressions, some people also suspect that the API version and the web version are not necessarily the same, though there is no solid evidence.
Another guess is that when plugins are enabled, the extra prompt text injected by a plugin may pollute the problem being solved.
△ Extra prompt text injected by the WebPilot plugin
This netizen says that, in his view, GPT's performance started deteriorating after the plugin feature entered public beta.
Some also asked OpenAI employees: even if the model itself has not changed, have the inference parameters changed?
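One way to rule sampling settings out, at least on the API side, is to pin the inference parameters explicitly. A hedged sketch, assuming the mid-2023 openai Python SDK (pre-1.0) and an OPENAI_API_KEY in the environment; the prompt and token limit are our own arbitrary choices, not an official diagnostic from OpenAI:

```python
# Pin the sampling parameters so that answer quality cannot be blamed on them.
import openai

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": "Write a plain-numpy MLP that implements an XOR gate."}],
    temperature=0,   # near-greedy decoding, minimal sampling randomness
    top_p=1,
    max_tokens=800,
)
print(response.choices[0].message.content)
```

Note that even at temperature 0 GPT-4's outputs are not perfectly deterministic, so run-to-run variation alone does not prove the model itself changed.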
We also happened to coax out of ChatGPT that the system prompt of the iOS app differs from the web version's:
- If you start a conversation on your phone, it knows it is interacting with you through a mobile device.
- It keeps answers to one or two sentences unless longer reasoning is required.
- It avoids emojis unless you explicitly ask for them.
△ The extraction may not succeed; there is a high chance it refuses to answer
So if you open a conversation in the iOS app and then continue it in the web version without realizing it, you may observe GPT-4's answers becoming simpler.
In short, it is still an unsolved mystery whether GPT-4 has become dumber since its release.
But one thing is certain:
The GPT-4 everyone got to play with starting March 14 was, from the very beginning, not as strong as the one in the paper.
Aligning with humans reduces AI capabilities
Microsoft Research's 150-plus-page paper "Sparks of Artificial General Intelligence: Early Experiments with GPT-4" states clearly:
They obtained test access before GPT-4's development was complete and tested it over a long period.
Later, netizens were unable to reproduce many of the paper's amazing examples using the public version of GPT-4.
One current view in academia is that the subsequent RLHF training made GPT-4 more aligned with humans, that is, more obedient to human instructions and more consistent with human values, but it also made the model's own abilities, such as reasoning, worse.
One of the paper's authors, Microsoft scientist Zhang Yi, also said in episode S7E11 of the Chinese podcast "What's Next | Technology Knows Early":
That version of the model was stronger than the GPT-4 everyone can use now, much stronger.
For example, the Microsoft team mentioned in the paper that they periodically had GPT-4 draw a unicorn with TikZ in LaTeX to track changes in its capabilities.
The last result shown in the paper is fairly complete.
But the paper's first author, Sebastien Bubeck, revealed more when he later gave a talk at MIT.
After OpenAI began to focus on safety, subsequent versions got worse and worse at this task.
Training methods that align models with humans without lowering the ceiling of the AI's own capabilities have become a research direction for many teams, but the work is still in its infancy.
In addition to professional research teams, netizens who care about AI are also using their own methods to track changes in AI capabilities.
Someone has GPT-4 draw a unicorn once a day and records the results publicly on a website.
Since April 12, none of the drawings have shown even the rough shape of a unicorn.
To be fair, the site's author has GPT-4 draw in SVG format, which differs from the TikZ format used in the paper and may affect the results.
And the April drawings look about as bad as the current ones, with no obvious regression.
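A minimal sketch of how such day-by-day tracking could be automated. The real site's implementation is not described here, so the prompt, file names, and SDK usage below are our own assumptions (mid-2023 openai Python SDK):

```python
# Ask GPT-4 for the same SVG drawing every day and archive the result,
# so capability drift can be eyeballed over time. Illustrative sketch only.
import datetime
import openai

prompt = "Draw a unicorn as a single SVG document. Reply with only the SVG code."
resp = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # keep sampling fixed so drawings are more comparable
)
svg = resp.choices[0].message.content
with open(f"unicorn-{datetime.date.today()}.svg", "w", encoding="utf-8") as f:
    f.write(svg)
```

Comparing outputs produced under an identical prompt is a crude but visible way to spot regressions, which is the logic behind both the website and the Microsoft team's unicorn test.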
Finally, a question for you: are you a GPT-4 user? Have you felt its capabilities decline in recent weeks? Tell us in the comments.
Bubeck's talk: https://www.php.cn/link/a8a5d22acb383aae55937a6936e120b0
Zhang Yi's interview: https://www.php.cn/link/764f9642ebf04622c53ebc366a68c0a7
One GPT-4 unicorn every day: https://www.php.cn/link/7610db9e380ba9775b3c215346184a87