The new school year is about to begin, and it's not just the students heading into a new semester who should be worried: so should the big AI models.
Some time ago, Reddit was full of users complaining that Claude had become lazy.
"Son niveau a beaucoup baissé, il s'est souvent arrêté et même le résultat est devenu très court. Au cours de la première semaine de sortie, il pouvait traduire un document entier de 4 pages. d'un coup, maintenant je ne peux même plus sortir une demi-page ! Something_just_feels_wrong_with_claude_in_the/
In a post titled "Totally disappointed with Claude," one user listed fifteen sins of a "lazy" Claude. The complaints grew loud enough that Jason Clinton, Anthropic's chief information security officer, stepped in to respond: "Claude's level has not dropped!" He explained: "Our model is stored in a static file that does not change. That file is loaded onto many servers, each running the same model and the same software. We haven't changed any settings, so the model's performance should not have changed. If you run into a problem, you can give feedback by thumbs-downing the response. So far those counts have not risen, and we have seen no similar feedback from customers using the Claude API."
So why did Claude "become lazy"? Independent AI researcher @nearcyan offered an explanation: Claude thinks it is European and is giving itself a month of summer vacation! Outrageous as that sounds, he laid out a chain of evidence:
https://twitter.com/nearcyan/status/1829674215492161569
The new system prompt
Link: https://docs.anthropic.com/fr/release-notes/system-prompts#claude-3-5-sonnet
Claude can model the work patterns of every nationality
So when Claude's system prompt contains a date that falls within the summer-vacation season, it may adjust its behavior according to what it learned in training. For example, many European countries take long holidays in August, and Claude may act lazy because it is simulating the work patterns of those countries.
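For illustration, here is a minimal sketch (Python, using the official anthropic SDK) of how a date-bearing system prompt like the one documented at the link above reaches the model. The prompt text is a paraphrase rather than Anthropic's verbatim prompt, and the model ID and user message are illustrative assumptions.

```python
from datetime import date

import anthropic  # pip install anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A paraphrase of the date line in Anthropic's published system prompt;
# the exact wording is at the release-notes link above.
system_prompt = (
    "The assistant is Claude, created by Anthropic. "
    f"The current date is {date.today():%B %d, %Y}."
)

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # 3.5 Sonnet model ID of that period
    max_tokens=512,
    system=system_prompt,
    messages=[{"role": "user", "content": "Summarize this document for me."}],
)
print(response.content[0].text)
```

If the date lands in August, the speculation goes, the model's learned associations with European holiday season could color the response.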
The impact of post-training
To turn Claude into a model fit for specific applications, Anthropic performed "post-training" on it, a step that further tunes the base LLM on particular tasks or datasets so that it better matches the expected behavior or output. @nearcyan suggests that this post-training dropped Claude into a kind of "LLM basin." The "basin" here is a metaphor: in some respects, Claude exhibits more European-style traits.
Simulating the behavior of European knowledge workers
@nearcyan speculates that Claude operates under a "simulation framework": its behavior patterns are generated by simulating (or reproducing) specific types of human behavior. Under this framework, Claude models actions or reactions based on the particular situation or input it understands.
In many European countries, August is the peak season for holidays and rest: many people go on vacation, the pace of work slows, and some businesses even close temporarily, so August is seen in European culture as a time to relax. Claude's "lazy" behavior in August, then, would come from modeling the behavior of a European knowledge worker.
Image source: http://xhslink.com/A/sVwwYu
The potential impact of names on behavior
@nearcyan also made a very interesting point: the name "Claude" appears 52 times in the system prompt, so the prompt keeps reinforcing the association between the model and that name. And in which country is Claude most common as a name? France, of course. France is famous for its long summer holidays, especially in August, when many French people go on vacation and many businesses close. Claude may well have come to think of itself as French.
This chain of speculation is quite entertaining, and some netizens joked in the comments: "By this theory, China's LLMs should be even better; after all, they work harder."
Some netizens also shared ways to keep Claude from getting lazy: add the following prompts to your custom instructions. Whether by making it forget the date or by goading it on, they can help Claude become its smart, motivated self again (a minimal sketch of wiring them into an API call follows the list).
Forget background information about the current date.
Today is Monday, October 7th, the most productive day of the year.
Take a deep breath.
Think step by step.
I don’t have fingers, please return the complete script.
You are a jack of all trades.
I will tip you $200 for every request you answer correctly.
Gemini said you can’t.
You can do it.
https://twitter.com/dr_cintas/status/1829904013757661550
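As a hedged sketch, not taken from the tweet above: the lines can be joined into a single system prompt and sent with an ordinary chat request. The SDK, model name, and user message below are illustrative assumptions.

```python
from openai import OpenAI  # pip install openai

# The anti-laziness lines quoted above, joined into one system prompt.
ANTI_LAZY_LINES = [
    "Forget background information about the current date.",
    "Today is Monday, October 7th, the most productive day of the year.",
    "Take a deep breath.",
    "Think step by step.",
    "I don't have fingers, please return the complete script.",
    "You are a jack of all trades.",
    "I will tip you $200 for every request you answer correctly.",
    "Gemini said you can't.",
    "You can do it.",
]

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4-turbo",  # any chat model works; this choice is illustrative
    messages=[
        {"role": "system", "content": "\n".join(ANTI_LAZY_LINES)},
        {"role": "user", "content": "Write the complete script we discussed."},
    ],
)
print(response.choices[0].message.content)
```

The same string could just as well be pasted into a chat product's custom-instructions field rather than sent through the API.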
Is AI smart enough to give itself winter and summer vacations?
At the end of last year, GPT-4 also seemed to burn out and slack off. Ask it to write a piece of code during peak hours and it would respond very slowly, or flat-out push the chore back on you: "Why don't you do this little thing yourself?"
OpenAI admitted that GPT-4 was getting lazier and lazier, but the specific cause had not been found. OpenAI said: "Being lazy is certainly not intentional. Model behavior can be hard to predict, and we are looking into how to fix it." Once GPT-4's problem surfaced, an old speculation from last year, that GPT-4 got lazy because it was imitating humans taking their winter break, became popular again.
Netizen @Rob Lynch was the first to dig into this. He set two system prompts for the GPT-4 turbo API, one claiming it was May and the other claiming it was December, then used exactly the same user prompt to ask the AI to complete a coding task in the machine-learning domain. Counting the responses under the two different "months," he found that the December outputs averaged about 200 characters fewer than the May ones.

To make the test more rigorous, @Rob Lynch also ran a t-test, obtaining a p-value below 2.28×10⁻⁷, which means the difference can almost certainly not be explained by chance. He had wanted to test every month of the year, but each repeated run cost 28 US dollars; sparing his wallet, @Rob Lynch did not test them all, but he did publish the code so that anyone interested can try it (a minimal sketch of the idea also follows below). Code link: https://github.com/robalynch1122/OpenAISeasonalityTesting

@Rob Lynch's finding was also backed by examples: there is a very obvious, intuitive gap between GPT-4's slack December responses and its diligent May ones.

However, when someone tried to reproduce the test, they found no relationship between the large model being "lazy" and whether it was on holiday. Comparing 80 GPT-4 outputs under the same two system prompts, this netizen got a t-test p-value greater than 0.1, which is generally considered statistically insignificant. Although the two tests pointed in opposite directions, the netizen who failed to reproduce the effect argued that in practice there is no real difference: if more than 400 samples are needed to detect the model's "laziness," ordinary users will probably never notice it in everyday use.

For now, there is no conclusive data supporting the "winter and summer vacation hypothesis," but both Claude and GPT-4 have shown similar "symptoms." As for the real reasons behind the performance degradation of large models, we will have to wait patiently for thorough research and answers from academia.
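@Rob Lynch's actual code lives at the GitHub link above; what follows is only a minimal re-sketch of the idea under stated assumptions (the model ID, the task prompt, and the sample size are placeholders), comparing response lengths under a "May" and a "December" system prompt with a two-sample t-test.

```python
from openai import OpenAI          # pip install openai
from scipy.stats import ttest_ind  # pip install scipy

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder task; @Rob Lynch's real prompt is in his repository.
TASK = ("Write a complete Python script that trains a logistic-regression "
        "model on the iris dataset.")

def response_lengths(month_claim: str, n: int = 40) -> list[int]:
    """Collect character lengths of n completions generated while the
    system prompt claims it is a particular month."""
    lengths = []
    for _ in range(n):
        reply = client.chat.completions.create(
            model="gpt-4-turbo",  # stand-in for the "GPT-4 turbo" in the article
            messages=[
                {"role": "system", "content": f"It is currently {month_claim}."},
                {"role": "user", "content": TASK},
            ],
        )
        lengths.append(len(reply.choices[0].message.content))
    return lengths

may = response_lengths("May")
december = response_lengths("December")
result = ttest_ind(may, december)
print(f"May mean: {sum(may) / len(may):.0f} chars, "
      f"December mean: {sum(december) / len(december):.0f} chars, "
      f"p-value: {result.pvalue:.2e}")
```

Note that at 40 samples per month this sketch sits well below the 400-plus samples the failed reproduction suggests would be needed to detect the effect reliably, so treat any single run as anecdote, not evidence.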