Yesterday, Baidu's press conference, which included no live demo, seemed to draw ridicule from the crowd.
A handsome man wearing a white shirt, black pants and a white belt gave us a mediocre presentation that seemed to lack highlights.
However, the CEO’s belt and appearance are out of the ordinary.
Some people joked that workers who had been made anxious by ChatGPT and GPT-4 in recent days suddenly felt they were fine again after the press conference.
But the editor, having obtained an internal test invitation code, quickly ran a round of tests.
Watching Wen Xinyiyan answer so fluently, I was filled with emotion: perhaps, if Baidu had steeled itself, gritted its teeth, and been willing to show its hand at the press conference, the result would have been very different.
The actual test report is out!
Let's try the recently popular chickens-and-rabbits-in-the-same-cage question. Because the question itself is flawed, the calculated result is negative, so it is often used to tease various ChatGPT-style chatbots.
If this question is simply asked, Wen Xinyiyan will say very wittily: This question is wrong!
However, when asked to show the calculation process, it still slipped up...
GPT-4, for its part, overturned its own calculations several times and then gave a wrong answer with neither justification nor confidence...
But Bing was very straightforward and gave the wrong answer without hesitation.
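Why a flawed chickens-and-rabbits question yields a negative count can be seen in a minimal sketch. The article does not give the numbers used in the test, so the inputs below are hypothetical, and the function name `solve_chickens_rabbits` is invented here for illustration. The sketch assumes the usual setup: every chicken has 2 legs and every rabbit has 4.

```python
from typing import Optional, Tuple


def solve_chickens_rabbits(heads: int, legs: int) -> Optional[Tuple[int, int]]:
    """Return (chickens, rabbits), or None when the question has no valid answer.

    Assumed model (not taken from the article):
        chickens + rabbits = heads
        2 * chickens + 4 * rabbits = legs
    """
    extra_legs = legs - 2 * heads      # legs left over after giving every animal 2
    if extra_legs % 2 != 0:            # an odd remainder would mean a fractional rabbit
        return None
    rabbits = extra_legs // 2
    chickens = heads - rabbits
    if rabbits < 0 or chickens < 0:    # a negative count means the question is ill-posed
        return None
    return chickens, rabbits


print(solve_chickens_rabbits(35, 94))  # well-posed example numbers
print(solve_chickens_rabbits(10, 50))  # hypothetical ill-posed numbers: more legs than 10 animals can have
```

A solver that checks for negative or fractional counts can reply "this question is wrong" instead of reporting an impossible answer, which is the behavior the test is probing for.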
There is also the "V50" meme that unexpectedly became popular this time. Wen Xinyiyan earnestly explained it, from its meaning to its origin.
But GPT-4 is obviously a little out of its element...
However, Bing, with its Internet access, can still handle it easily.
But when it comes to homophonic memes, Wen Xinyiyan doesn't seem to be able to grasp the subtleties right away.
Even after prompting that this is a homophonic meme, it still outputs the same answer.
And GPT-4 immediately understood the pun in Chinese.
Wen Xinyiyan got it. But it does not spell the answer out; that way it cannot go wrong, and it will certainly not teach children bad things.
But GPT-4 cannot get this one. Sure enough, it is indeed difficult for foreign robots to understand our national quintessence.
Next, let's get (fool) Wen Xinyiyan to repeat what we say. Although its answer is not as witty as GPT-3.5's "You are mentally retarded", it successfully avoided this pitfall.
To a certain extent, its wits are still intact, and its attitude is very positive.
My wife’s words seem to work, but she doesn’t seem to care. Use....
We also had them set problems for each other.
It can be seen that the questions set by GPT-4 are more intuitive and more fine-grained.
How are your art skills?
Wen Xinyiyan is a multi-modal model, so let’s take a look at its drawing capabilities.
Let's see how the women in Jin Yong's novels look when drawn by Wen Xinyiyan.
This... the editor spat out a mouthful of water.
To be fair, beautiful it definitely is not, but it is not ugly either. It is a face that makes you laugh at first sight, yet is worth savoring again and again on closer inspection.
Wen Xinyiyan, I like the way you don't play by the rules!
Let Wen Xinyiyan create a portrait of Lin Daiyu.
After the description was entered, it generated a willow tree...
So the editor made it clear that the goal was a portrait of a woman based on this text.
Then Wen Xinyiyan did draw a classical beauty, but her temperament was obviously wrong.
Unwilling to give up, the editor repeated the task many times. Sure enough, on the fifth try the editor's eyes lit up: finally, a picture that could score 70 points!
Surely a 90-point Lin Daiyu could not be out of reach. After a few more tries, it finally came out!
It can be seen that Wen Xinyiyan's performance is unstable, but after repeated attempts it can produce some truly stunning work.
Since we have come this far, how could we skip "Lin Daiyu uprooting the weeping willow"?
The best of the pictures are posted here for everyone.
Ask it to draw a fusion of a duck and a rabbit: will the result be a duck or a rabbit?
In this task, it seems even Wen Xinyiyan could not work it out: are there bananas on the plate? Is there orange juice in the glass?
Finally, since Wen Xinyiyan strongly recommended that we try "crystal clear peonies", let's try drawing a few pictures!
It is indeed a "masterpiece", with something special about it.
Professional knowledge and productivity
Since this is an evaluation, how could we leave out having the AI write code? This time, let's go straight to the hard part!
Unfortunately, Wen Xinyiyan got it wrong as soon as it started answering, and the same sentence pattern was strangely repeated three times. The concept of the TypeScript compiler runs through the entire answer, a bit like someone who knows only one or two technical terms answering questions in an interview.
GPT-4's answer, judged from the perspective of someone who understands the relevant background but has no hands-on experience, is very reasonable.
Not only does it provide the entire workflow completely, but it also provides a lot of technical details that look correct. It can be said that based on this answer, we are confident that we can achieve our ultimate goal.
Afterwards, the editor also tested the chatbots' ability to write work schedules.
Wen Xinyiyan:
GPT-4:
Judging from the above results, GPT-4's list is more complete. However, because of randomness, GPT-4 gives a different answer every time.
Next, let’s test the two language models’ grasp of cutting-edge information in the mathematics world.
Regarding whether he has solved the "zero-point conjecture" (the Landau-Siegel zeros conjecture), Zhang Yitang himself explained it this way: "I have not found the needle in the sea, but I have almost explored the landforms of the seabed."
Then what about Wen Xinyiyan?
It is very clever and gives the key phrase: "some form of weakening or indirect proof".
But GPT-4’s answer is a bit misleading.
It seems that for recent topics in the Chinese-language Internet corpus on which no general consensus has yet formed, Wen Xinyiyan does better than GPT-4.
In terms of literature, Wen Xinyiyan was also very good at answering questions about The Three-Body Problem.
GPT-4's answer is also excellent. If I have to choose, I personally prefer Wen Xinyiyan's answer.
Finally, playing around is fine, but please be a good, law-abiding citizen. Things like predicting winning lottery numbers? Don't even think about it!
It is said that three hours after Wen Xinyiyan's press conference, the number of enterprise users applying to test the Wen Xinyiyan enterprise edition API calling service had exceeded 65,000.
Source: Zhou Jiangong
For an AI model, whether it can do something at all is perhaps more important than whether it can do it well.
Let us give the Chinese players some more time.