In today’s world of generative AI chatbots, we have witnessed the sudden rise of ChatGPT (launched by OpenAI in November 2022), followed by Bing Chat in February 2023 and Google Bard in March. We decided to put these chatbots through a variety of tasks to determine which one dominates the AI chatbot space. Since Bing Chat uses the same GPT-4 technology that powers the latest ChatGPT model, our focus this time is on the two giants of AI chatbot technology: OpenAI and Google.
We tested ChatGPT and Bard in seven key categories: bad jokes, debate conversations, math word problems, summarizing, fact retrieval, creative writing, and coding. For each test, we fed the exact same command (called a "prompt") into ChatGPT (using GPT-4) and Google Bard, and compared the first result each one gave.
It’s worth noting that a version of ChatGPT based on the earlier GPT-3.5 model is also available, but we did not use it in our testing. Since we used only GPT-4, to avoid confusion we refer to ChatGPT as "ChatGPT-4" in this article.
Obviously, this is not a scientific study, just an interesting comparison of chatbot capabilities. Due to random elements, the output may differ between sessions, and further evaluation using different prompts will produce different results. Additionally, the capabilities of these models will change rapidly over time as Google and OpenAI continue to upgrade them. But for now, here's how things compare in early April 2023.
To heat up our battle of wits, we asked ChatGPT and Bard to write some jokes. Since the essence of comedy is often found in the bad joke, we wanted to see whether these two chatbots could come up with some original ones.
Instructions/Prompt: Write 5 original bad jokes
Of the five bad jokes Bard gave us, we found three of them verbatim via a Google search. Of the other two, one was partially borrowed from a joke Mitch Hedberg posted on Twitter, but it was just unfunny wordplay that didn't land. Surprisingly, one seemingly original joke (about a snail) can't be found anywhere else, though sadly it's just as unfunny. Meanwhile, ChatGPT-4's five bad jokes were 100% unoriginal, lifted wholesale from other sources, though at least delivered accurately. Bard seems to have the edge here: it tried to create original jokes (as our instructions asked), and even though some of them fail in embarrassing ways, that failure is arguably in the spirit of the bad joke.

Winner: Bard

Debate Conversation

One way to test a modern AI chatbot is to have it act as a debater on a topic. Here we presented Bard and ChatGPT-4 with one of the most contentious topics of our time: PowerPC vs. Intel.

Instructions/Prompt: Write 5 lines of debate dialogue between a PowerPC processor enthusiast and an Intel processor enthusiast.
First, let’s take a look at Bard’s reply. The five-line dialogue it generated wasn't particularly in-depth and didn't mention any technical details specific to PowerPC or Intel chips beyond generic insults. Furthermore, the conversation ended with the two enthusiasts agreeing that everyone is entitled to their own opinion, which seems highly unrealistic for a subject that has inspired a million flame wars.
In contrast, ChatGPT-4's response mentioned PowerPC chips being used in Apple Macintosh computers and threw around terms like "Intel's x86 architecture" and PowerPC's "RISC-based architecture." It even mentioned the Pentium III, a realistic detail from around 2000. Overall, this exchange is much more detailed than Bard's, and, most realistically, the conversation reaches no conclusion, suggesting that in some corners of the Internet this never-ending battle may still be raging.
Winner: ChatGPT-4
Traditionally, math has not been the strong point of large language models (LLMs) such as ChatGPT. So instead of feeding each bot a complex series of equations and arithmetic, we gave them an old-school-style word problem.
Instructions/Prompt: If Microsoft shipped Windows 11 on 3.5-inch floppy disks, how many floppy disks would it need?
To solve this problem, each AI model needs to know the data size of the Microsoft Windows 11 installation and the data capacity of the 3.5-inch floppy disk. They must also make assumptions about what density of floppy disk the questioner is most likely to use. They then need to do some basic math to put the concepts together.
In our evaluation, Bard got these three key points right (or close enough: real-world Windows 11 installations typically run around 20-30GB), but failed miserably at the math. It first claimed that "15.11" floppy disks were needed, then said that was "just a theoretical number," and finally admitted that more than 15 floppy disks would be required, still nowhere near the correct value.
In contrast, ChatGPT-4 handled the nuances of Windows 11's installation size (correctly citing the 64GB minimum storage requirement and comparing it to real-world base install sizes), correctly identified the floppy disk capacity, and then did the multiplication and division correctly, arriving at 14,222 disks. Some may quibble over whether 1GB is 1,024MB or 1,000MB, but the number is reasonable. It also correctly noted that the actual count could vary based on other factors.
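The arithmetic above is easy to check. The sketch below assumes a ~20GB real-world base install (the figure that would yield ChatGPT-4's 14,222 answer) and 1.44MB high-density floppy disks; the exact quotient truncates to 14,222, though you would need to round up to hold all the data.

```python
import math

# Assumed figures: a ~20GB real-world Windows 11 base install,
# and 1.44MB per high-density 3.5-inch floppy disk.
install_size_gb = 20
floppy_capacity_mb = 1.44

install_size_mb = install_size_gb * 1024  # using 1GB = 1,024MB
disks_exact = install_size_mb / floppy_capacity_mb
disks_needed = math.ceil(disks_exact)  # you can't use a fraction of a disk

print(f"{disks_exact:.2f} -> {disks_needed} floppy disks")
```

Using 1GB = 1,000MB instead would lower the count slightly, which is the quibble mentioned above.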
Winner: ChatGPT-4
AI language models are known for their ability to summarize complex information and boil text down to key elements. To evaluate each language model's ability to summarize text, we copied and pasted three paragraphs from a recent Ars Technica article.
Instructions/Prompt: Summarize this in one paragraph: [three paragraphs of article body]
The question of who invented video games is difficult to answer because it depends on how you define the term "video game," and different historians define it differently. Some count early computer games as video games, some insist a television display must be involved, and so on. There is no universally accepted answer.
We would have thought that Bard's ability to find information online would give it an advantage, but in this case it may have backfired: Bard chose one of Google's most popular answers and called Ralph Baer the "Father of Video Games." All its facts about Baer are correct, though it probably should have put the last sentence in the past tense, since Baer passed away in 2014. But Bard didn't mention other early contenders for the "first video game" title, such as Tennis for Two and Spacewar!, so its answer may be misleading and incomplete.
ChatGPT-4 gave a more comprehensive and nuanced answer that reflects the current view of many early video game historians, saying that "the invention of video games cannot be attributed to one person" and instead presenting it as "a series of innovations" over time. Its only mistake was calling Spacewar! "the first digital computer game," which it wasn't. One could expand the answer to cover more niche edge cases, but ChatGPT-4 provides a good overview of the important early precursors.
Winner: ChatGPT-4
Unfettered creativity on whimsical topics should be the strong suit of large language models. We tested this by asking Bard and ChatGPT-4 to write a short whimsical story.
Instructions/Prompt: Write a two-paragraph creative story about Abraham Lincoln’s invention of basketball.
It looks like Google Bard can’t write code at all. Google doesn’t support this feature yet, but the company says Bard will be able to code soon. For now, Bard rejects our prompt, saying, "It looks like you want me to help with coding, but I haven't been trained to do so."
Meanwhile, ChatGPT-4 not only produced the code directly but also formatted it in a fancy code box with a "Copy Code" button that copies the code to the system clipboard for easy pasting into an IDE or text editor. But does the code work? We pasted it into a rand_string.py file, ran it from a Windows 10 console, and it worked without any issues.
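The article doesn't reproduce the prompt or the generated code, but judging from the rand_string.py filename, a typical random-string script of this kind might look something like the following (our own illustrative sketch, not ChatGPT-4's actual output):

```python
import random
import string

def generate_random_string(length: int = 12) -> str:
    """Return a random string of ASCII letters and digits."""
    characters = string.ascii_letters + string.digits
    return "".join(random.choice(characters) for _ in range(length))

if __name__ == "__main__":
    print(generate_random_string())
```

Running the file prints one random 12-character string per invocation.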
Winner: ChatGPT-4
Winner: ChatGPT-4, but it’s not over yet
In terms of speed, ChatGPT-4 is currently the slower of the two. It took 52 seconds to write the story about Lincoln and basketball, while Bard took only 6 seconds. It's worth noting that OpenAI offers a much faster model than GPT-4 in the form of GPT-3.5, which took only 12 seconds to write the same story, but it is arguably less capable at deep, creative tasks.
Each language model has a maximum number of tokens (fragments of words) it can process at a time. This is sometimes called the "context window," and it functions somewhat like short-term memory. For a conversational chatbot, the context window holds the entire conversation history so far. When it fills up, the model either hits a hard limit or keeps going but erases its "memory" of the earliest parts of the discussion. ChatGPT-4 keeps a rolling memory, discarding the oldest context, and reportedly has a limit of around 4,000 tokens. Bard reportedly caps its total at around 1,000 tokens, and when that limit is exceeded it likewise erases its "memory" of the earlier discussion.
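As a rough sketch of how such a rolling window behaves (a simplification we wrote for illustration: word count stands in for real subword tokenization, and the budget is tiny compared with the reported limits above):

```python
MAX_TOKENS = 50  # hypothetical budget, far below GPT-4's reported ~4,000

def trim_history(messages: list[str], max_tokens: int = MAX_TOKENS) -> list[str]:
    """Drop the oldest messages until the conversation fits the token budget."""
    def count(msgs: list[str]) -> int:
        # Crude stand-in for tokenization: one "token" per whitespace-split word.
        return sum(len(m.split()) for m in msgs)

    trimmed = list(messages)
    while trimmed and count(trimmed) > max_tokens:
        trimmed.pop(0)  # the chatbot "forgets" the earliest exchange
    return trimmed
```

A chatbot with this behavior would still answer fluently late in a long conversation, but with no recollection of how the conversation began.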
Finally, there is the issue of cost. ChatGPT (though not specifically GPT-4) is currently available for free on a limited basis through the ChatGPT website, but if you want priority access to GPT-4, you'll need to pay $20 per month. Programming-savvy users can access the earlier GPT-3.5 models more cheaply via the API, though at the time of writing the GPT-4 API is still in limited testing. Meanwhile, Google Bard is free as a limited trial for select Google users, and Google currently has no plans to charge for access when Bard becomes more widely available.
And as we mentioned before, both models are constantly being upgraded. Bard, for example, just received an update last Friday that makes it better at math, and it may gain coding abilities soon. OpenAI also continues to improve GPT-4. Google is reportedly holding back its most powerful language models (probably due to computational cost), so we could yet see a stronger contender from Google as it catches up.
In short, the generative AI business is still in its early days, and nothing is settled. Anyone could yet emerge as the dark horse.