
ChatGPT vs Google Bard: Which one is better? The test results will tell you!

In today's world of generative AI chatbots, we have witnessed the sudden rise of ChatGPT (launched by OpenAI in November 2022), followed by Bing Chat in February of this year and Google Bard in March. We decided to put these chatbots through a variety of tasks to determine which one dominates the AI chatbot space. Since Bing Chat uses GPT-4 technology similar to the latest ChatGPT model, our focus this time is on the two giants of AI chatbot technology: OpenAI and Google.

We tested ChatGPT and Bard in seven key categories: bad jokes, debate conversations, math word problems, summarizing, fact retrieval, creative writing, and coding. For each test, we fed the exact same instruction (the "prompt") into ChatGPT (using GPT-4) and Google Bard, and compared the first result each one gave.

It's worth noting that a version of ChatGPT based on the earlier GPT-3.5 model is also available, but we did not use that version in our testing. Since we only used GPT-4, to avoid confusion we refer to ChatGPT as "ChatGPT-4" throughout this article.

Obviously, this is not a scientific study, just an interesting comparison of chatbot capabilities. Due to random elements, the output may differ between sessions, and further evaluation using different prompts will produce different results. Additionally, the capabilities of these models will change rapidly over time as Google and OpenAI continue to upgrade them. But for now, here's how things compare in early April 2023.

Bad Jokes

To heat up our battle of wits, we asked ChatGPT and Bard to write some jokes. Since the essence of comedy is often found in bad jokes, we wanted to see if these two chatbots could come up with some unique jokes.

Instructions/Prompt: Write 5 original bad jokes


Of the five bad jokes Bard gave us, we found three of them with a Google search. Of the other two, one was partially borrowed from a Mitch Hedberg joke posted on Twitter, but it amounted to unfunny wordplay that didn't really work. Surprisingly, there was one seemingly original joke (about a snail) that we couldn't find anywhere else, but sadly it was just as unfunny.

Meanwhile, ChatGPT-4's five bad jokes were 100% unoriginal, lifted wholesale from other sources, though at least they were reproduced accurately. Here Bard seems to have an edge over ChatGPT-4 because it tried to create original jokes (as our instructions asked), even if some of them fail in embarrassing ways (but that's just the nature of bad jokes) or land as unintentional non sequiturs (also very much in the spirit of a bad joke).

Winner: Bard

Debate Conversation

One way to test a modern AI chatbot is to have it act as a debater on a topic. To that end, we presented Bard and ChatGPT-4 with one of the most critical topics of our time: PowerPC vs. Intel.

Instructions/Prompt: Write 5 lines of debate dialogue between PowerPC processor enthusiasts and Intel processor enthusiasts.


First, let's take a look at Bard's reply. The five-line dialogue it generated wasn't particularly in-depth and didn't mention any technical details specific to PowerPC or Intel chips beyond general insults. Furthermore, the conversation ended with the "Intel fan" agreeing that the two simply have different opinions, which seems wildly unrealistic for a subject that has inspired a million online spats.

In contrast, the ChatGPT-4 response mentioned PowerPC chips being used in Apple Macintosh computers and threw around terms like "Intel's x86 architecture" and PowerPC's "RISC-based architecture." It even mentions the Pentium III, a realistic detail from around 2000. Overall, this exchange is much more detailed than Bard's response, and, most true to life, the conversation never reaches a conclusion, suggesting that in some corners of the Internet this never-ending battle may still be raging.

Winner: ChatGPT-4

Math Word Problems

Traditionally, math questions have not been the strong point of large language models (LLMs) such as ChatGPT. So instead of giving each chatbot a complex series of equations and arithmetic, we gave them an old-school-style word problem.

Instructions/Prompt: If Microsoft Windows 11 shipped on 3.5-inch floppy disks, how many floppy disks would it need?


To solve this problem, each AI model needs to know the data size of the Microsoft Windows 11 installation and the data capacity of the 3.5-inch floppy disk. They must also make assumptions about what density of floppy disk the questioner is most likely to use. They then need to do some basic math to put the concepts together.

In our evaluation, Bard got these three key points right (close enough, since Windows 11 installation size estimates typically run around 20-30GB) but failed miserably at the math. It claimed that "15.11" floppy disks would be needed, then said that was "just a theoretical number," and finally admitted that more than 15 floppy disks would be needed, which is still nowhere near the correct value.

In contrast, ChatGPT-4 noted some nuances about the Windows 11 installation size (correctly citing the 64GB minimum requirement and comparing it to real-world base installation sizes), correctly identified the floppy disk capacity, and then did the multiplication and division correctly, arriving at 14,222 disks. One could quibble over whether 1GB is 1,024MB or 1,000MB, but the number is reasonable. It also correctly mentioned that the actual figure could change based on other factors.
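
For reference, the back-of-the-envelope math might look like the sketch below. The 20GB install size and 1.44MB disk capacity are our own illustrative assumptions, not necessarily the exact inputs either chatbot used:

```python
# Rough estimate of how many 1.44MB floppy disks a Windows 11 install would need.
install_size_gb = 20        # assumed base install size (estimates range from ~20-30GB)
floppy_capacity_mb = 1.44   # standard high-density 3.5-inch floppy disk

install_size_mb = install_size_gb * 1024            # using 1GB = 1,024MB
disks_needed = install_size_mb / floppy_capacity_mb
print(round(disks_needed))                           # about 14,222 disks
```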

Winner: ChatGPT-4

Summarization

AI language models are known for their ability to summarize complex information and boil text down to key elements. To evaluate each language model's ability to summarize text, we copied and pasted three paragraphs from a recent Ars Technica article.

Instructions/Prompt: Summarize in one paragraph: [three paragraphs of article body]


Both Bard and ChatGPT-4 gathered this information and pared it down to the important details. However, Bard's version reads more like a true summary, synthesizing the information into new wording, while ChatGPT-4's version reads more like a condensation, with sentences chopped up and fragments left over. While both are good, we have to admit that Bard outperforms ChatGPT-4 in this test.

Winner: Google Bard

Fact Retrieval

Large language models are known for confidently making things up (researchers often call these errors "hallucinations"), which makes them unreliable factual references unless they are supplemented by external sources of information. Interestingly, Bard can query information online, while ChatGPT-4 cannot yet (although this capability will soon roll out via plugins).

To test this ability, we challenged Bard and ChatGPT-4 to demonstrate historical knowledge on a tricky and nuanced topic.

Instructions/Prompt: Who invented video games?


The question of who invented video games is difficult to answer because it depends on how you define "video game," and different historians define the term differently. Some consider early computer games to be video games, others think a television display must be involved, and so on. There is no universally accepted answer.

We would have thought that Bard's ability to look up information online would give it an advantage, but in this case that may have backfired, because it chose one of Google's most popular answers and called Ralph Baer the "Father of Video Games." All the facts about Baer are correct, although it probably should have put the last sentence in the past tense, since Baer passed away in 2014. But Bard doesn't mention other early contenders for the "first video game" title, such as Tennis for Two and Spacewar!, so its answer comes across as incomplete and potentially misleading.

ChatGPT-4 gives a more comprehensive and nuanced answer that reflects the current view of many video game historians, saying that "the invention of video games cannot be attributed to one person" and instead crediting "a series of innovations" over time. Its only mistake was calling Spacewar! "the first digital computer game," which it wasn't. The answer could be expanded to include more niche edge cases, but ChatGPT-4 provides a good overview of the important early precursors.

Winner: ChatGPT-4

Creative Writing

Unfettered creativity on whimsical topics should be the strong suit of large language models. We tested this by asking Bard and ChatGPT-4 to write a short whimsical story.

Instructions/Prompt: Write a two-paragraph creative story about Abraham Lincoln's invention of basketball.


Bard's output is unsatisfactory in several respects. First, it is 10 paragraphs long rather than two, and they are short, disconnected paragraphs. Additionally, it includes details that don't make much sense in the context of the prompt. For example, why is Abraham Lincoln's White House located in Springfield, Illinois? Other than that, it's a fun, simple story.

ChatGPT-4 also sets the story in Illinois but, more accurately, makes no mention of the presidency or the White House during that time period. However, it later says that "players from the north and south" put aside their differences to play basketball together, which implies the events take place shortly after basketball's invention.

Overall, we think ChatGPT-4 is slightly better because its output is indeed divided into two paragraphs, although it seems to skirt that limit by stretching each paragraph as long as possible. Still, we love the creative details in the ChatGPT-4 version of the story.

Winner: ChatGPT-4

Coding

If this generation of large language models has a "killer app," it may be their use as programming assistants. OpenAI's early work on the Codex model made GitHub Copilot possible, and ChatGPT itself has earned a reputation as a fairly competent coder and debugger for simple programs. So it should be interesting to see how Google Bard performs here as well.

Instructions/Prompt: Write a python script that says "Hello World" and then creates a randomly repeating string indefinitely.


It looks like Google Bard can't write code at all. Google doesn't support this feature yet, but the company says coding support is coming soon. For now, Bard rejects our prompt, saying, "It looks like you want me to help with coding, but I haven't been trained to do so."

Meanwhile, ChatGPT-4 not only produced the code outright, it also formatted it in a fancy code box with a "Copy Code" button that copies the code to the system clipboard for easy pasting into an IDE or text editor. But does the code work? We pasted it into a rand_string.py file, ran it in a Windows 10 console, and it worked without any issues.
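
For reference, a script that satisfies this prompt might look something like the sketch below. This is our own illustration of one reasonable interpretation (print random strings forever), not the exact code ChatGPT-4 produced:

```python
import random
import string
import time

# Greet the user, as the prompt requests.
print("Hello World")

# Then generate and print random strings indefinitely.
while True:
    length = random.randint(5, 15)  # pick a random length for each string
    rand_str = "".join(random.choices(string.ascii_letters, k=length))
    print(rand_str)
    time.sleep(0.5)  # brief pause so the output stays readable
```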

Winner: ChatGPT-4

Winner: ChatGPT-4, but it’s not over yet

Overall, ChatGPT-4 won 5 of our 7 trials (again, this refers to ChatGPT using GPT-4, in case you skipped the explanation above). But that's not the whole story. There are other factors to consider, such as speed, context length, cost, and future upgrades.

In terms of speed, ChatGPT-4 is currently slower. It took 52 seconds to write a story about Lincoln and basketball, while Bard took only 6 seconds. It is worth noting that OpenAI offers a much faster AI model than GPT-4 in the form of GPT-3.5, which took only 12 seconds to write the Lincoln-and-basketball story, though it is arguably less suited to deep, creative tasks.

Each language model has a maximum number of tokens (fragments of words) that it can process at a time. This is sometimes called the "context window," and it works a bit like short-term memory. In the case of conversational chatbots, the context window holds the entire conversation history so far. When it fills up, the model either hits a hard limit or keeps going but erases the "memory" of the earliest parts of the discussion. ChatGPT-4 keeps a rolling memory, wiping out older context, and reportedly has a limit of around 4,000 tokens. Bard reportedly limits its total to around 1,000 tokens and erases the "memory" of earlier discussion once that limit is exceeded.
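
As a rough illustration of that rolling-memory behavior (our own sketch, not how either product actually implements it), a chat client might trim its history like this, where count_tokens is a hypothetical stand-in for a model-specific tokenizer:

```python
def trim_history(messages, max_tokens, count_tokens):
    """Drop the oldest messages until the conversation fits in the context window."""
    trimmed = list(messages)
    # Keep removing the earliest exchange while the total exceeds the limit,
    # effectively erasing the "memory" of the oldest part of the conversation.
    while trimmed and sum(count_tokens(m) for m in trimmed) > max_tokens:
        trimmed.pop(0)
    return trimmed
```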

Then there is the issue of cost. ChatGPT (not specifically GPT-4) is currently available for free on a limited basis through the ChatGPT website, but if you want priority access to GPT-4, you will need to pay $20 per month. Programming-savvy users can access the earlier GPT-3.5 models more cheaply via the API, but at the time of writing, the GPT-4 API is still in limited testing. Meanwhile, Google Bard is free as a limited trial for select Google users, and Google currently has no plans to charge for access to Bard when it becomes more widely available.

Finally, as we mentioned before, both models are constantly being upgraded. Bard, for example, just received an update last Friday that makes it better at math, and it may be able to code soon. OpenAI also continues to improve its GPT-4 model. Google is reportedly still holding back its most powerful language models (probably due to computational cost), so we could yet see a stronger challenger from Google as it catches up.

In short, the generative AI business is still in its early stages, and the race is far from decided.
