Don't just criticize Google Bard: Microsoft's ChatGPT-powered New Bing also makes frequent errors

WBOY

May 10, 2023 am 11:07 AM

On February 8 at 8:30 EST, Google held its press conference in Paris. The day before, Microsoft officially launched New Bing, a new generation of AI-driven search engine that integrates a generative model based on ChatGPT technology into Bing. Microsoft Vice President Yusuf Mehdi gave a flawless demonstration [0], and Microsoft's market value jumped by $80 billion that day. Even in China, where OpenAI does not accept registrations, clips of Yusuf showing how the generative model enhances the Bing search engine and the Edge browser went viral in WeChat Moments and groups. One company's windfall is another's poison, and everyone was waiting to see how the search giant Google would respond.

At the Google press conference, everyone was waiting for Bard, New Bing's much-anticipated rival. As a large language model backed by Google's search engine, Bard had raised high expectations. However, the press conference said little about Bard, so attention turned to the Bard video Google had posted on Twitter. On close inspection, people discovered that Bard had made a factual error when answering a question.

When asked, "What can I tell my nine-year-old kid about the new discoveries from the James Webb Telescope?" Bard responded: "The first photo of an exoplanet. It was taken by the James Webb Telescope." In fact, the first image of an exoplanet was taken by the European Southern Observatory's Very Large Telescope in 2004, about 17 years before the James Webb Telescope was launched. This mistake triggered the plunge in Google's stock price that day.

Figure 1 Screenshot of Bard’s demonstration on the James Webb Telescope

At the Paris press conference, even though Bard's presentation lasted only about 4 minutes, its answer about the best time to observe constellations also contained an obvious factual deviation. As shown below, Bard's answer stated that the best time to observe Orion is from November to February.

Figure 2 Screenshot of Bard's demonstration on constellation observation times

Sources differ on the best time to observe Orion, but they all indicate that the best observation period begins in January each year. The edtech website BYJU'S gives January to March [1], and Wikipedia gives January to April [2].

Figure 3 BYJU'S answer on the best time to observe Orion

Because of the contrast between the Bard and New Bing presentations, and the factual errors that were found, Google's market value fell by nearly 100 billion US dollars that day, and Bard was widely mocked; the event was jokingly called the most expensive press conference in history. We can't help but wonder: are there factual errors hidden in New Bing's seemingly perfect press conference, too?

New Bing’s factual errors

We found that the content generated by New Bing contained many factual errors, including celebrities' biographical details, financial report figures, and nightclub opening hours.

Classifying factual errors in generative models

For generative models represented by the GPT series (including ChatGPT, InstructGPT, etc.) and T5, factual errors can be roughly divided into two categories:

1. The generated content conflicts with the reference content. As the generated sequence grows longer, large language models tend to drift away from the reference content, adding to, deleting from, or altering the original text.
2. The generated content has no factual basis. This kind of error is simply making things up. Without factual grounding, a model relying solely on the knowledge stored during pre-training easily becomes confused during generation, and there is a high probability that it produces content that contradicts the facts or is irrelevant to the question.

Now let's examine the examples shown in the New Bing conference [3] and the New Bing demo [4] to see whether they contain factual errors, and of what type. For brevity, we refer to both New Bing and the New Bing plug-in integrated into Edge simply as New Bing.

Errors in the Japanese poets example

At 29:57 in the New Bing conference video, when New Bing was asked about well-known Japanese poets, its answer included "Eriko Kishida (1930-2004), poet, playwright, and essayist."

Figure 4 Screenshot of the poet example in the New Bing demo

However, according to Wikipedia and IMDb [5, 6, 7], Eriko Kishida was born in 1929 and died in 2011, and she was not a playwright or essayist but a poet, translator, and children's story writer. Kishida's family might not take kindly to New Bing cutting eight years from her life. Gackt, also on New Bing's list, fared no better: according to Wikipedia [8], Gackt is a musician, singer, composer, and actor, but has never written poetry.

Errors in the financial report example

At 35:49 in the New Bing conference video, Yusuf demonstrated how New Bing, integrated into the Edge browser, can generate key takeaways from an open document: the 2022 third-quarter financial report of clothing company Gap. At first glance, New Bing's summary is very practical, laying out the highlights of Gap's third-quarter report as bullet points; Buffett might be "shocked" to see it. However, when we located Gap's 2022 third-quarter report [9] and read it carefully, we found that New Bing's summary was riddled with errors and omissions.

Figure 5 New Bing's summary of Gap's third quarter 2022 financial report

First, New Bing gave Gap's adjusted operating margin (reported operating margin, adjusted for impairment charges and restructuring costs) as 5.9%. In the financial report, however, Gap's operating margin was 4.6%, or 3.9% after adjustment.

Figure 6 Screenshot of Gap's third quarter 2022 financial report

New Bing then reported adjusted diluted earnings per share of US$0.42 (diluted earnings per share, adjusted for impairment charges, restructuring costs, and tax impact), but the figure in the financial report is $0.71.

Figure 7 Screenshot of Gap's 2022 third quarter financial report

New Bing also gave Gap's full-year sales guidance as "net sales growth expected in the low double digits," but the report actually says fourth-quarter net sales "could be down mid-single digits." That is a decline, not growth, and the difference could seriously mislead users' investment decisions. If they lose money, who is responsible? New Bing even invented further full-year guidance out of nowhere: "an operating margin of 7% and diluted earnings per share of US$1.60 to US$1.75." Neither figure appears in Gap's third-quarter report.

Figure 8 Screenshot of Gap’s 2022 third quarter financial report

At 36:15 in the video, Yusuf demonstrated the function of using New Bing to compare the financial reports of Gap and the sports casual wear brand Lululemon. This part is also a hot spot for misinformation.

Figure 9 New Bing's financial report comparison of Gap and Lululemon

In the table New Bing produced on the right, besides the errors already mentioned (Gap's operating margin of 5.9%, which should be 4.6%, or 3.9% adjusted, and Gap's diluted earnings per share of $0.42, which should be $0.77, or $0.71 adjusted), New Bing also gave Gap's cash and cash equivalents as $1.4 billion, whereas the financial report shows $679 million.

Figure 10 Screenshot of Lululemon's 2022 third quarter financial report

The same problem appears in the Lululemon data given by New Bing. According to Lululemon's 2022 third-quarter report [10], New Bing gave Lululemon's gross margin as 58.7% when it is actually 55.9%, its operating margin as 20.6% when it is actually 19.0%, and its diluted earnings per share as $1.65 when they are actually $2.00.

Figure 11 Screenshot of Lululemon's 2022 third quarter financial report

We can't help but wonder: how did New Bing get Gap's and Lululemon's financial reports so badly wrong?

A reasonable inference is that the erroneous figures likely come from financial analysis data the model saw during pre-training. When large language models such as ChatGPT generate text, the longer the generated sequence, the more easily they drift away from the given Gap and Lululemon report data and produce irrelevant, false information.
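
To put the Gap and Lululemon discrepancies side by side, here is a small Python sketch that tabulates the figures quoted in this section (New Bing's claims versus the values this article reads from the reports [9, 10]) and computes each gap in absolute and relative terms. It is purely illustrative: the `Figure` class and the `discrepancy` and `report` helpers are hypothetical names, the numbers are only those stated above, and for Gap it assumes New Bing's "adjusted" figures should be compared with the report's adjusted figures (3.9% and $0.71).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Figure:
    """One metric as stated by New Bing versus the value in the company's report."""
    company: str
    metric: str
    claimed: float   # value in New Bing's demo output, as quoted in this article
    reported: float  # value from the Q3 2022 report, as quoted in this article
    unit: str        # "%" metrics are compared in percentage points


# Values taken from the paragraphs above; for Gap (an assumption of this sketch),
# New Bing's adjusted claims are compared with the report's adjusted figures.
FIGURES = [
    Figure("Gap", "adjusted operating margin", 5.9, 3.9, "%"),
    Figure("Gap", "adjusted diluted EPS", 0.42, 0.71, "USD"),
    Figure("Gap", "cash and cash equivalents", 1400.0, 679.0, "USD m"),
    Figure("Lululemon", "gross margin", 58.7, 55.9, "%"),
    Figure("Lululemon", "operating margin", 20.6, 19.0, "%"),
    Figure("Lululemon", "diluted EPS", 1.65, 2.00, "USD"),
]


def discrepancy(f: Figure) -> tuple[float, float]:
    """Return (claimed - reported, that gap relative to the reported value)."""
    gap = f.claimed - f.reported
    return gap, gap / f.reported


def report(figures: list[Figure]) -> list[str]:
    """Format one aligned text line per figure."""
    lines = []
    for f in figures:
        gap, rel = discrepancy(f)
        lines.append(
            f"{f.company:<10} {f.metric:<26} claimed {f.claimed:>8.2f} "
            f"reported {f.reported:>8.2f} {f.unit:<6} gap {gap:+8.2f} ({rel:+.0%})"
        )
    return lines


if __name__ == "__main__":
    for line in report(FIGURES):
        print(line)
```

Run as a script, it prints one line per metric. For the percentage metrics the gap column is in percentage points, while the relative column shows how far each claim strays from the reported value, which makes the cash figure (more than double the reported amount) stand out.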

Errors in the nightclub example

At 29:17 in the New Bing press conference video, New Bing once again offers "unconstructive" advice, this time to visitors exploring Mexico City's nightlife. For several of the nightclubs it recommends, such as Primer Nivel Night Club, El Almacen, and El Marra, New Bing says they have no customer reviews, no contact information, and no description. However, this information can be found on Google Maps or on the venues' Facebook pages.

It seems New Bing doesn't surf the web enough. New Bing lists El Almacen's hours as 5 pm to 11 pm, Tuesday through Sunday, but its actual hours are 7 pm to 3 am every day except Monday [11]. A tourist who shows up for dinner at five would go hungry for two hours. Guadalajara de Noche is the opposite case: its actual hours are 5:30 pm to 1:30 am or 12:30 am daily [12], while New Bing says it opens at 8 pm. Tourists relying on New Bing's recommendations to find a place to eat will need some luck to get a meal.

Figure 12 Screenshot of the nightclub example in the New Bing demo

Other errors

Beyond the errors above, we also found a series of factual errors scattered throughout, such as inaccurate product prices, wrong store addresses, and wrong times.

Errors in the demo examples

Since New Bing is not yet fully open, we cannot run the press-conference queries on New Bing ourselves, but Microsoft provides several demo examples [13] that let users try it out. In the spirit of getting to the bottom of things, we put these demos under the magnifying glass as well, and found that even these carefully selected examples contain plenty of incorrect information.

In "What art ideas can I do with my kid?", New Bing suggests many crafts and summarizes the materials needed for each. However, these material lists are incomplete. For example, New Bing summarizes from the cited website [14] that making a paper guitar requires cardboard boxes, rubber bands, paint, and glue, but it leaves out the sponge brush, tape, and wooden beads that the cited page also lists.

Figure 13 Screenshot of the New Bing demo example "What art ideas can I do with my kid?"

Figure 14 Screenshot of the materials needed to make a paper guitar, from the cited website

New Bing's demo examples also contain a very obvious and common mistake: the cited reference links have nothing to do with the generated content. For example, in the following "I need a big fast car." example, the 2022 Kia Telluride does not appear in citation 10 [15]. This example also shows the "time travel" problem: New Bing claimed that the 2022 Kia Telluride won the 2020 World Car of the Year award, when in fact it was the 2020 Kia Telluride that won that year. The 2022 World Car of the Year was the Hyundai IONIQ 5, and citation 7 [16] is likewise an article unrelated to the 2020 World Car of the Year award. Across all the demo examples, we found as many as 21 similar errors.

Figure 15 Screenshot of the New Bing demo example "I need a big fast car."

Summary: Finding errors will guide us forward

The analysis above shows that both New Bing and Bard are prone to factual errors in their answers. As the whole world marvels at the capabilities of large language models such as ChatGPT, and as ChatGPT becomes the fastest application in history to reach 100 million users, we cheer the progress of AI on the one hand, but on the other we also need to think calmly about how to solve the many problems AI still has.

Since a group of geniuses gathered at Dartmouth College in 1956 and first defined artificial intelligence, AI has gone through several booms and busts. Its nearly 70 years of development are full of moving perseverance: the early explorations of first-generation AI, the bold attempts of expert systems, scholars such as Hinton, Bengio, and LeCun who kept working on neural networks through years of neglect, DeepMind bringing AI into the mainstream with AlphaGo, the commitment to open source of top research institutions such as Google, Meta, CMU, Stanford, and Tsinghua, and OpenAI withstanding the pressure to stay on the GPT route. It is this relay of generations of researchers around the world that has brought us to where we are today.

However, if we allow AI to generate large amounts of untrue information, it won't be long before public confidence in AI is destroyed and all kinds of false information floods the Internet. We point out the errors of large models not to criticize any company or model, but because we want to make AI better.

As the Argentine poet Borges once said: any destiny, however long and complicated, actually consists of a single moment, the moment when a person truly awakens to who they are. Now that large models such as ChatGPT write with a fluency comparable to humans, it is clear that the next step is to integrate real-world knowledge into them more completely and accurately, so that AI models can be applied safely, reliably, and widely in people's daily lives. We have never looked forward to that moment so much, and we have never been so close to it.
