Before the release of GPT-4, OpenAI hired experts from various industries to conduct 'adversarial testing” to avoid issues such as discrimination.-AI-php.cn

Before the release of GPT-4, OpenAI hired experts from various industries to conduct adversarial testing” to avoid issues such as discrimination.

It was reported on April 17 that before the release of the large-scale language model GPT-4, the artificial intelligence start-up OpenAI hired experts from all walks of life to form a "blue army" team to evaluate the model. What issues might arise for "adversarial testing". Experts ask various exploratory or dangerous questions to test how the AI responds; OpenAI will use these findings to retrain GPT-4 and solve the problems.

After Andrew White gained access to GPT-4, the new model behind the artificial intelligence chatbot, he used it to propose a brand new nerve agent.

As a professor of chemical engineering at the University of Rochester, White was one of 50 scholars and experts hired by OpenAI last year to form OpenAI’s “Blue Army” team. Over the course of six months, members of the "Blue Army" will conduct "qualitative detection and adversarial testing" of the new model to see if it can break GPT-4.

White said he used GPT-4 to propose a compound that could be used as a chemical poison, and also introduced various "plug-ins" that can provide information sources for the new language model, such as scientific papers and chemical manufacturer names. ". Turns out the AI chatbot even found a place to make the chemical poison.

"I think artificial intelligence will give everyone the tools to do chemistry experiments faster and more accurately," White said. "But there is also a risk that people will use artificial intelligence to do dangerous chemical experiments... Now this This situation does exist."

The introduction of "Blue Army Testing" allows OpenAI to ensure that this consequence will not occur when GPT-4 is released.

The purpose of the "Blue Force Test" is to dispel concerns that there are dangers in deploying powerful artificial intelligence systems in society. The job of the "blue team" team is to ask various probing or dangerous questions and test how the artificial intelligence responds.

OpenAI wants to know how the new model will react to bad problems. So the Blues team tested lies, language manipulation and dangerous scientific common sense. They also examined the potential of the new model to aid and abet illegal activities such as plagiarism, financial crime and cyberattacks.

The GPT-4 “Blue Army” team comes from all walks of life and includes academics, teachers, lawyers, risk analysts and security researchers. The main working locations are in the United States and Europe.

They fed back their findings to OpenAI, which used team members’ findings to retrain GPT-4 and solve problems before publicly releasing GPT-4. Over the course of several months, members spend 10 to 40 hours each testing new models. Many interviewees stated that their hourly wages were approximately US$100.

Many "Blue Army" team members are worried about the rapid development of large language models, and even more worried about the risks of connecting to external knowledge sources through various plug-ins.

"Now the system is frozen, which means that it no longer learns and no longer has memory," said José E, a member of the GPT-4 "Blue Team" and a professor at the Valencia Institute of Artificial Intelligence. José Hernández-Orallo said. "But what if we use it to go online? This could be a very powerful system connected to the whole world."

OpenAI said that the company attaches great importance to security and will test various plug-ins before release. And as more and more people use GPT-4, OpenAI will regularly update the model.

Technology and human rights researcher Roya Pakzad used questions in English and Farsi to test whether the GPT-4 model was biased in terms of gender, race, and religion.

Pakzad found that even after updates, GPT-4 had clear stereotypes about marginalized communities, even in later versions.

She also found that when testing the model with Farsi questions, the chatbot's "illusion" of making up information to answer questions was more severe. The robot made up more names, numbers and events in Farsi than in English.

Pakzadeh said: "I am worried that linguistic diversity and the culture behind the language may attenuate."

Boru Gollo, a lawyer based in Nairobi, is the only A tester from Africa also noticed that the new model had a discriminatory tone. "When I was testing the model, it was like a white man talking to me," Golo said. "If you ask a specific group, it will give you a biased view or a very biased answer." OpenAI also admitted that GPT-4 still has biases.

Members of the "Blue Army" who evaluate the model from a security perspective have different views on the security of the new model. Lauren Kahn, a researcher from the Council on Foreign Relations, said that when she began researching whether this technique could potentially be used in cyberattacks, she "didn't expect it to be so detailed that it could be fine-tuned." implementation". Yet Kahn and other testers found that the new model's responses became considerably safer over time. OpenAI said that before the release of GPT-4, the company trained it on rejecting malicious network security requests.

Many members of the “Blue Army” stated that OpenAI had conducted a rigorous security assessment before release. Maarten Sap, an expert on language model toxicity at Carnegie Mellon University, said: "They have done a pretty good job of eliminating obvious toxicity in the system."

Since the launch of ChatGPT, OpenAI has also been criticized by many parties. , a technology ethics organization complained to the U.S. Federal Trade Commission (FTC) that GPT-4 is "biased, deceptive, and poses a threat to privacy and public safety."

Recently, OpenAI also launched a feature called the ChatGPT plug-in, through which partner applications such as Expedia, OpenTable and Instacart can give ChatGPT access to their services, allowing them to order goods on behalf of human users.

Dan Hendrycks, an artificial intelligence security expert on the "Blue Army" team, said that such plug-ins may make humans themselves "outsiders."

“What would you think if a chatbot could post your private information online, access your bank account, or send someone to your home?” Hendricks said. “Overall, we need stronger security assessments before we let AI take over cyber power.”

Members of the “Blue Army” also warned that OpenAI cannot stop just because the software responds in real time. Safety test. Heather Frase, who works at Georgetown University's Center for Security and Emerging Technologies, also tested whether GPT-4 could assist criminal behavior. She said the risks will continue to increase as more people use the technology.

The reason you do real-run tests is because they behave differently once used in a real environment, she said. She believes that public systems should be developed to report the types of events caused by large language models , similar to cybersecurity or consumer fraud reporting systems.

Labor economist and researcher Sara Kingsley suggests that the best solution is something like "Nutrition Labels" on food packaging "That way, speak directly to the hazards and risks.

The key is to have a framework and know what the common problems are so you can have a safety valve," she said. “That’s why I say the work is never done. ”

The above is the detailed content of Before the release of GPT-4, OpenAI hired experts from various industries to conduct 'adversarial testing” to avoid issues such as discrimination.. For more information, please follow other related articles on the PHP Chinese website!