The artificial intelligence benchmark organization MLCommons announced the establishment of the artificial intelligence safety (AIS: AI Safety) working group. AIS will develop a platform and test library from many contributors to support AI security benchmarks for different use cases.
Artificial intelligence systems have the potential to provide significant benefits to society, but they are not without risks, such as harmfulness, misinformation, and bias. As with other complex technologies, society needs industry-standard security testing to realize benefits while minimizing risks.
The new platform will allow users to select benchmarks from a test library and aggregate the results of those benchmarks into useful and easy-to-understand scores. This is similar to standards in other industries, such as Automotive Safety Test Ratings and Energy Star Ratings
An immediate priority in this effort is to support the rapid development of more rigorous and reliable AI safety testing technologies develop. The AIS Working Group will leverage the technical and operational expertise of its members and the larger AI community to help guide and create AI safety baseline technologies.
Joaquin Vanschoren, associate professor of machine learning (ML) at Eindhoven University of Technology, said: “The open and dynamic nature of the security benchmarks being developed by the broad AI community provides a good foundation for the development and development of security benchmarks. Achieving a common goal creates real incentives." "If anyone sees an unsolved security problem, they can come up with new tests. We have some of the smartest people in the world coming together to actually solve these problems, using benchmarking to mean As a result, we will have a clear understanding of which artificial intelligence models best solve security problems.”
Rewritten content: The focus is on developing security benchmarks for large language models (LLM) and It builds on the pioneering work of researchers at Stanford University's Center for Research on Fundamental Models (CRFM) and Holistic Evaluation of Language Models (HELM). In addition to building on and incorporating many security-related tests on the HELM framework, the working group also hopes that some companies will make their internal AI security tests used for proprietary purposes public and share them with the MLCommons community to accelerate the pace of innovation
Percy Liang, Director of the Basic Model Research Center, said: "We have been developing HELM, a modular evaluation framework, for about 2 years. I am very excited to work with MLCommons to use HELM for artificial intelligence security. Evaluation, this is a topic I have been thinking about for 7 years. With the rise of powerful basic models, this topic has become extremely urgent."
The AIS working group believes that with the test As mature, standard AI security benchmarks will become an important part of the AI security approach. This is consistent with responsible AI technology development and risk-based policy frameworks, such as the Voluntary Commitments on Safety, Security, and Trust made by several technology companies to the U.S. White House in July 2023, NIST’s Artificial Intelligence Risk Management Framework and the EU’s upcoming Artificial Intelligence Law .
MLCommons is committed to supporting a wide range of stakeholders in industry and academia to jointly develop shared data, tools and benchmarks to build and test artificial intelligence systems more efficiently. David Kanter, executive director of MLCommons, said: "We are very excited to work with our members. Next year, we will focus on building and promoting artificial intelligence safety benchmarks, starting with open source models, with the aim of making these benchmarks widely applicable after initial method validation. Other LLMs."
First-time participants in the AIS working group include a multidisciplinary group of artificial intelligence experts, including: Anthropic, Coactive AI, Google, Inflection, Intel, Meta, Microsoft , NVIDIA, OpenAI, Qualcomm, as well as scholars Joaquin Vanstoren of Eindhoven University of Technology, Percy Liang of Stanford University, and Bo Li of the University of Chicago. Researchers and engineers from academia and industry, as well as domain experts from civil society and the public sector, can participate in the working group. Click to read the original article to learn how to participate in the AIS Working Group.
MLCommons is a world-leading organization dedicated to building benchmarks for artificial intelligence. It is an open engineering consortium that aims to help everyone do better machine learning through the use of benchmarks and data. The origins of MLCommons can be traced to the 2018 MLPerf benchmark, which quickly evolved into a series of industry metrics used to measure machine learning performance and increase transparency of machine learning technology. MLCommons works with more than 125 members, global technology providers, academics, and researchers focused on co-building tools across the machine learning industry through benchmarks and metrics, public datasets, and best practices
The above is the detailed content of Establishing the Artificial Intelligence Security Working Group, MLCommons announced an important step. For more information, please follow other related articles on the PHP Chinese website!