Data is the core of artificial intelligence. Without good data, the likelihood of developing useful AI models is slim. With this in mind, the U.S. Department of Commerce last week issued a public request for input on how to better prepare its many public data sets for building generative artificial intelligence (GenAI) models. The request addresses a critical issue facing the field: the lack of high-quality, diverse public data sets, which are essential for training machine learning models, fostering innovation, and driving the development of AI applications. Through this request, the Department hopes to gather input from all parties to better understand how to build and manage public data sets.
The U.S. Department of Commerce issued a request for information (RFI) on April 17 asking “industry experts, researchers, civil society organizations and other members of the public” for input on how to develop “AI-ready open data assets.”
The Department of Commerce calls itself “America's Data Agency” and is responsible for collecting, storing and analyzing a variety of data about the United States, including data on the economy, population and environment. A quick search of the Department's data hub reveals more than 122,000 publicly accessible data sets on topics ranging from climate and weather to patents to census information.
As technology has changed and improved over the years, the Department has sought assistance from private industry and public agencies to keep its data management and data sharing activities up to current technology standards. Providing electronic access to data in machine-readable formats, or through web services and APIs, is one example of how it has adapted its data services to the times.
Now, with the GenAI revolution upon us, the Department is looking for the best way to position its data so that it can be used to build artificial intelligence models.
Oliver Wise, chief data officer of the U.S. Department of Commerce, wrote in the request for information: “Today, with the emergence of artificial intelligence technology, the Department of Commerce is facing a new technological change, one that gives users better access to information and data. Commerce is particularly interested in generative artificial intelligence (GenAI) applications, which can digest text, images, audio, video and other types of information from different sources to produce new content. GenAI and other artificial intelligence technologies present opportunities and challenges for data providers such as Commerce and for data users including government entities, industry, academia, and the American people.”
The request suggests that the biggest challenge facing Commerce is that current AI systems lack human-like understanding. “Recent AI systems are trained on large amounts of digital content and generate responses based on the context of the content,” it says. “However, these systems do not truly ‘understand’ text in a meaningful way.”
Future artificial intelligence systems must be able to access data that is not only machine-readable, but also “machine-understandable.” Today’s AI systems are limited by their reliance on massive unstructured data stores: they depend on the underlying data rather than on the ability to reason and make judgments based on understanding.
The Department of Commerce is seeking help in sharing its data in a way that takes these fundamental limitations of GenAI technology into account. It is looking for new data dissemination standards, including licensing standards, for machine-readable and machine-understandable data. In terms of data accessibility and retrieval, the Department wants suggestions on how to make its data more accessible, such as through APIs or “web crawlers.”
The Department is particularly interested in how to use knowledge graphs that leverage metadata to better connect human terms to data. It also hopes to gain direction on the adoption of standard ontologies such as Schema.org or NIEM, and on how knowledge graphs can help “harmonize and link” ontologies and vocabularies.
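To make the idea of “machine-understandable” metadata more concrete, here is a minimal Python sketch of how a single data set might be described using the Schema.org `Dataset` vocabulary in JSON-LD. It is an illustration only, not the Department's method: the helper `describe_dataset`, the data set name, and the `example.gov` URL are hypothetical placeholders, and the license URL is simply one public-domain dedication chosen for the example.

```python
import json


def describe_dataset(name, description, keywords, download_url, license_url):
    """Build a Schema.org `Dataset` description as a JSON-LD dictionary.

    Hypothetical helper for illustration; it is not part of any official tool.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Keywords link human search terms to the data set.
        "keywords": keywords,
        "license": license_url,
        # A distribution tells a crawler where and in what format to fetch the data.
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": "text/csv",
                "contentUrl": download_url,
            }
        ],
    }


# All values below are placeholders, not a real Commerce data set.
record = describe_dataset(
    name="Example county population estimates",
    description="Illustrative annual population estimates by county.",
    keywords=["population", "census", "county"],
    download_url="https://example.gov/data/population.csv",
    license_url="https://creativecommons.org/publicdomain/zero/1.0/",
)

print(json.dumps(record, indent=2))
```

Embedding a description like this alongside a data set gives crawlers and AI systems explicit statements about what the data is, how it is licensed, and where to download it, instead of leaving them to infer those facts from unstructured text.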
The Department seeks input from the community on how to advance these data standardization efforts while maintaining the highest standards of data integrity, quality, security, and ethics.
Wise is asking interested parties to send their proposals via email with the subject line “AI-Ready Open Data Asset Information Request for Information.” The Department hopes to receive comments or feedback on these topics before July 16.