
The United States plans to invest US$2.6 billion in artificial intelligence research infrastructure, with NAIRR construction expected to be completed within 6 years

Jun 03, 2023 pm 05:36 PM

Artificial intelligence is a strategic technology leading a new round of technological revolution and industrial transformation. Multiple studies and data sources show that the United States leads the world in basic research, technological innovation, and industrial application of artificial intelligence. On indicators such as the number of high-level artificial intelligence papers, the number of top scholars, the number of artificial intelligence enterprises, and the scale of investment, it is ahead of other countries.

The U.S. government attaches great importance to innovation and development in artificial intelligence technology. Under the National AI Initiative Act of 2020, Congress required the National Science Foundation (NSF) and the White House Office of Science and Technology Policy (OSTP) to form a working group to study the issue. In January 2023, the working group released a roadmap for building the National Artificial Intelligence Research Resource (NAIRR) infrastructure. The roadmap aims to consolidate the United States’ competitive advantage in artificial intelligence, expand access to key artificial intelligence and educational resources for all parties in the United States, and further drive U.S. artificial intelligence innovation and economic prosperity.

The background and significance of the construction of NAIRR in the United States

Construction background

The U.S. government believes that its lead in artificial intelligence is being challenged and that its competitive advantage is at risk of being weakened. It identifies two main problems. First, investment in artificial intelligence R&D and educational resources is unevenly distributed. In terms of investment, private-sector investment in artificial intelligence in the United States more than doubled from 2020 to 2021, but the number of new artificial intelligence companies is declining. In terms of talent, the racial and gender distribution of U.S. artificial intelligence doctoral graduates differs considerably from that of the population as a whole, which will restrict innovation and development in artificial intelligence. Second, scientific research institutions have insufficient computing and data resources. In terms of computing power, the most advanced computing platforms are owned by industry-leading private institutions, and research institutions lack platforms that can support artificial intelligence research and development. In terms of data, the main data resources for training artificial intelligence models are owned by private institutions and large Internet platforms. Although the U.S. government continues to open up its data, this is still insufficient for artificial intelligence research.

The working group pointed out that a lack of sufficient artificial intelligence research resources will limit the U.S. artificial intelligence innovation ecosystem and cause top talent to move from academic research institutions to a small number of resource-rich companies. If this trend persists in the long term, it will affect the competitiveness and innovation capacity of the United States. In January 2023, after 18 months of public consultation and discussion, the working group formally proposed a construction plan and planned to apply for US$2.6 billion in construction, operation, and maintenance funds. It plans to complete the construction of NAIRR in four phases within 6 years, focusing on four major goals: gathering resources to promote research innovation, increasing the diversity of talent, improving basic resource capabilities, and promoting the development of trustworthy artificial intelligence.

Significance

NAIRR, as an artificial intelligence research infrastructure, is open to U.S. research universities, students, non-profit organizations, and other institutions, providing computing resources, high-quality data, educational tools, and other basic research resources. The platform is expected to become a key hub for U.S. artificial intelligence research cooperation and to consolidate the country's international competitive advantage.

In terms of ecosystem building, the U.S. government will rely on NAIRR to bring together relevant government departments and scientific research institutions to jointly carry out cooperative research and resource development in artificial intelligence, forming a broad cooperative ecosystem. NAIRR services and functions are shown in Figure 1.

Figure 1. NAIRR services and functions

In terms of data, NAIRR will aggregate data from federal government departments and cooperate on data services with institutions across the industry. First, it will promote the aggregation, development, and use of large-scale artificial intelligence data resources. It will gather and connect the large-scale data resources already opened up by U.S. federal agencies, academic research institutions, and technology giants, becoming the largest artificial intelligence data resource service platform in the United States. For example, the U.S. National Institutes of Health has released more than 36 PB of gene sequencing data, and the U.S. National Oceanic and Atmospheric Administration has released more than 10 PB of weather and environmental data. Second, it will improve the management and governance of artificial intelligence data. Artificial intelligence data sets are highly fragmented: each data set supports specialized tasks and research fields, and there are no unified standards for data annotation and data governance, which makes data management difficult. NAIRR will promote unified standards for data aggregation, standardize data description formats, and encourage the aggregation of data resources from multiple parties. Third, it will promote the development and use of data resources through multi-party collaboration. The operating entity will run an artificial intelligence data set community and encourage the community to develop and build valuable data resources for NAIRR to use. The operating entity will also provide data search services that make it easier to query open data from federal agencies and data resources from third-party service providers.

In terms of computing power, NAIRR will work with major U.S. artificial intelligence cloud computing companies to build a computing platform. It plans to connect with the cloud platforms of technology giants such as Google, Microsoft, and Amazon, as well as the cloud platforms of federal agencies such as the National Science Foundation and the National Institutes of Health. The platform will provide different service tiers and content for universities, research institutions, students, and start-ups, covering a variety of services and resources such as data, computing power, test beds, and software tools. After completion, NAIRR's computing resources will include supercomputers capable of training at least one machine learning model at the trillion-parameter scale, as well as cloud computing resources, CPUs, GPUs, and high-speed networks.

Once the NAIRR infrastructure is established and operating stably, it will continue to expand cooperation with government departments and private institutions, broaden the scope of platform services and users, and share successful experience. The platform will also promote the formulation of relevant standards and specifications, participate in international exchanges and cooperation, serve as a basic platform for the United States and its allies and partners, and promote cooperative research and data sharing.

The U.S. NAIRR Construction Plan

The United States plans to use a systematic approach to mobilize the federal government and private institutions to work together to establish an artificial intelligence research resource infrastructure for academic research.

The first is to plan and build a platform governance system with multi-party participation. The proposed NAIRR governance structure is shown in Figure 2. The plan recommends a governance system in which multiple government departments participate, with a set of responsible bodies (a steering committee, a management committee, a project management office, an operating entity, and advisory committees) coordinating their work. The steering committee, composed of representatives from federal government departments and agencies, is the highest national-level decision-making body for NAIRR's overall planning and strategic goals, and represents the departments in promoting national investment in artificial intelligence resources. The management committee guides and manages the platform's operating entity and provides funds and related resources; the plan proposes that NSF assume this role. The project management office works with the steering committee on the day-to-day management and evaluation of the operating entity. The U.S. Congress has approved funding for the Project Management Office to support related project management, portal development and deployment, joint support, training, and user support. The operating entity is independent of government departments and is responsible for setting specific development goals for NAIRR, organizing platform construction and daily operations, and establishing a transparent, fair, and reasonable resource allocation system that meets the needs of artificial intelligence research institutions and users. Finally, scientific, technical, ethics, and user advisory committees, composed of experts from multiple fields, provide decision-making support for the construction of NAIRR.

Figure 2. Proposed NAIRR governance structure

The second is to provide dedicated funds for NAIRR infrastructure construction and operation. The construction plan proposes to apply for US$2.6 billion in funding over six years, of which US$2.25 billion will be used to purchase platform computing power, software tools, and data resources from service providers. The day-to-day expenses of the operating entity will be US$370 million, and an additional US$30 million will be used to evaluate how the infrastructure is operating. All federal agencies involved in artificial intelligence research and development should participate in NAIRR’s project management. Resources funded by federal departments' artificial intelligence R&D investment can still be purchased and developed by each agency alone or cooperatively, but they should be managed and provided through the NAIRR infrastructure.
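The budget breakdown above can be checked with a few lines of arithmetic. The sketch below uses only the three amounts quoted in this article; the dictionary keys and the function name are illustrative labels, not terms from the plan. It suggests that the quoted items sum to slightly more than the rounded US$2.6 billion headline, so the individual figures are best read as approximate.

```python
# Budget items as quoted in the article, in millions of US dollars.
# The key names are illustrative labels, not official line items.
BUDGET_MUSD = {
    "resources purchased from service providers": 2250,
    "operating entity expenses": 370,
    "operations evaluation": 30,
}
HEADLINE_TOTAL_MUSD = 2600  # the rounded US$2.6 billion figure
YEARS = 6


def summarize(budget, years):
    """Return the total, the average per year, and each item's share."""
    total = sum(budget.values())
    shares = {name: amount / total for name, amount in budget.items()}
    return total, total / years, shares


total, per_year, shares = summarize(BUDGET_MUSD, YEARS)

print(f"Sum of quoted items: {total} million USD")
print(f"Headline figure:     {HEADLINE_TOTAL_MUSD} million USD")
print(f"Average per year:    {per_year:.1f} million USD")
for name, share in shares.items():
    print(f"  {name}: {share:.1%}")
```

On these figures, roughly 85% of the requested funding goes to purchasing computing power, software tools, and data rather than to running the operating entity.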

The third is to build the NAIRR infrastructure in stages, expanding computing resources as needed and promoting the aggregation of data resources. Platform construction is divided into four stages: project initiation, construction, trial operation, and continuous operation. In the trial operation phase, the platform will be able to support 50,000 users and to aggregate and use existing data from federal agencies and private institutions. Once operation is stable, it will support 150,000 users and establish a broader data resource cooperation community. NAIRR will develop data resources and make data easier to use by formulating data aggregation standards, developing data cooperatively, and providing data search services.

Under the new situation, the importance of building basic research resources for artificial intelligence has become increasingly prominent

Currently, new technologies and applications of artificial intelligence are constantly emerging. Researching and training the new generation of large artificial intelligence models, represented by the large language model ChatGPT, requires computing and data resources on a larger scale, and the R&D investment needed for a single model has increased significantly. The computing platform threshold for training large artificial intelligence models is extremely high, and ordinary institutions cannot afford the huge R&D and operating expenses. OpenAI research points out that the computing power required to train artificial intelligence models has increased exponentially: from 2012 to 2018, the computing power consumed in training AI models increased by a factor of 300,000. The computing power required to train GPT-3 reaches 3,640 PFLOP/s-days (that is, running at 1 PetaFLOP/s for 3,640 days), and the cost of a single training run is estimated at US$1.4 million. Some organizations estimate that the initial investment cost of ChatGPT was about US$800 million.
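To make the compute figures above concrete, the following minimal sketch converts the quoted GPT-3 training budget into total floating-point operations and works out the doubling time implied by a 300,000-fold increase. The function names are illustrative, and treating "2012 to 2018" as exactly six years is an assumption made here for the calculation, not a figure from the article.

```python
import math

SECONDS_PER_DAY = 86_400
PETA = 1e15  # one PetaFLOP/s is 1e15 floating-point operations per second


def pfs_days_to_flops(pfs_days):
    """Convert PetaFLOP/s-days into a total count of floating-point operations."""
    return pfs_days * PETA * SECONDS_PER_DAY


def implied_doubling_months(growth_factor, years):
    """Doubling time in months if growth_factor was reached steadily over `years`."""
    doublings = math.log2(growth_factor)
    return years * 12 / doublings


# GPT-3 training compute as quoted in the article.
gpt3_flops = pfs_days_to_flops(3640)
print(f"GPT-3 training compute: {gpt3_flops:.3e} FLOPs")

# ASSUMPTION: the 300,000x increase is spread evenly over exactly 6 years.
months = implied_doubling_months(300_000, years=6)
print(f"Implied doubling time: {months:.1f} months")
```

Under that six-year assumption, compute demand doubles roughly every four months, which is far faster than hardware price-performance has historically improved and helps explain why training costs have climbed so steeply.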

In terms of artificial intelligence data sets, the research and development of large pre-trained models has further increased the size of the data sets required for training. Data set sizes have grown from millions or tens of millions of items in the past to hundreds of millions. The data sets currently used in large model training come mainly from the Internet, including sources such as Wikipedia, social networking sites, public journals, books, papers, and code. Some studies have pointed out that "training data will become one of the biggest constraints on the industrialization of large models. From a deeper perspective, large models still face various governance problems with training data: data collection and labeling are time-consuming, laborious, and costly; data quality is difficult to guarantee; data diversity is insufficient to cover the 'long tail' and edge cases; and the acquisition, use, and sharing of specific data raise issues such as privacy protection and data bias." Research by foreign scholars suggests that the overall scale of language data is growing at a rate of 7%, while the growth of high-quality language data is constrained by factors such as population size and economic development and is growing at a rate of 4% to 5%. High-quality data for training large language models will be "exhausted" by 2027.
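The growth rates quoted above can be turned into doubling times with the standard compound-growth formula. This is a rough sketch: the article does not state the period for these rates, so the code assumes they are annual, and the function names are illustrative.

```python
import math


def doubling_time_years(annual_rate):
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


def growth_factor(annual_rate, years):
    """Total multiplication factor after compounding for `years`."""
    return (1 + annual_rate) ** years


# ASSUMPTION: the quoted 7% and 4%-5% rates are per year.
RATES = {
    "all language data": 0.07,
    "high-quality language data (upper bound)": 0.05,
    "high-quality language data (lower bound)": 0.04,
}

for label, rate in RATES.items():
    years = doubling_time_years(rate)
    print(f"{label}: {rate:.0%} per year, doubles in about {years:.1f} years")
```

If the rates are indeed annual, the stock of language data takes a decade or more to double, which is the intuition behind the concern that data supply will lag behind the demand created by ever larger models.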

Summary

Computing power and data resources are the basic supporting elements of artificial intelligence research. As artificial intelligence enters the "large model" era, computing power and data capabilities have become the limiting factors for researching and training algorithms and models. The NAIRR infrastructure being built in the United States helps address the new challenges currently facing innovation and development in artificial intelligence technology, and offers a useful reference for China. China should strengthen overall planning and coordination, accelerate the construction of computing infrastructure and basic data resources, develop the data element market, encourage the aggregation and circulation of data resources, and promote basic research and application innovation in artificial intelligence.


Authors: Lu Yapeng and Wang Weiguo, Data Research Center, China Academy of Information and Communications Technology

Editor/Format: Gai Beibei

Reviewer: Shu Wenqiong

Producer: Liu Qicheng
