


'Father of Machine Learning' Mitchell writes: How AI can accelerate scientific progress and how the United States can seize the opportunity
Editor | ScienceAI
Recently, Tom M. Mitchell, a professor at Carnegie Mellon University known as the "Father of Machine Learning", wrote a new AI for Science white paper addressing two questions: how can artificial intelligence accelerate scientific progress, and how can the U.S. government help make that happen?
ScienceAI has compiled the full text of the original white paper without changing its original meaning. The content is as follows.
The field of artificial intelligence has recently made significant progress, including large language models such as GPT, Claude, and Gemini, raising the possibility that one very positive impact of AI could be to greatly accelerate research progress in a variety of scientific fields, from cell biology to materials science to weather and climate modeling to neuroscience. Here we briefly summarize this AI and science opportunity and what the U.S. government can do to seize it.
The AI and Science Opportunity
The vast majority of scientific research in almost all fields today can be classified as "lone ranger" science.
In other words, a scientist and their research team of a dozen or so researchers come up with an idea, run experiments to test it, write up and publish the results, perhaps share their experimental data on the Internet, and then repeat the process.
Other scientists can build on these results by reading the published papers, but this process is error-prone and extremely inefficient for several reasons:
(1) No individual scientist can read all of the papers already published in their field, and is therefore partially blind to other relevant studies; (2) experiments described in journal publications necessarily omit many details, making it difficult for others to replicate the results and build on them; (3) analysis of a single experimental data set is often performed in isolation, failing to incorporate data from related experiments conducted by other scientists (and therefore missing valuable information).
Over the next ten years, artificial intelligence can help scientists overcome these three problems.
AI can transform this "lone ranger" research model into a "community scientific discovery" model. In particular, AI can be used to create a new kind of computer research assistant that helps human scientists overcome these problems in the following ways:
1. Analyze complex data sets (including data sets assembled from many experiments conducted in multiple laboratories), rather than performing isolated analyses of single, much smaller, and less representative data sets. Basing analyses on data sets orders of magnitude larger than any human could examine enables more comprehensive and accurate conclusions.
2. Use large language models such as GPT to read and digest every relevant publication in the field, helping scientists form new hypotheses based not only on experimental data from their own and other laboratories, but also on the hypotheses and arguments in the published research literature, leading to better-informed hypotheses than would be possible without such natural-language AI tools.
3. Create "foundation models" trained on many different types of experimental data collected by many labs and scientists, gathering the field's growing knowledge into a single, computer-executable model. These executable foundation models serve the same purpose as equations such as f = ma: they predict certain quantities from other observed quantities. Unlike classical equations, however, they can capture empirical relationships among hundreds of thousands of variables rather than just a handful (a toy sketch of this idea appears after this list).
4. Automate or semi-automate the design of new experiments and their robotic execution, accelerating relevant new experiments and improving the reproducibility of scientific results.
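As a loose, purely illustrative analogy for the "executable foundation model" idea above, the toy sketch below pools simulated measurements from several hypothetical labs and fits a single model that predicts one quantity from many observed quantities. The data, variable counts, and model choice are all assumptions made for this example; real foundation models for science would be far larger and trained on real, shared experimental data.

```python
# Toy sketch: pool data from several "labs" and train one predictive model.
# Everything here is simulated; the setup and model choice are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def simulate_lab(n_samples, noise):
    """Simulate one lab's experiment: 20 observed quantities -> 1 measured outcome."""
    X = rng.normal(size=(n_samples, 20))
    y = 2.0 * X[:, 0] - X[:, 3] ** 2 + rng.normal(scale=noise, size=n_samples)
    return X, y

# Each lab contributes a data set with its own size and measurement noise.
labs = [simulate_lab(300, 0.3), simulate_lab(500, 0.5), simulate_lab(200, 0.2)]
X = np.vstack([X_i for X_i, _ in labs])
y = np.concatenate([y_i for _, y_i in labs])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A single model trained on the pooled data plays the role of an "executable"
# empirical relationship among observed quantities: like f = ma it predicts one
# quantity from others, but it is learned from data over many variables.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("Held-out R^2 on pooled data:", round(model.score(X_test, y_test), 3))
```

The point of the sketch is only the workflow: data contributed by multiple labs is combined into one shared, executable predictive model that any scientist can query.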
What scientific breakthroughs might this paradigm shift in scientific practice bring?
Here are a few examples:
1. Reducing the development time and cost of new vaccines for emerging disease outbreaks by a factor of ten.
2. Accelerating materials research, potentially leading to breakthrough products such as room-temperature superconductors and thermoelectric materials that convert heat into electricity without producing emissions.
3. Combining a volume and diversity of cell biology experimental data never before attempted into a "foundation model" of human cell function, making it possible to quickly simulate the outcomes of many potential experiments before taking the more expensive step of running them in vivo in the laboratory.
4. Combining experimental data from across neuroscience (from single-neuron data to whole-brain fMRI imaging) to build a "foundation model" of the human brain at multiple levels of detail, integrating data of unprecedented scale and diversity: a model that predicts the neural activity the brain uses to encode different types of thoughts and emotions, how those thoughts and emotions are evoked by different stimuli, the effects of drugs on neural activity, and the effectiveness of different treatments for mental disorders.
5. Improving our ability to predict the weather, both by tailoring forecasts to highly localized areas (e.g., individual farms) and by extending how far into the future we can forecast.
What can the US government do to seize this opportunity?
Translating this opportunity into reality requires several elements:
Large amounts of experimental data
One lesson of text-based foundation models is that the more data they are trained on, the more capable they become. Experienced scientists likewise know the value of more, and more diverse, experimental data. To achieve orders-of-magnitude progress in science, and to train the kinds of foundation models described above, we need very significant advances in our ability to share and jointly analyze the diverse data sets contributed by the entire scientific community.
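To make the "more data, more capable" point concrete, the toy sketch below trains the same simple classifier on progressively larger slices of a synthetic data set and reports held-out accuracy. The task, model, and sample sizes are assumptions chosen for illustration and are not from the white paper.

```python
# Toy sketch: held-out accuracy generally improves as the training set grows.
# The synthetic task and the choice of classifier are illustrative only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20_000, n_features=40, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=5_000,
                                                    random_state=0)

for n in [100, 1_000, 5_000, 15_000]:
    model = LogisticRegression(max_iter=1_000).fit(X_train[:n], y_train[:n])
    print(f"train size {n:>6}: held-out accuracy {model.score(X_test, y_test):.3f}")
```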
The ability to access scientific publications and read them with computers
A key part of the opportunity here is to change the current situation, in which a scientist is unlikely to read even 1% of the relevant publications in their field, into one in which computers read 100% of those publications, summarize them and their relevance to the scientific question at hand, and provide a conversational interface for discussing their content and implications. This requires not only access to the online literature, but also AI research to build such a "literature assistant."
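As a minimal sketch of one ingredient of such a literature assistant, the code below retrieves the abstracts most relevant to a scientist's question using simple TF-IDF similarity. The abstracts and the query are placeholders invented for this example; a real assistant would index full papers and pair retrieval like this with a large language model for summarization and conversation.

```python
# Minimal sketch: retrieve the abstracts most relevant to a research question.
# The abstracts and query are placeholders; a real literature assistant would
# index complete papers and add LLM-based summarization and dialogue on top.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

abstracts = [
    "We measure the 3D structure of a membrane protein using cryo-EM.",
    "A transformer model predicts protein folding from amino acid sequence.",
    "Thermoelectric materials convert waste heat into electricity.",
    "fMRI imaging reveals neural correlates of emotional stimuli.",
]
query = "machine learning methods for predicting protein structure"

vectorizer = TfidfVectorizer(stop_words="english")
doc_vectors = vectorizer.fit_transform(abstracts + [query])
similarities = cosine_similarity(doc_vectors[-1], doc_vectors[:-1]).ravel()

# Rank abstracts by similarity to the query and print them, best match first.
for rank, idx in enumerate(similarities.argsort()[::-1], start=1):
    print(f"{rank}. (score {similarities[idx]:.2f}) {abstracts[idx]}")
```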
Computing and Network Resources
Text-based foundation models such as GPT and Gemini are well known for the enormous computing resources consumed during their development. Developing foundation models for different scientific fields will also require substantial computing resources. However, the computational demands of many AI-for-science efforts are likely to be much smaller than those required to train LLMs such as GPT, and can therefore be met with investments similar to those government research labs are already making.
For example, AlphaFold, an AI model that has revolutionized protein analysis for drug design, required far less training computation than text-based foundation models like GPT and Gemini. Supporting data sharing also requires network capacity, but the current Internet already provides a sufficient starting point for transferring large experimental data sets. The hardware cost of supporting AI-driven scientific progress is therefore likely to be quite low compared to the potential benefits.
New Machine Learning and AI Methods
Current machine learning methods are extremely useful for discovering statistical regularities in data sets far too large for humans to examine (for example, AlphaFold was trained on large numbers of protein sequences together with their carefully measured 3D structures). A key part of the new opportunity is to extend current machine learning methods, which discover statistical correlations in data, in two important directions: (1) moving from finding correlations to finding causal relationships in data, and (2) moving from learning only from large structured data sets to learning from both large structured data sets and the large research literature; that is, learning the way human scientists do, from experimental data together with the hypotheses and arguments others have published in natural language. The recent emergence of LLMs with advanced capabilities for digesting, summarizing, and reasoning over large text collections could provide the basis for this new class of machine learning algorithms.
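As a small, self-contained illustration of the gap between correlation and causation in point (1), the sketch below simulates a confounded system and compares the naive correlation-based estimate of an effect with one that adjusts for the confounder. The variables and effect sizes are invented for this example and do not come from the white paper.

```python
# Toy sketch: correlation vs. causal effect under confounding.
# A confounder Z drives both the "treatment" X and the outcome Y, so the
# naive regression of Y on X overstates the true causal effect of X.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

Z = rng.normal(size=n)                       # confounder (e.g., an unmeasured lab condition)
X = 0.8 * Z + rng.normal(size=n)             # "treatment" influenced by Z
Y = 1.0 * X + 2.0 * Z + rng.normal(size=n)   # true causal effect of X on Y is 1.0

# Naive estimate: regress Y on X alone (captures the confounded correlation).
naive_slope = np.polyfit(X, Y, 1)[0]

# Adjusted estimate: regress Y on both X and Z (backdoor adjustment).
design = np.column_stack([X, Z, np.ones(n)])
adjusted_slope = np.linalg.lstsq(design, Y, rcond=None)[0][0]

print(f"Naive slope of Y on X:       {naive_slope:.2f}  (biased by the confounder)")
print(f"Slope after adjusting for Z: {adjusted_slope:.2f}  (close to the true effect 1.0)")
```

Planning a new experiment that randomizes X would remove the confounding directly, which is why the white paper emphasizes settings where new experiments can be designed and executed to test causal hypotheses.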
What should the government do? The key is to support the four elements above and to mobilize the scientific community around new AI-based methods for advancing their research. The government should therefore consider the following actions:
Explore specific opportunities in specific areas of science. Fund multi-institutional research teams across many scientific areas to articulate visions and preliminary results that demonstrate how AI can be used to significantly accelerate progress in their fields, and what would be needed to scale this approach. This work should not be funded as grants to individual institutions, because the greatest advances are likely to come from integrating data and research from many scientists at many institutions. It will probably be most effective when carried out by teams of scientists from many institutions, proposing opportunities and approaches that inspire engagement from the scientific community at large.
Accelerate the creation of the new experimental data sets needed to train new foundation models, and make the data available to the entire community of scientists:
Create data-sharing standards that enable one scientist to conveniently use experimental data created by another, and that lay the foundation for a national data resource in each relevant scientific field. Previous successes in developing and using such standards (e.g., the data sharing achieved during the Human Genome Project) can provide a starting template for these efforts (a hypothetical sketch of what such a standard's metadata record might look like appears after this list).
Create and support data-sharing websites for every relevant field. Just as GitHub has become the go-to site for software developers to contribute, share, and reuse code, a "GitHub for scientific data sets" could serve both as a data repository and as a search engine for finding the data sets most relevant to a given topic, hypothesis, or planned experiment.
Study how to build incentive mechanisms that maximize data sharing. Scientific fields currently vary widely in the extent to which individual scientists share their data, and in the extent to which for-profit organizations use their data for basic scientific research. Building a large, shareable national data resource is integral to the AI-for-science opportunity, and building a compelling incentive structure for data sharing will be key to success.
Where appropriate, fund the development of automated laboratories (e.g., robotic laboratories for chemistry and biology experiments that many scientists can use over the Internet) to conduct experiments efficiently and to generate data in a standard format. A major benefit of creating such laboratories is that they will also drive the development of standards that precisely specify the experimental procedures followed, thereby increasing the reproducibility of experimental results. Just as we can benefit from a GitHub for data sets, we can also benefit from a corresponding GitHub for sharing, modifying, and reusing components of experimental protocols.
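For concreteness, the sketch below shows one hypothetical form a data-sharing standard like the one described above could take: a minimal machine-readable metadata record that describes a shared experimental data set and points to the versioned protocol used to produce it. Every field name here is invented for illustration; real standards would be developed by each scientific community.

```python
# Hypothetical sketch of a minimal metadata record for a shared experimental
# data set. The field names are illustrative, not an existing standard.
from dataclasses import dataclass, field, asdict
import json

@dataclass
class DatasetRecord:
    title: str
    lab: str
    instrument: str
    protocol_id: str                # reference to a shared, versioned protocol
    variables: list[str]            # measured quantities, with units in the name
    license: str = "CC-BY-4.0"
    keywords: list[str] = field(default_factory=list)

record = DatasetRecord(
    title="Thermoelectric response of doped oxide samples",
    lab="Example Materials Lab",
    instrument="four-probe conductivity rig",
    protocol_id="protocol-0042-v3",
    variables=["temperature_K", "seebeck_coefficient_uV_per_K"],
    keywords=["thermoelectrics", "materials"],
)

# Serialize to JSON so both scientists and machines can discover and reuse it.
print(json.dumps(asdict(record), indent=2))
```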
Creating a new generation of artificial intelligence tools requires:
Funding basic AI research aimed specifically at methods for scientific research. This should include developing "foundation models" in the broad sense, as tools that accelerate research in different fields and speed the shift from "lone ranger" science to the more powerful "community scientific discovery" paradigm.
Specifically supporting research on systems that read the research literature, critique stated hypotheses and assumptions and suggest improvements, and help scientists draw results from the scientific literature that bear directly on their current questions.
Specifically supporting research that extends machine learning from discovering correlations to discovering causal relationships, especially in settings where new experiments can be planned and executed to test causal hypotheses.
Specifically supporting research that extends machine learning algorithms from taking only large data sets as input to taking both large experimental data sets and the field's full research literature as input, so that the hypotheses they generate are grounded both in statistical regularities in the experimental data and in the assumptions, explanations, and arguments discussed in the literature.