


Sakana AI, the startup co-founded by a Transformer author, launches AI Scientist, the first AI system for fully automated scientific discovery
Editor | ScienceAI
A year ago, Llion Jones, the last of the authors of Google’s Transformer paper to leave the company, co-founded the artificial intelligence company Sakana AI with former Google researcher David Ha. Sakana AI says it aims to create a new kind of foundation model based on nature-inspired intelligence.
Now, Sakana AI has delivered its first results.
Sakana AI has announced AI Scientist, which it describes as the world’s first AI system for automated scientific research and open-ended discovery.
From generating ideas, writing code, running experiments, and summarizing results to writing entire papers and conducting peer review, AI Scientist aims to usher in a new era of AI-driven scientific research and accelerated discovery.
In principle, it can repeat the scientific research process continuously, developing ideas iteratively in an open-ended manner, just as human scientists do.
The researchers demonstrated its versatility by applying it to three different subfields of machine learning: diffusion modeling, Transformer-based language modeling, and learning dynamics.
Each idea is implemented and developed into a complete paper at a cost of less than $15 per paper. To evaluate the generated papers, the researchers designed and validated an automated reviewer with near-human performance in assessing paper scores.
As judged by this automated reviewer, AI Scientist can write papers that exceed the acceptance threshold of top machine learning conferences.
The launch of AI Scientist marks an important step towards realizing the full potential of artificial intelligence in scientific research. By automating the discovery process and integrating AI-driven review systems, it opens the door to endless possibilities for innovation and problem-solving in the most challenging fields of science and technology.
The research, titled "The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery", was posted to the preprint platform arXiv on August 12.
Paper link: https://arxiv.org/abs/2408.06292
One of the challenges facing artificial intelligence is to develop agents that can conduct scientific research and discover new knowledge. Although cutting-edge models are already used as aids to human scientists, for example to brainstorm ideas, write code, or perform prediction tasks, they still complete only a small part of the scientific process.
In their latest research, scientists at Sakana AI propose the first comprehensive framework for fully automated scientific discovery, enabling cutting-edge large language models (LLMs) to conduct research independently and communicate their findings.
AI Scientist can generate novel research ideas, write code, run experiments, visualize results, describe its findings by writing a full scientific paper, and then run a simulated review process for evaluation.
About AI Scientist
AI Scientist has three main stages: (1) idea generation, (2) experimental iteration, and (3) paper writing. Once a paper has been written, an LLM-generated review, which the researchers introduce and validate, is used to assess its quality.
Illustration: Conceptual overview of AI Scientist, an end-to-end LLM-driven scientific discovery process. (Source: Paper)
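The three stages and the review step described above can be sketched as a simple loop. This is only an illustrative outline, not the code from the Sakana AI repository: the `Paper` class, the `run_pipeline` function, and the four callables passed into it are hypothetical names, and the toy stand-ins at the bottom replace what would be LLM calls and real training runs in an actual system.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Paper:
    """Hypothetical container for one idea and everything produced from it."""
    idea: str
    results: List[float] = field(default_factory=list)
    text: str = ""
    review_score: float = 0.0


def run_pipeline(
    generate_ideas: Callable[[int], List[str]],
    run_experiment: Callable[[str, int], float],
    write_paper: Callable[[str, List[float]], str],
    review: Callable[[str], float],
    num_ideas: int = 2,
    num_iterations: int = 3,
) -> List[Paper]:
    """Run idea generation, experimental iteration, paper writing, and review."""
    papers = []
    for idea in generate_ideas(num_ideas):            # stage 1: idea generation
        paper = Paper(idea=idea)
        for step in range(num_iterations):            # stage 2: experimental iteration
            paper.results.append(run_experiment(idea, step))
        paper.text = write_paper(idea, paper.results)  # stage 3: paper writing
        paper.review_score = review(paper.text)        # automated review of the result
        papers.append(paper)
    return papers


# Toy stand-ins (assumptions for illustration only, not real LLM or training calls).
def toy_ideas(n: int) -> List[str]:
    return [f"idea-{i}" for i in range(n)]


def toy_experiment(idea: str, step: int) -> float:
    return float(step)


def toy_writer(idea: str, results: List[float]) -> str:
    return f"{idea}: best={max(results)}"


def toy_review(text: str) -> float:
    return float(len(text))


papers = run_pipeline(toy_ideas, toy_experiment, toy_writer, toy_review)
for p in papers:
    print(p.idea, p.results, p.review_score)
```

Passing the stages in as functions keeps the loop independent of any particular model or experiment template, which mirrors the article's point that the same process can be applied to different subfields.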
The researchers provide AI Scientist with a starting code template that reproduces a lightweight baseline training run of a popular model or benchmark. For example, this might be code that trains a small transformer on the works of Shakespeare, a classic proof-of-concept training run in natural language processing that can be completed in minutes.
AI Scientist is then free to explore any possible research direction. The template also includes a LaTeX folder containing style files and section headers, as well as simple plotting code. In general, each run starts with a representative small-scale experiment relevant to the topic area.
The researchers explained: "Focusing on small-scale experiments is not a fundamental limitation of our method, but is simply a matter of computational efficiency and the computational limitations of our equipment."
Why is writing a paper important?
Given that the researchers' overall goal is to automate scientific discovery, why would they want AI Scientist to write papers as human scientists do? Previous AI systems such as FunSearch and GNoME, for example, have produced impressive scientific discoveries in restricted fields, but they do not write papers.
The team believes it is crucial for AI Scientist to write scientific papers to disseminate its findings, for the following reasons: first, writing papers gives humans a highly interpretable way to benefit from what it has learned; second, reviewing written papers within the framework of existing machine learning conferences allows assessment to be standardized; third, since the birth of modern science, scientific papers have been the main medium for disseminating research results.
Because a paper can use natural language and contain plots and code, it can flexibly describe any type of scientific research and findings. Almost every other format imaginable is tied to a particular kind of data or type of science. Until a superior alternative emerges (possibly one invented by artificial intelligence), the team believes that training AI Scientist to write scientific papers is critical to its integration into the wider scientific community.
Illustration: Preview of the "Adaptive Dual-Scale Denoising" paper, generated entirely by AI Scientist. (Source: Paper)
About the cost
The framework here is flexible enough to efficiently conduct research in various subfields of machine learning, including transformer-based language modeling, neural network learning dynamics, and diffusion modeling. The system is highly cost-effective, costing approximately $15 per paper, and produces conference-relevant papers, highlighting its ability to democratize research (increase its accessibility) and accelerate scientific progress.
The researchers’ preliminary qualitative analysis of AI Scientist suggests that the resulting papers can be broadly informative and novel, or at least contain ideas worthy of future research.
The amount of compute the team allocated to AI Scientist for experiments is also very small by current standards. Notably, most of the researchers' experiments, which generated hundreds of papers in a week, were run on a single 8×NVIDIA H100 node. If the search and filtering were scaled up substantially, higher-quality papers might be produced.
In this project, most of the cost of running AI Scientist came from LLM API calls for coding and paper writing. By comparison, the cost of running the LLM reviewer and the computational expense of conducting the experiments were negligible, owing to constraints the team imposed to keep overall costs down.
Of course, this cost breakdown may change in the future if AI Scientist is applied to other scientific fields or used for larger-scale computational experiments.
Open vs. Closed Models
To quantitatively evaluate and improve the generated papers, the researchers first created and validated an automated paper reviewer. The results show that, although there is still considerable room for improvement, LLMs are able to produce fairly accurate reviews and achieve results comparable to humans on various metrics.
Illustration: Violin plots showing the distribution of scores that the AI Scientist reviewer assigned to generated papers across three domains and four base models. (Source: Paper)
Applying this reviewer to papers generated by AI Scientist allows paper evaluation to scale beyond manual human review. The researchers found that Sonnet 3.5 consistently produced the best papers, some of which even exceeded, in the automated paper reviewer's assessment, the acceptance threshold of a standard machine learning conference.
However, the team sees no reason to expect a single model such as Sonnet 3.5 to maintain its lead. The researchers believe that all cutting-edge LLMs, including open models, will continue to improve, and that competition among LLMs will significantly increase both their commoditization and their capabilities.
Illustration: Evaluating AI Scientist’s paper review process on ICLR 2022 OpenReview data using GPT-4o. (Source: Paper)
In this project, the researchers studied a variety of proprietary LLMs, including GPT-4o and Sonnet, and also explored open models such as DeepSeek and Llama-3. Open models were found to offer significant advantages, such as lower cost, guaranteed availability, greater transparency, and greater flexibility, albeit with slightly lower quality.
In the future, the researchers aim to use the proposed discovery process to produce self-improving artificial intelligence in closed-loop systems using open models.
Future Directions
Immediate improvements to AI Scientist may include integrating vision capabilities to better handle charts and graphs, incorporating human feedback and interaction to improve its output, and enabling AI Scientist to pull new data and models from the Internet to automatically expand the scope of its experiments, provided it is safe to do so.
Additionally, AI Scientist could follow up on its best ideas and even work directly on its own code in a self-referential way. In fact, most of the code for the project was written by Aider. Extending the framework to other scientific fields could further broaden its impact, paving the way for a new era of automated scientific discovery.
Crucially, future work should address reliability and hallucination issues, possibly through deeper automated validation of reported results. This can be achieved by directly linking the code and experiments, or by seeing if an automated verifier can independently reproduce the results.
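One way to picture the "independently reproduce the results" check is a comparison between the numbers a paper reports and the numbers obtained from re-running its code. The sketch below is a hypothetical illustration rather than anything described in the paper: the function name `find_unreproduced`, the 5% relative tolerance, and the metric values are all assumptions chosen for the example.

```python
import math
from typing import Dict, List


def find_unreproduced(
    reported: Dict[str, float],
    reproduced: Dict[str, float],
    rel_tol: float = 0.05,
) -> List[str]:
    """Return the metrics that are missing from the re-run or differ beyond rel_tol."""
    failures = []
    for name, claimed in reported.items():
        rerun = reproduced.get(name)
        if rerun is None or not math.isclose(claimed, rerun, rel_tol=rel_tol):
            failures.append(name)
    return failures


# Made-up numbers for illustration only.
reported = {"val_loss": 1.47, "accuracy": 0.91}
reproduced = {"val_loss": 1.49, "accuracy": 0.80}

unverified = find_unreproduced(reported, reproduced)
print(unverified)
```

In this toy case the validation loss falls within tolerance of the re-run, while the accuracy claim does not, so only the latter would be flagged for closer inspection.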
Epilogue
AI Scientist marks the beginning of a new era of scientific discovery in machine learning: it brings the transformative advantages of AI agents to the entire research process of AI itself, and brings scientists closer to a world where unlimited, affordable creativity and innovation can be unleashed on the world's most challenging problems.
Ultimately, “We envision a scientific ecosystem entirely powered by AI, including not just AI-driven researchers but also reviewers, area chairs, and entire conferences. However, we do not believe that the role of human scientists will weaken. As we adapt to new technologies and move up the food chain, the role of scientists will change," the researchers said in the paper.
While current iterations of AI Scientist demonstrate a strong ability to innovate on top of proven ideas such as diffusion modeling or Transformers, it remains an open question whether such systems will ultimately be able to come up with truly paradigm-shifting ideas.
Will future versions of AI Scientist be able to come up with ideas as impactful as diffusion modeling, or come up with the next Transformer architecture? Will machines eventually be able to invent concepts as fundamental as artificial neural networks or information theory?
"We believe that AI Scientist will be an excellent partner for human scientists, but only time will tell."
GitHub open source address: http://github.com/SakanaAI/AI-Scientist
Reference content:
http://sakana.ai/ai-scientist/
https://x.com/SakanaAILabs/status/1823178623513239992
https://mp.weixin.qq.com/s/-jjXBJAkdMEyl2JhRgwdaA