Meet The AI Scientist About to Change Research Forever

Computer scientists have automated the entire scientific process using an AI machine that develops hypotheses, tests them in appropriate experiments and writes up the results in a scientific paper.

By The Physics arXiv Blog

Sep 13, 2024 8:00 AM | Sep 13, 2024 4:12 PM

Large language models like ChatGPT are revolutionizing the way people write. But that creates a problem for scientists. These models are trained on human knowledge that already exists, whereas science is generally concerned with new findings that extend this body of knowledge.

So scientific papers can contain information that an LLM will never have seen. That means asking one of these machines to write a scientific paper raises important questions about whether it can write accurate statements on a topic on which it has no training.

That's probably why various analyses show that scientists have been using LLMs to edit their papers but not to write them. Now that looks set to change with the unveiling of an "AI Scientist" that performs the entire scientific process, including the write-up. Chris Lu, Robert Lange and David Ha at Sakana AI and colleagues have created a machine that develops and tests hypotheses, designs and executes experiments, gathers and interprets data and finally writes this all up in a scientific paper.

The AI Scientist even evaluates the paper to determine its suitability for publication. "We introduce the first end-to-end framework for fully automated scientific discovery," say the team.

Their work has profound implications for the way scientists work, for the nature of science itself and for how society should think about and exploit it.

Science Automation

Lu and co begin by dividing the scientific process into a series of tasks that are each manageable by a sufficiently well-prompted LLM. In addition, they confine the area of research to machine learning so that the work can be done within an area of science that is largely accessible to a machine.

But in principle, they say, there is no reason why the AI Scientist cannot ply its trade in physics, biology, chemistry or any sub-discipline of science, provided it has the agency to experiment in those areas. They go on to test the system using several publicly available LLMs, including Claude Sonnet 3.5, GPT-4o, DeepSeek Coder and Llama-3.1 405b.

Lu and co say the AI Scientist works in three main phases, the first being to generate an idea worth exploring based on an archive of previous research. The team then ask the model to refine the idea using chain-of-thought reasoning and self-reflection, two techniques that have recently helped to improve the reasoning of large language models. For each idea, the system also produces a plan to test it.

The model then determines the novelty of the approach by comparing it against those already in its database. "This allows The AI Scientist to discard any idea that is too similar to existing literature," say Lu and co.
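
The novelty check can be pictured as a similarity filter. The sketch below is only an illustration: the article does not say how similarity is measured, so the word-overlap (Jaccard) score, the 0.5 threshold and the function names are all assumptions made for this example.

```python
from __future__ import annotations


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into a set of words without punctuation."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return {w for w in words if w}


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two texts, from 0.0 to 1.0."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def filter_novel(ideas: list[str], archive: list[str], threshold: float = 0.5) -> list[str]:
    """Keep only ideas whose similarity to every archived idea is below the threshold.

    The threshold is a hypothetical value chosen for illustration.
    """
    return [
        idea
        for idea in ideas
        if all(jaccard(idea, known) < threshold for known in archive)
    ]


archive = ["adaptive learning rate schedules for diffusion models"]
ideas = [
    "Adaptive learning rate schedules for diffusion models.",
    "curriculum ordering of training data for small transformers",
]
novel = filter_novel(ideas, archive)
```

Here the first idea is discarded because it duplicates an archived one, while the second survives the filter.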

Having found a sufficiently novel idea, the AI Scientist moves on to the next phase, which is to perform the experiment and gather data. Because the area of science is machine learning, the experiments take place entirely in silico. So the system writes the code for the set of proposed experiments and then performs them in order, while correcting any coding errors that crop up.
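
A rough sketch of that run-and-repair loop might look like the following. It is not the team's code: `run_with_repair`, `toy_repair` and the attempt limit are hypothetical names and values, and the repair step here is a trivial stand-in for what would be a call to an LLM.

```python
from typing import Callable, Tuple


def run_with_repair(
    source: str,
    repair: Callable[[str, str], str],
    max_attempts: int = 4,  # hypothetical limit, chosen for illustration
) -> Tuple[object, int]:
    """Execute experiment code; on failure, ask `repair` for a fix and retry.

    Returns the experiment's `result` variable and the number of attempts used.
    Generated code should really run in a sandbox; plain exec() is used here
    only to keep the sketch short.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        namespace: dict = {}
        try:
            exec(source, namespace)
        except Exception as err:
            if attempt == max_attempts:
                raise
            # Feed the error message back so the next version can address it.
            source = repair(source, repr(err))
        else:
            return namespace.get("result"), attempt
    raise AssertionError("unreachable")


def toy_repair(source: str, error: str) -> str:
    """Stand-in for an LLM: fixes one known typo and ignores the error text."""
    return source.replace("totl", "total")


experiments = [
    "total = sum(range(5))\nresult = totl * 2",  # contains a deliberate typo
    "result = max([3, 1, 2])",
]
# Experiments run in order; each gets its own repair loop.
results = [run_with_repair(src, toy_repair) for src in experiments]
```

The first experiment fails once with a NameError, is repaired and succeeds on the second attempt; the second runs cleanly the first time.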

This process produces a set of results. The AI Scientist then uses this data to produce notes in the style of an experimental journal and plots various figures with detailed descriptions of what they show.

The final stage is to write up the experiment "in the style of a standard machine learning conference proceeding". For this, it uses a blank paper template pre-divided into a standard format: introduction, background, methods, experimental setup, results and conclusion. The AI edits each section once using the process of self-reflection before searching the web for relevant references, which it then adds.

The team say the resultant paper can often be overly verbose and repetitive and so needs another round of editing. "To resolve this, we perform one final round of self-reflection section-by-section, aiming to remove any duplicated information and streamline the arguments of the paper," they say.
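
The shape of this write-up stage can be sketched in a few lines. Everything below is an assumption made for illustration: `write_paper`, `toy_draft` and `dedupe_sentences` are hypothetical names, the drafting and reflection steps would be LLM calls in the real system, and the reference-gathering step is left out.

```python
from typing import Callable, Dict

# Section order taken from the template described in the article.
SECTIONS = [
    "introduction",
    "background",
    "methods",
    "experimental setup",
    "results",
    "conclusion",
]


def dedupe_sentences(text: str) -> str:
    """Toy 'self-reflection': drop sentences that repeat an earlier one."""
    seen = set()
    kept = []
    for sentence in text.split(". "):
        cleaned = sentence.strip().rstrip(".")
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(cleaned)
    return (". ".join(kept) + ".") if kept else ""


def write_paper(
    notes: str,
    draft: Callable[[str, str], str],
    reflect: Callable[[str, str], str],
) -> Dict[str, str]:
    """Draft each section, reflect on it once, then do a final pass over all sections."""
    paper = {}
    for section in SECTIONS:
        paper[section] = reflect(section, draft(section, notes))
    # Final round of section-by-section editing to trim repetition.
    for section in SECTIONS:
        paper[section] = reflect(section, paper[section])
    return paper


def toy_draft(section: str, notes: str) -> str:
    """Stand-in for an LLM draft; deliberately repeats itself."""
    return f"This is the {section}. {notes}. {notes}."


paper = write_paper("Loss fell by half", toy_draft, lambda s, t: dedupe_sentences(t))
```

In this toy run, each drafted section states the same finding twice and the reflection pass removes the repeat.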

The process ends with the AI Scientist reviewing its own work based on a database of human reviews of papers submitted to the 2022 International Conference on Learning Representations. The goal is to give the paper a score that matches the assessment a human reviewer might give.

In this way, the team's AI Scientist generated hundreds of papers at a cost of around $15 each, significantly less than the $100,000 a human paper is thought to cost in terms of salaries etc. "We find that Claude Sonnet 3.5 consistently produces the highest quality papers, with GPT-4o coming in second," they say.
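
To put those two figures side by side, here is the arithmetic as a short script. The $15 and $100,000 values come from the text above; the batch size of 300 papers is a made-up number used only to illustrate "hundreds".

```python
COST_PER_AI_PAPER = 15          # dollars, figure quoted in the article
COST_PER_HUMAN_PAPER = 100_000  # dollars, figure quoted in the article

# How many times cheaper an AI-generated paper is.
cost_ratio = COST_PER_HUMAN_PAPER / COST_PER_AI_PAPER

# Whole AI papers that fit in the budget of one human paper.
papers_per_human_budget = COST_PER_HUMAN_PAPER // COST_PER_AI_PAPER

# Hypothetical batch size, chosen only to illustrate "hundreds of papers".
batch_size = 300
batch_cost = batch_size * COST_PER_AI_PAPER

print(f"ratio: about {cost_ratio:,.0f}x")
print(f"AI papers per human-paper budget: {papers_per_human_budget:,}")
print(f"cost of {batch_size} AI papers: ${batch_cost:,}")
```

On these figures, a batch of several hundred generated papers still costs far less than a single human-written one.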

But the papers are by no means perfect, with Lu and co describing them as "medium quality". "Overall, we judge the performance of The AI Scientist to be about the level of an early-stage machine learning researcher who can competently execute an idea but may not have the full background knowledge to fully interpret the reasons behind an algorithm’s success," they say.

Superhuman Performance?

In other words, the AI Scientist doesn't always appreciate the significance of what it has done.

The team say that a human supervisor would probably advise such an early-stage researcher to go back to the lab and plan a further set of experiments that will help tease apart and answer the questions the work generates.

But these problems do not seem to be showstoppers. "We naturally expect that many of the flaws of the AI Scientist will improve, if not be eliminated, as foundation models continue to improve dramatically," say Lu and co.

That's interesting work raising profound questions for science and scientists themselves. Not least of these is what will happen when the AI Scientist begins to outperform humans. "Future generations of foundation models may propose ideas that are challenging for humans to reason about and evaluate," point out the researchers, adding that the challenge of supervising AI systems that are smarter than humans is becoming an active area of research, among humans anyway.

Then there is the question of how humans should access and exploit any future stream of AI-generated scientific research. It's not hard to imagine humans quickly becoming overwhelmed by this volume, as well as incapable of reasoning sufficiently deeply about it.

These are important questions for scientists and for broader society. The future of science is at stake.

Ref: The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery: arxiv.org/abs/2408.06292
