The AI Scientist-v2: Workshop-Level Automated
Scientific Discovery via Agentic Tree Search

📚 [Paper] | 📝 [Blog Post] | 📂 [ICLR2025 Workshop Experiment]

Fully autonomous scientific research systems are becoming increasingly capable, with AI playing a pivotal role in transforming how scientific discoveries are made. We are excited to introduce The AI Scientist-v2, a generalized end-to-end agentic system that has generated the first workshop paper written entirely by AI and accepted through peer review.

This system autonomously generates hypotheses, runs experiments, analyzes data, and writes scientific manuscripts. Unlike its predecessor (AI Scientist-v1), the AI Scientist-v2 removes reliance on human-authored templates, generalizes across Machine Learning (ML) domains, and employs a progressive agentic tree search, guided by an experiment manager agent.

Note: The AI Scientist-v2 doesn’t necessarily produce better papers than v1, especially when a strong starting template is available. v1 follows well-defined templates, leading to high success rates, while v2 takes a broader, more exploratory approach with lower success rates. v1 works best for tasks with clear objectives and a solid foundation, whereas v2 is designed for open-ended scientific exploration.

Caution! This codebase will execute Large Language Model (LLM)-written code. There are various risks and challenges associated with this autonomy, including the potential use of dangerous packages, uncontrolled web access, and the possibility of spawning unintended processes. Ensure that you run this within a controlled sandbox environment (e.g., a Docker container). Use at your own discretion.

Requirements

This code is designed to run on Linux with NVIDIA GPUs using CUDA and PyTorch.

Installation

# Create a new conda environment
conda create -n ai_scientist python=3.11
conda activate ai_scientist

# Install PyTorch with CUDA support (adjust pytorch-cuda version for your setup)
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia

# Install PDF and LaTeX tools
conda install anaconda::poppler
conda install conda-forge::chktex

# Install Python package requirements
pip install -r requirements.txt
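
After installation, a quick sanity check can confirm that the environment matches the requirements above. The sketch below is an illustration rather than part of the repository; the function name `check_environment` is hypothetical. It reports only whether the platform is Linux, whether PyTorch can be imported, and whether PyTorch sees a CUDA device.

```python
import platform


def check_environment() -> dict:
    """Report whether this machine matches the stated requirements.

    Hypothetical helper for illustration; it is not shipped with the project.
    """
    report = {
        "linux": platform.system() == "Linux",
        "torch_installed": False,
        "cuda_available": False,
    }
    try:
        # Import lazily so the check still runs when PyTorch is missing.
        import torch
    except ImportError:
        return report

    report["torch_installed"] = True
    report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

If `cuda_available` is reported as `no`, the `pytorch-cuda` version chosen during installation may not match the local driver.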

Supported Models and API Keys

OpenAI Models

By default, the system uses the OPENAI_API_KEY environment variable for OpenAI models.

Gemini Models

By default, the system uses the GEMINI_API_KEY environment variable for Gemini models, which are accessed through the OpenAI API.

Claude Models via AWS Bedrock

To use Claude models provided by Amazon Bedrock, install the necessary additional packages:

pip install "anthropic[bedrock]"

Next, configure valid AWS Credentials and the target AWS Region by setting the following environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION_NAME.

Semantic Scholar API (Literature Search)

Our code can optionally use a Semantic Scholar API Key (S2_API_KEY) for higher throughput during literature search if you have one. This is used during both the ideation and paper writing stages. The system should work without it, though you might encounter rate limits or reduced novelty checking during ideation. If you experience issues with Semantic Scholar, you can skip the citation phase during paper generation.

Setting API Keys

Ensure you provide the necessary API keys as environment variables for the models you intend to use. For example:

export OPENAI_API_KEY="YOUR_OPENAI_KEY_HERE"
export S2_API_KEY="YOUR_S2_KEY_HERE"
# Set AWS credentials if using Bedrock
# export AWS_ACCESS_KEY_ID="YOUR_AWS_ACCESS_KEY_ID"
# export AWS_SECRET_ACCESS_KEY="YOUR_AWS_SECRET_KEY"
# export AWS_REGION_NAME="your-aws-region"
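
Before launching a long run, it can help to confirm that the variables for the services you plan to use are actually set. The following is a minimal sketch, not code from the repository: the service labels (`openai`, `gemini`, `bedrock`, `semantic_scholar`) and the function `missing_vars` are hypothetical names chosen for this example, while the environment variable names are the ones listed in this README.

```python
import os

# Environment variables named in this README, grouped by the service that
# needs them. The grouping keys are illustrative labels, not project settings.
REQUIRED_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "gemini": ["GEMINI_API_KEY"],
    "bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION_NAME"],
    "semantic_scholar": ["S2_API_KEY"],  # optional, see the section above
}


def missing_vars(services, env=None):
    """Return {service: [unset variable names]} for the requested services."""
    env = os.environ if env is None else env
    missing = {}
    for service in services:
        unset = [name for name in REQUIRED_VARS[service] if not env.get(name)]
        if unset:
            missing[service] = unset
    return missing


if __name__ == "__main__":
    # Check only the services you intend to use.
    for service, names in missing_vars(["openai", "semantic_scholar"]).items():
        print(f"{service}: missing {', '.join(names)}")
```

Passing an explicit `env` mapping makes the check easy to try without touching the real environment.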

Generate Research Ideas

Before running the full AI Scientist-v2 experiment pipeline, you first use the ai_scientist/perform_ideation_temp_free.py script to generate potential research ideas. This script uses an LLM to brainstorm and refine ideas based on a high-level topic description you provide, interacting with tools like Semantic Scholar to check for novelty.

1. Prepare a Topic Description: Create a Markdown file (e.g., my_research_topic.md) describing the research area or theme you want the AI to explore. This file should contain sections like Title, Keywords, TL;DR, and Abstract to define the scope of the research. Refer to the example file ai_scientist/ideas/i_cant_believe_its_not_better.md for the expected structure and content format. Place your file in a location accessible by the script (e.g., the ai_scientist/ideas/ directory).
2. Run the Ideation Script: Execute the script from the main project directory, pointing it to your topic description file and specifying the desired LLM.
   ```
   python ai_scientist/perform_ideation_temp_free.py \
    --workshop-file "ai_scientist/ideas/my_research_topic.md" \
    --model gpt-4o-2024-05-13 \
    --max-num-generations 20 \
    --num-reflections 5
   ```
   - --workshop-file: Path to your topic description Markdown file.
   - --model: The LLM to use for generating ideas (ensure you have the corresponding API key set).
   - --max-num-generations: How many distinct research ideas to attempt generating.
   - --num-reflections: How many refinement steps the LLM should perform for each idea.
3. Output: The script will generate a JSON file named after your input Markdown file (e.g., ai_scientist/ideas/my_research_topic.json). This file will contain a list of structured research ideas, including hypotheses, proposed experiments, and related work analysis.
4. Proceed to Experiments: Once you have the generated JSON file containing research ideas, you can proceed to the next section to run the experiments.

This ideation step guides the AI Scientist towards specific areas of interest and produces concrete research directions to be tested in the main experimental pipeline.
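
Before moving on, it may be useful to look at what the ideation script produced. The sketch below assumes only what is stated above, namely that the output is a JSON file containing a list of ideas. The function name `summarize_ideas` is hypothetical, and the code deliberately avoids assuming any particular field names inside each idea.

```python
import json
from pathlib import Path


def summarize_ideas(json_path):
    """Load an idea file and return (number of ideas, field names per idea).

    Illustrative helper: it reports whichever keys are present instead of
    assuming a fixed schema.
    """
    ideas = json.loads(Path(json_path).read_text(encoding="utf-8"))
    if not isinstance(ideas, list):
        raise ValueError(
            f"expected a JSON list of ideas, got {type(ideas).__name__}"
        )
    fields = [sorted(idea) if isinstance(idea, dict) else [] for idea in ideas]
    return len(ideas), fields


if __name__ == "__main__":
    path = Path("ai_scientist/ideas/my_research_topic.json")
    if path.exists():
        count, fields = summarize_ideas(path)
        print(f"{count} ideas found")
        for index, names in enumerate(fields):
            print(f"idea {index}: {', '.join(names)}")
```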

Run AI Scientist-v2 Paper Generation Experiments

Using the JSON file generated in the previous ideation step, you can now launch the main AI Scientist-v2 pipeline. This involves running experiments via agentic tree search, analyzing results, and generating a paper draft.

Specify the models used for the write-up and review phases via command-line arguments. The configuration for the best-first tree search (BFTS) is located in bfts_config.yaml. Adjust parameters in this file as needed.

Key tree search configuration parameters in bfts_config.yaml:

agent config:
- Set num_workers (number of parallel exploration paths) and steps (maximum number of nodes to explore). For example, if num_workers=3 and steps=21, the tree search will explore up to 21 nodes, expanding 3 nodes concurrently at each step.
- num_seeds: Should generally be the same as num_workers if num_workers is less than 3. Otherwise, set num_seeds to 3.
- Note: Other agent parameters like k_fold_validation, expose_prediction, and data_preview are not used in the current version.
search config:
- max_debug_depth: The maximum number of times the agent will attempt to debug a failing node before abandoning that search path.
- debug_prob: The probability of attempting to debug a failing node.
- num_drafts: The number of initial root nodes (i.e., the number of independent trees to grow) during Stage 1.
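
The relationship between num_workers, steps, and num_seeds can be written out as a small calculation. This is a sketch under stated assumptions rather than the project's scheduling logic: the function names are hypothetical, and `parallel_rounds` assumes the search always expands a full batch of num_workers nodes per round, which the real pipeline may not do.

```python
import math


def recommended_num_seeds(num_workers: int) -> int:
    """Apply the rule above: num_workers when it is below 3, otherwise 3."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return min(num_workers, 3)


def parallel_rounds(steps: int, num_workers: int) -> int:
    """Rounds needed to explore `steps` nodes, `num_workers` nodes per round.

    Assumes full batches each round; treat the result as a rough estimate.
    """
    if steps < 0 or num_workers < 1:
        raise ValueError("steps must be >= 0 and num_workers >= 1")
    return math.ceil(steps / num_workers)


if __name__ == "__main__":
    print(recommended_num_seeds(2))   # 2
    print(recommended_num_seeds(8))   # 3
    print(parallel_rounds(21, 3))     # 7
```

With the example values from the list (num_workers=3, steps=21), this gives num_seeds=3 and roughly 7 rounds of concurrent expansion.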

The following example command runs the AI Scientist-v2 on a generated idea file (e.g., my_research_topic.json). Please review bfts_config.yaml for detailed tree search parameters (the default config includes claude-3-5-sonnet for experiments). Omit the --load_code flag if you do not want to initialize experimentation with a code snippet.

python launch_scientist_bfts.py \
 --load_ideas "ai_scientist/ideas/my_research_topic.json" \
 --load_code \
 --add_dataset_ref \
 --model_writeup o1-preview-2024-09-12 \
 --model_citation gpt-4o-2024-11-20 \
 --model_review gpt-4o-2024-11-20 \
 --model_agg_plots o3-mini-2025-01-31 \
 --num_cite_rounds 20
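
When launching several idea files in a row, it can be convenient to assemble the same command programmatically. The sketch below is a hypothetical wrapper, not part of the repository: `build_launch_command` is an invented name, and it only rearranges the flags and model names shown in the command above. It builds the argument list without running it, and drops --load_code when `load_code=False`.

```python
import shlex
import sys


def build_launch_command(ideas_json, load_code=True, num_cite_rounds=20):
    """Return the argument list for launch_scientist_bfts.py (illustrative)."""
    cmd = [sys.executable, "launch_scientist_bfts.py", "--load_ideas", ideas_json]
    if load_code:
        # Only pass the flag when a starting code snippet should be used.
        cmd.append("--load_code")
    cmd += [
        "--add_dataset_ref",
        "--model_writeup", "o1-preview-2024-09-12",
        "--model_citation", "gpt-4o-2024-11-20",
        "--model_review", "gpt-4o-2024-11-20",
        "--model_agg_plots", "o3-mini-2025-01-31",
        "--num_cite_rounds", str(num_cite_rounds),
    ]
    return cmd


if __name__ == "__main__":
    command = build_launch_command(
        "ai_scientist/ideas/my_research_topic.json", load_code=False
    )
    # Print the command for review; pass the list to subprocess.run to execute it.
    print(shlex.join(command))
```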

Once the initial experimental stage is complete, you will find a timestamped log folder inside the experiments/ directory. Navigate to experiments/"timestamp_ideaname"/logs/0-run/ within that folder to find the tree visualization file unified_tree_viz.html.
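
To locate the visualization files without browsing the folders by hand, the path pattern described above can be searched directly. This is a minimal sketch: the function name `find_tree_visualizations` is hypothetical, and it assumes the experiments/ directory layout is exactly as described in the previous paragraph.

```python
from pathlib import Path


def find_tree_visualizations(experiments_dir="experiments"):
    """Return sorted paths to unified_tree_viz.html under each run folder."""
    root = Path(experiments_dir)
    # One directory level per run, then the fixed logs/0-run/ subpath.
    return sorted(root.glob("*/logs/0-run/unified_tree_viz.html"))


if __name__ == "__main__":
    for html_file in find_tree_visualizations():
        print(html_file)
```

Each printed path can be opened in a web browser to inspect the corresponding search tree.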

Citing The AI Scientist-v2

If you use The AI Scientist-v2 in your research, please cite our work as follows:

@article{aiscientist_v2,
  title={The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search},
  author={Yamada, Yutaro and Lange, Robert Tjarko and Lu, Cong and Hu, Shengran and Lu, Chris and Foerster, Jakob and Clune, Jeff and Ha, David},
  journal={arXiv preprint arXiv:2504.08066},
  year={2025}
}

Frequently Asked Questions

Why wasn't a PDF or a review generated for my experiment?

The AI Scientist-v2 completes experiments with a success rate that depends on the chosen foundation model and the complexity of the idea, so a run that fails along the way may not produce a PDF or a review. Higher success rates are generally observed when using powerful models like Claude 3.5 Sonnet for the experimentation phase.

What is the estimated cost per experiment?

The ideation step cost depends on the LLM used and the number of generations/reflections, but is generally low (a few dollars). For the main experiment pipeline, using Claude 3.5 Sonnet for the experimentation phase typically costs around $15–$20 per run. The subsequent writing phase adds approximately $5 when using the default models specified in the example command. Using GPT-4o for model_citation is recommended as it can help reduce writing costs.
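
The figures above can be combined into a rough budget for several runs. The sketch below is simple arithmetic on the numbers quoted in this answer ($15 to $20 for experimentation, about $5 for writing); the function name is hypothetical, ideation is left out because only "a few dollars" is stated, and actual costs will vary with the models and settings used.

```python
def estimate_pipeline_cost(num_runs, experiment_cost=(15.0, 20.0), writeup_cost=5.0):
    """Return a (low, high) cost estimate in dollars for `num_runs` runs.

    Defaults are the approximate per-run figures quoted in this FAQ.
    """
    if num_runs < 0:
        raise ValueError("num_runs must be non-negative")
    low = num_runs * (experiment_cost[0] + writeup_cost)
    high = num_runs * (experiment_cost[1] + writeup_cost)
    return low, high


if __name__ == "__main__":
    low, high = estimate_pipeline_cost(3)
    print(f"3 runs: about ${low:.0f} to ${high:.0f}")  # 3 runs: about $60 to $75
```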

How do I run The AI Scientist-v2 for different subject fields?

First, perform the Generate Research Ideas step. Create a new Markdown file describing your desired subject field or topic, following the structure of the example ai_scientist/ideas/i_cant_believe_its_not_better.md. Run the perform_ideation_temp_free.py script with this file to generate a corresponding JSON idea file. Then, proceed to the Run AI Scientist-v2 Paper Generation Experiments step, using this JSON file with the launch_scientist_bfts.py script via the --load_ideas argument.

What should I do if I have problems accessing the Semantic Scholar API?

The Semantic Scholar API is used to assess the novelty of generated ideas and to gather citations during the paper write-up phase. If you don't have an API key or encounter rate limits, you may be able to skip these phases.

I encountered a "CUDA Out of Memory" error. What can I do?

This error typically occurs when the AI Scientist-v2 attempts to load or run a model that requires more GPU memory than available on your system. To resolve this, you can try updating your ideation prompt file (ai_scientist/ideas/my_research_topic.md) to suggest using smaller models for the experiments.

Acknowledgement

The tree search component implemented within the ai_scientist directory is built on top of the AIDE project. We thank the AIDE developers for their valuable contributions and for making their work publicly available.
