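@ai_suxiaole: Many AI tools can now save researchers time, but most of them address only a single step: reading papers, writing code, polishing manuscripts, or summarizing abstracts. Sakana AI's AI Scientist-v2 aims to be an AI system that runs the complete research workflow, starting from generating research hypotheses, through designing experiments…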


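Summary

Sakana AI has released AI Scientist-v2, an end-to-end automated research system that runs the full workflow from generating research hypotheses to writing the paper. A paper it produced was accepted through peer review at an ICLR 2025 workshop.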


Cached: 2026/07/02 16:24

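Many AI tools can now save researchers time.

But most of them address only a single step:

reading papers, writing code, polishing manuscripts, or summarizing abstracts.

Sakana AI's AI Scientist-v2 is different.

It aims to be an AI system that runs the complete research workflow.

Starting from generating research hypotheses, it chains together experiment design, code execution, result analysis, and drafting the paper, all automatically.

It also proactively searches the related literature to judge whether a research idea is novel.

It supports mainstream models such as OpenAI, Claude, and Gemini.

Of course, this does not mean that "AI scientists are already mature."

More precisely, it is an experimental framework that automates and links the steps of the research workflow.

If you are interested in automated AI research, agents that run experiments, or AI for Science, it is worth following.

GitHub: http://github.com/SakanaAI/AI-Scientist-v2…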


SakanaAI/AI-Scientist-v2

Source: https://github.com/SakanaAI/AI-Scientist-v2


The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

📚 [Paper] | 📝 [Blog Post] | 📂 [ICLR2025 Workshop Experiment]

Fully autonomous scientific research systems are becoming increasingly capable, with AI playing a pivotal role in transforming how scientific discoveries are made. We are excited to introduce The AI Scientist-v2, a generalized end-to-end agentic system that has generated the first workshop paper written entirely by AI and accepted through peer review.

This system autonomously generates hypotheses, runs experiments, analyzes data, and writes scientific manuscripts. Unlike its predecessor (AI Scientist-v1), the AI Scientist-v2 removes reliance on human-authored templates, generalizes across Machine Learning (ML) domains, and employs a progressive agentic tree search, guided by an experiment manager agent.

Note: The AI Scientist-v2 doesn’t necessarily produce better papers than v1, especially when a strong starting template is available. v1 follows well-defined templates, leading to high success rates, while v2 takes a broader, more exploratory approach with lower success rates. v1 works best for tasks with clear objectives and a solid foundation, whereas v2 is designed for open-ended scientific exploration.

Caution! This codebase will execute Large Language Model (LLM)-written code. There are various risks and challenges associated with this autonomy, including the potential use of dangerous packages, uncontrolled web access, and the possibility of spawning unintended processes. Ensure that you run this within a controlled sandbox environment (e.g., a Docker container). Use at your own discretion.

Table of Contents

  1. Requirements
  2. Generate Research Ideas
  3. Run AI Scientist-v2 Paper Generation Experiments
  4. Citing The AI Scientist-v2
  5. Frequently Asked Questions
  6. Acknowledgement

Requirements

This code is designed to run on Linux with NVIDIA GPUs using CUDA and PyTorch.

Installation

# Create a new conda environment
conda create -n ai_scientist python=3.11
conda activate ai_scientist

# Install PyTorch with CUDA support (adjust pytorch-cuda version for your setup)
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia

# Install PDF and LaTeX tools
conda install anaconda::poppler
conda install conda-forge::chktex

# Install Python package requirements
pip install -r requirements.txt

Installation usually takes no more than one hour.

Supported Models and API Keys

OpenAI Models

By default, the system uses the OPENAI_API_KEY environment variable for OpenAI models.

Gemini Models

By default, the system uses the GEMINI_API_KEY environment variable for Gemini models, which are accessed through the OpenAI-compatible API.

Claude Models via AWS Bedrock

To use Claude models provided by Amazon Bedrock, install the necessary additional packages:

# Quote the extra so shells such as zsh do not expand the brackets
pip install "anthropic[bedrock]"

Next, configure valid AWS Credentials and the target AWS Region by setting the following environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION_NAME.

Semantic Scholar API (Literature Search)

Our code can optionally use a Semantic Scholar API Key (S2_API_KEY) for higher throughput during literature search if you have one. This is used during both the ideation and paper writing stages. The system should work without it, though you might encounter rate limits or reduced novelty checking during ideation. If you experience issues with Semantic Scholar, you can skip the citation phase during paper generation.
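As a rough illustration of where the key fits, the sketch below builds (but does not send) a paper-search request against the public Semantic Scholar Graph API, attaching S2_API_KEY as the x-api-key header only when one is available. The function name build_s2_search_request and the choice of fields are assumptions made for illustration; the repository's own search code may be organized differently.

```python
import os
import urllib.parse
import urllib.request

S2_SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_s2_search_request(query, limit=5, api_key=None):
    """Build a Semantic Scholar paper-search request without sending it.

    Hypothetical helper: if api_key is None, fall back to the optional
    S2_API_KEY environment variable mentioned in the README.
    """
    if api_key is None:
        api_key = os.environ.get("S2_API_KEY")
    params = urllib.parse.urlencode(
        {"query": query, "limit": limit, "fields": "title,year,citationCount"}
    )
    # Only attach the header when a non-empty key is available.
    headers = {"x-api-key": api_key} if api_key else {}
    return urllib.request.Request(f"{S2_SEARCH_URL}?{params}", headers=headers)
```

Passing the returned object to urllib.request.urlopen would perform the actual search. Without a key, the request carries no x-api-key header, which is where the rate limits mentioned above are more likely to show up.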

Setting API Keys

Ensure you provide the necessary API keys as environment variables for the models you intend to use. For example:

export OPENAI_API_KEY="YOUR_OPENAI_KEY_HERE"
export S2_API_KEY="YOUR_S2_KEY_HERE"
# Set AWS credentials if using Bedrock
# export AWS_ACCESS_KEY_ID="YOUR_AWS_ACCESS_KEY_ID"
# export AWS_SECRET_ACCESS_KEY="YOUR_AWS_SECRET_KEY"
# export AWS_REGION_NAME="your-aws-region"
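Before launching a long run, it can help to confirm that the variables for the chosen provider are actually set. The following is a minimal preflight sketch; the variable names come from the sections above, while the provider labels ("openai", "gemini", "bedrock") and the function missing_vars are hypothetical and not part of the repository.

```python
import os

# Environment variables named in the README, grouped by provider.
# The provider labels are illustrative only.
REQUIRED_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "gemini": ["GEMINI_API_KEY"],
    "bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION_NAME"],
}
OPTIONAL_VARS = ["S2_API_KEY"]  # literature search works without it


def missing_vars(provider, env=None):
    """Return the required variables for `provider` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS[provider] if not env.get(name)]
```

For example, missing_vars("bedrock") returns an empty list only when all three AWS variables are present in the current environment.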

Generate Research Ideas

Before running the full AI Scientist-v2 experiment pipeline, you first use the ai_scientist/perform_ideation_temp_free.py script to generate potential research ideas. This script uses an LLM to brainstorm and refine ideas based on a high-level topic description you provide, interacting with tools like Semantic Scholar to check for novelty.

  1. Prepare a Topic Description: Create a Markdown file (e.g., my_research_topic.md) describing the research area or theme you want the AI to explore. This file should contain sections like Title, Keywords, TL;DR, and Abstract to define the scope of the research. Refer to the example file ai_scientist/ideas/i_cant_believe_its_not_better.md for the expected structure and content format. Place your file in a location accessible by the script (e.g., the ai_scientist/ideas/ directory).

  2. Run the Ideation Script: Execute the script from the main project directory, pointing it to your topic description file and specifying the desired LLM.

    python ai_scientist/perform_ideation_temp_free.py \
     --workshop-file "ai_scientist/ideas/my_research_topic.md" \
     --model gpt-4o-2024-05-13 \
     --max-num-generations 20 \
     --num-reflections 5
    
    • --workshop-file: Path to your topic description Markdown file.
    • --model: The LLM to use for generating ideas (ensure you have the corresponding API key set).
    • --max-num-generations: How many distinct research ideas to attempt generating.
    • --num-reflections: How many refinement steps the LLM should perform for each idea.
  3. Output: The script will generate a JSON file named after your input Markdown file (e.g., ai_scientist/ideas/my_research_topic.json). This file will contain a list of structured research ideas, including hypotheses, proposed experiments, and related work analysis.

  4. Proceed to Experiments: Once you have the generated JSON file containing research ideas, you can proceed to the next section to run the experiments.

This ideation step guides the AI Scientist towards specific areas of interest and produces concrete research directions to be tested in the main experimental pipeline.
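If you script the ideation step, a small wrapper can assemble the command shown above and predict where the JSON output should appear. This is a sketch under the assumption that it is run from the main project directory; the function names are hypothetical, and only the script path and flags are taken from the README.

```python
import sys
from pathlib import Path

IDEATION_SCRIPT = "ai_scientist/perform_ideation_temp_free.py"


def ideation_command(topic_md, model="gpt-4o-2024-05-13",
                     max_num_generations=20, num_reflections=5):
    """Build the argv list for the ideation script (does not run it)."""
    return [
        sys.executable, IDEATION_SCRIPT,
        "--workshop-file", str(topic_md),
        "--model", model,
        "--max-num-generations", str(max_num_generations),
        "--num-reflections", str(num_reflections),
    ]


def expected_ideas_path(topic_md):
    """The README says the JSON output is named after the input Markdown file."""
    return Path(topic_md).with_suffix(".json")


# Usage (from the repository root):
#   import subprocess
#   subprocess.run(ideation_command("ai_scientist/ideas/my_research_topic.md"), check=True)
```

Checking that expected_ideas_path(...) exists after the run is a simple way to confirm that ideation produced output before starting the experiment pipeline.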

Run AI Scientist-v2 Paper Generation Experiments

Using the JSON file generated in the previous ideation step, you can now launch the main AI Scientist-v2 pipeline. This involves running experiments via agentic tree search, analyzing results, and generating a paper draft.

Specify the models used for the write-up and review phases via command-line arguments. The configuration for the best-first tree search (BFTS) is located in bfts_config.yaml. Adjust parameters in this file as needed.
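For readers unfamiliar with the term, the toy sketch below shows generic best-first search over a tree: a priority queue always yields the most promising unexplored node, up to a fixed budget of explored nodes. It is not the repository's implementation, which searches over LLM-generated experiment code in parallel; the integer nodes, the expand and score callables, and the function name are all illustrative assumptions.

```python
import heapq
from itertools import count


def best_first_search(root, expand, score, steps):
    """Explore up to `steps` nodes, always taking the highest-scoring frontier node."""
    tie = count()  # tie-breaker so the heap never has to compare nodes directly
    frontier = [(-score(root), next(tie), root)]  # negate: heapq is a min-heap
    explored = []
    while frontier and len(explored) < steps:
        _, _, node = heapq.heappop(frontier)
        explored.append(node)
        for child in expand(node):
            heapq.heappush(frontier, (-score(child), next(tie), child))
    return max(explored, key=score), explored


# Toy problem: a binary tree of integers, scored by closeness to 11.
def expand(n):
    return [2 * n, 2 * n + 1] if n < 8 else []


def score(n):
    return -abs(n - 11)


best, explored = best_first_search(1, expand, score, steps=5)
print(explored)  # [1, 3, 7, 14, 15]
print(best)      # 14
```

With a budget of five nodes the search never reaches 11 itself, which illustrates why the node budget matters: a larger steps value lets the search revisit lower-ranked branches.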

Key tree search configuration parameters in bfts_config.yaml:

  • agent config:
    • Set num_workers (number of parallel exploration paths) and steps (maximum number of nodes to explore). For example, if num_workers=3 and steps=21, the tree search will explore up to 21 nodes, expanding 3 nodes concurrently at each step.
    • num_seeds: Should generally be the same as num_workers if num_workers is less than 3. Otherwise, set num_seeds to 3.
    • Note: Other agent parameters like k_fold_validation, expose_prediction, and data_preview are not used in the current version.
  • search config:
    • max_debug_depth: The maximum number of times the agent will attempt to debug a failing node before abandoning that search path.
    • debug_prob: The probability of attempting to debug a failing node.
    • num_drafts: The number of initial root nodes (i.e., the number of independent trees to grow) during Stage 1.
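The num_seeds guidance and the num_workers/steps example above reduce to a little arithmetic, sketched here. The helper names are hypothetical, and the round count assumes every round expands a full batch of num_workers nodes, which the real scheduler may not do.

```python
import math


def recommended_num_seeds(num_workers: int) -> int:
    """README rule: match num_workers when it is below 3, otherwise use 3."""
    return num_workers if num_workers < 3 else 3


def expansion_rounds(steps: int, num_workers: int) -> int:
    """Rough number of rounds if num_workers nodes are expanded per round.

    Assumption: each round expands a full batch; treat this as an estimate.
    """
    return math.ceil(steps / num_workers)


print(recommended_num_seeds(2))  # 2
print(recommended_num_seeds(8))  # 3
print(expansion_rounds(21, 3))   # 7
```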

The following example command runs AI Scientist-v2 with a generated idea file (e.g., my_research_topic.json). Review bfts_config.yaml for the detailed tree search parameters (the default config uses claude-3-5-sonnet for experiments). Omit the --load_code flag if you do not want to initialize experimentation with a code snippet.

python launch_scientist_bfts.py \
 --load_ideas "ai_scientist/ideas/my_research_topic.json" \
 --load_code \
 --add_dataset_ref \
 --model_writeup o1-preview-2024-09-12 \
 --model_citation gpt-4o-2024-11-20 \
 --model_review gpt-4o-2024-11-20 \
 --model_agg_plots o3-mini-2025-01-31 \
 --num_cite_rounds 20

Once the initial experimental stage is complete, you will find a timestamped log folder inside the experiments/ directory. Navigate to experiments/"timestamp_ideaname"/logs/0-run/ within that folder to find the tree visualization file unified_tree_viz.html. After all experiment stages are complete, the writeup stage begins. The writeup stage typically takes about 20 to 30 minutes in total. Once it finishes, you should see timestamp_ideaname.pdf in the timestamp_ideaname folder. For this example run, all stages typically finish within several hours.
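To check several runs at once, a short script can look for the two artifacts described above in each run folder. This sketch assumes the layout stated in the paragraph (experiments/<timestamp_ideaname>/logs/0-run/unified_tree_viz.html and <timestamp_ideaname>.pdf inside the run folder); the function name and the returned dictionary keys are hypothetical.

```python
from pathlib import Path


def find_run_outputs(experiments_dir="experiments"):
    """List each run folder with its tree visualization and PDF, if present."""
    root = Path(experiments_dir)
    if not root.is_dir():
        return []
    results = []
    for run_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        tree_viz = run_dir / "logs" / "0-run" / "unified_tree_viz.html"
        pdf = run_dir / f"{run_dir.name}.pdf"
        results.append({
            "run": run_dir.name,
            "tree_viz": tree_viz if tree_viz.exists() else None,
            "pdf": pdf if pdf.exists() else None,  # None until the writeup stage finishes
        })
    return results
```

A run whose entry has a tree_viz path but no pdf has completed at least the initial experimental stage but has not finished (or did not reach) the writeup stage.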

Citing The AI Scientist-v2

If you use The AI Scientist-v2 in your research, please cite our work as follows:

@article{aiscientist_v2,
  title={The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search},
  author={Yamada, Yutaro and Lange, Robert Tjarko and Lu, Cong and Hu, Shengran and Lu, Chris and Foerster, Jakob and Clune, Jeff and Ha, David},
  journal={arXiv preprint arXiv:2504.08066},
  year={2025}
}

Frequently Asked Questions

Why wasn’t a PDF or a review generated for my experiment?

The AI Scientist-v2 completes experiments with a success rate that depends on the chosen foundation model and the complexity of the idea, so not every run reaches the write-up and review stages. Higher success rates are generally observed when using powerful models like Claude 3.5 Sonnet for the experimentation phase.

What is the estimated cost per experiment?

The ideation step cost depends on the LLM used and the number of generations/reflections, but is generally low (a few dollars). For the main experiment pipeline, using Claude 3.5 Sonnet for the experimentation phase typically costs around $15–$20 per run. The subsequent writing phase adds approximately $5 when using the default models specified in the example command. Using GPT-4o for model_citation is recommended as it can help reduce writing costs.
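Taking the figures above at face value (roughly 15 to 20 dollars for experimentation plus about 5 dollars for writing), a back-of-the-envelope budget for several runs can be sketched as follows. The function name is hypothetical, and the unquantified ideation cost ("a few dollars") is deliberately left out.

```python
def estimate_pipeline_cost(num_runs, experiment_usd=(15.0, 20.0), writeup_usd=5.0):
    """Return a (low, high) USD estimate for experimentation plus writing."""
    low, high = experiment_usd
    return num_runs * (low + writeup_usd), num_runs * (high + writeup_usd)


print(estimate_pipeline_cost(1))  # (20.0, 25.0)
print(estimate_pipeline_cost(3))  # (60.0, 75.0)
```

Actual spend depends on the models chosen and on how many runs fail before reaching the writing phase.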

How do I run The AI Scientist-v2 for different subject fields?

First, perform the Generate Research Ideas step. Create a new Markdown file describing your desired subject field or topic, following the structure of the example ai_scientist/ideas/i_cant_believe_its_not_better.md. Run the perform_ideation_temp_free.py script with this file to generate a corresponding JSON idea file. Then, proceed to the Run AI Scientist-v2 Paper Generation Experiments step, using this JSON file with the launch_scientist_bfts.py script via the --load_ideas argument.

What should I do if I have problems accessing the Semantic Scholar API?

The Semantic Scholar API is used to assess the novelty of generated ideas and to gather citations during the paper write-up phase. If you don’t have an API key or you encounter rate limits, you may be able to skip these phases.

I encountered a “CUDA Out of Memory” error. What can I do?

This error typically occurs when the AI Scientist-v2 attempts to load or run a model that requires more GPU memory than available on your system. To resolve this, you can try updating your ideation prompt file (ai_scientist/ideas/my_research_topic.md) to suggest using smaller models for the experiments.

Acknowledgement

The tree search component implemented within the ai_scientist directory is built on top of the AIDE project. We thank the AIDE developers for their valuable contributions and for making their work publicly available.


⚖️ License & Responsible Use

This project is licensed under The AI Scientist Source Code License (a derivative of the Responsible AI License).

Mandatory Disclosure: By using this code, you are legally bound to clearly and prominently disclose the use of AI in any resulting scientific manuscripts or papers.

We recommend the following attribution in your paper’s Abstract or Methods section:

“This manuscript was autonomously generated using The AI Scientist.”
