ChemAmp: Amplified Chemistry Tools via Composable Agents

arXiv cs.CL Papers

Summary

ChemAmp introduces a tool amplification paradigm that dynamically coordinates specialized chemistry tools (UniMol2, Chemformer) as composable agents to enhance performance on molecular tasks. The framework outperforms chemistry-specialized models and reduces inference token costs by 94% compared to vanilla multi-agent systems.

arXiv:2505.21569v3 Announce Type: replace-cross


# ChemAmp: Amplified Chemistry Tools via Composable Agents
Source: https://arxiv.org/html/2505.21569
Zhucong Li¹,³, Powei Chang², Jin Xiao², Zhijian Zhou¹, Qianyu He³, Jiaqing Liang², Fenglei Cao⁴, Xu Yinghui¹, Yuan Qi¹,⁴,⁵

¹Artificial Intelligence Innovation and Incubation Institute, Fudan University; ²School of Data Science, Fudan University; ³College of Computer Science and Artificial Intelligence, Fudan University; ⁴Shanghai Academy of Artificial Intelligence for Science; ⁵Department of Information and Intelligence Development, Zhongshan Hospital, Fudan University

{zcli22, bwzhang24, jinxiao23}@m.fudan.edu.cn, {liangjiaqing, xuyinghui, qiyuan}@fudan.edu.cn

Zhucong Li and Powei Chang contributed equally. Yuan Qi is the corresponding author.

###### Abstract

Although LLM-based agents are proven to master tool orchestration in scientific fields, particularly chemistry, their single-task performance remains limited by underlying tool constraints. To this end, we propose tool amplification, a novel paradigm that enhances the collective capabilities of specialized tools through optimized, dynamic coordination within individual tasks. Instantiating this paradigm, we introduce ChemAmp, a computationally lightweight framework that dynamically treats chemistry tools (e.g., UniMol2, Chemformer) as composable building-block agents. It constructs task-specialized super-agents that transcend atomic tool constraints with limited data (≤10 samples). Our evaluations across four core chemistry tasks—molecular design, molecule captioning, reaction prediction, and property prediction—demonstrate that ChemAmp outperforms chemistry-specialized models, generalist LLMs, and agent systems with tool orchestration. Critically, this bottom-up construction strategy enables 94% inference token cost reductions versus vanilla multi-agent systems. Our code and dataset are publicly available at https://github.com/Chang-pw/ChemAmp.


## 1 Introduction

Figure 1: The research framework. This work introduces a paradigm shift from tool orchestration (managing tool sequences across tasks) to tool amplification (dynamically enhancing capabilities within atomic tasks).

Large Language Model (LLM)-based agents have emerged as powerful tools for automating complex scientific workflows, particularly in chemistry, where they orchestrate multi-step processes involving specialized computational tools.

As shown in Fig. 1 (Left), while effective for task orchestration, current works primarily sequence pre-defined tool usage. For example, when an LLM-based agent handles three potential pipeline tasks such as molecular design, reaction prediction, and property prediction, it sequentially selects the predefined chemistry tools for each task to obtain computational results. However, a critical limitation remains unaddressed: the performance of these agents on individual, complex tasks is fundamentally constrained by the inherent capabilities and scope of the underlying tools they invoke. As a result, errors can propagate through the reasoning chain. This bottleneck restricts performance gains and often leads to inefficient, redundant tool calls that inflate computational costs.

In this work, we confront a more fundamental limitation: even state-of-the-art chemistry tools (e.g., UniMol2, Chemformer, ChemDFM) remain constrained by their atomic capabilities when operating in isolation. As illustrated in Fig. 1 (Right), we posit a fundamental shift from tool orchestration to tool amplification where tools transcend native capabilities through hierarchical coordination within atomic tasks. Unlike traditional tool orchestration (which schedules tools across different tasks), amplification focuses on making tools work better together within a single task. By dynamically combining tools into collaborative teams, they can achieve capabilities beyond what any tool can do alone.

This amplification paradigm introduces two core challenges: (1) Structural Search Complexity: Optimal agent-tool compositions vary significantly across chemistry tasks due to domain-specific constraints. (2) Efficiency-Capability Tradeoff: Manual composition is infeasible, while exhaustive search incurs prohibitive computational costs.

To address these challenges, we introduce ChemAmp (Chemistry Tool Amplification), a framework realizing tool amplification through composable agents. Departing from naive stacking, ChemAmp employs a bi-phase encapsulation engine—constructing agent hierarchies from atomic tools via bottom-up iterative composition. Specifically, ChemAmp operates through two synergistic stages: (1) Atomic-to-Composite Amplification: Atomic tools undergo iterative encapsulation into sub-agents, evolving optimal combinations through adaptive scoring and automated feedback. (2) Cross-Composite Synergy: Sub-agents are further encapsulated into composite networks through the same mechanism, amplifying capabilities via emergent hierarchical coordination.

We deploy ChemAmp across four foundational chemistry domains: molecular design, molecule captioning, property prediction, and reaction prediction. For each task, ChemAmp discovers optimal agent compositions, dynamically constructing task-specialized super-agents that transcend atomic tool constraints with limited data (≤10 samples). Experimental results show that ChemAmp consistently outperforms three classes of baselines: chemistry-specialized models, generalist LLMs, and agent systems with tool orchestration. Notably, ChemAmp delivers these gains while using only 6% of the inference tokens of vanilla multi-agent systems (a 94% reduction), validating both the efficacy and the efficiency of tool amplification.

To summarize, our contributions are three-fold:

- We propose tool amplification, a novel paradigm that enhances the collective capabilities of specialized tools through optimized, dynamic coordination within individual tasks.
- We develop ChemAmp, a lightweight framework realizing amplification via bi-phase iterative encapsulation of atomic tools into agent composite tools.
- We demonstrate state-of-the-art results across molecular design, molecule captioning, reaction prediction, and property prediction, together with a 94% reduction in inference token cost versus vanilla multi-agent systems.

Figure 2: ChemAmp's two-stage amplification process. (1) Atomic-to-Composite Amplification: atomic tools are encapsulated into agents; (2) Cross-Composite Synergy: agents are merged into hierarchical networks.

## 2 Related Work

### 2.1 Orchestration Paradigm

The foundation for computational chemistry agents builds on the ReAct framework, which integrates reasoning with tool execution. This paradigm has evolved into sophisticated tool orchestration systems: (1) Workflow-focused: ChemCrow and Coscientist sequence tools like RDKit/LabX across multi-step tasks. (2) Domain-optimized: ChemAgent and SciToolAgent improve LLM performance in complex chemical reasoning tasks by introducing a dynamic, self-evolving memory library that supports task decomposition and solution generation. ChemToolAgent supports a large tool set and performs dynamic tool selection in a broad task suite. (3) Hybrid optimization: Recent systems like AgentPrune, GPTSwarm, Aflow, and MaAS automate workflow refinement after initial manual design. As a generic optimization of manually orchestrated vanilla multi-agent systems, such frameworks deliberately avoid atomic tool-level performance enhancements. Consequently, our experiments compare against multiple instantiations of vanilla multi-agent systems rather than these derivative approaches.

### 2.2 Paradigm Shift Imperative

Despite these advances, orchestration faces fundamental constraints: (1) Capability Ceiling: Exact accuracy plateaus at 35% on tasks such as molecule captioning. (2) Coordination Scope: Orchestration systems use existing computational chemistry tools inefficiently and struggle to navigate the combinatorial and hierarchical relationships between them.

This impasse necessitates a paradigm shift: from scheduling tools across tasks (orchestration) to amplifying capabilities within tasks through dynamic agent composition. The amplification paradigm introduces two core challenges: (1) Structural Search Complexity: Optimal agent-tool compositions vary significantly across chemistry tasks due to domain-specific constraints. (2) Efficiency-Capability Tradeoff: Manual composition is infeasible, while exhaustive search incurs prohibitive computational costs.

## 3 ChemAmp Framework

Our study proposes ChemAmp, a framework that realizes the tool amplification paradigm through hierarchical composition of composable agents. ChemAmp dynamically constructs agent hierarchies, where atomic tools are iteratively encapsulated into sub-agents and further merged into composite networks. This two-stage process (Fig. 2) amplifies capabilities via emergent coordination while minimizing task-specific errors through iterative refinement.

Central to this approach is the Agent Composite Tool 𝒜(t₁,...,tₙ), which serves dual roles: as a composable building block for higher-level agents and as an autonomous executor of chemistry sub-tasks. This duality enables ChemAmp to identify optimal capability enhancement points where tool coordination generates synergistic effects beyond individual functions.
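The dual role described above resembles the classic composite pattern: a composite exposes the same interface as the tools it wraps, so it can be nested inside a higher-level composite. The following is a minimal Python sketch under that reading, not the authors' implementation. The names `Tool`, `AtomicTool`, `AgentCompositeTool`, and `decide` are hypothetical, and `decide` is a stand-in for whatever reasoning step the LLM agent performs over its members' outputs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol, Sequence


class Tool(Protocol):
    """Shared interface: anything that maps a query string to an answer."""
    name: str

    def run(self, query: str) -> str: ...


@dataclass
class AtomicTool:
    """Leaf node wrapping one underlying model call (toy stand-in)."""
    name: str
    fn: Callable[[str], str]

    def run(self, query: str) -> str:
        return self.fn(query)


@dataclass
class AgentCompositeTool:
    """Inner node: consults its member tools, then produces one answer.

    `decide` is a hypothetical placeholder for the agent's reasoning step.
    """
    name: str
    members: Sequence[Tool]
    decide: Callable[[str, Dict[str, str]], str]

    def run(self, query: str) -> str:
        # Executor role: query every member, then let the agent decide.
        outputs = {m.name: m.run(query) for m in self.members}
        return self.decide(query, outputs)


def join_outputs(query: str, outputs: Dict[str, str]) -> str:
    """Toy decision rule: concatenate member outputs in name order."""
    return "|".join(outputs[k] for k in sorted(outputs))


upper = AtomicTool("upper", str.upper)
reverse = AtomicTool("reverse", lambda s: s[::-1])

# Building-block role: `inner` is itself a member of `outer`.
inner = AgentCompositeTool("A1", [upper, reverse], join_outputs)
outer = AgentCompositeTool("A2", [inner, upper], join_outputs)

print(outer.run("CCO"))
```

Because `AgentCompositeTool` satisfies the same `Tool` interface as `AtomicTool`, nesting depth is unbounded in this sketch; how ChemAmp limits or scores that depth is described in the stages that follow.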

### 3.1 Stage 1: Atomic-to-Composite Amplification

Stage 1 serves as a warm-up phase: it reduces the prediction errors of the base tools and enriches the tool set available to Stage 2, thereby constructing a more effective search space for hierarchical coordination. Given an initial tool set 𝒯 provided by a large language model and a parameter k, we construct a global Tool Library ℒ through Algorithm 1. The process begins by applying Atomic-to-Composite Amplification to each atomic tool tₖ ∈ 𝒯, iteratively constructing Agent Composite Tools 𝒜ᵢ(tₖ) through layered encapsulation. Each composite tool is evaluated with task-specific metrics to obtain a capability enhancement score sᵢ; the process advances to the next reinforcement layer 𝒜ᵢ₊₁(tₖ, 𝒜ᵢ) only if sᵢ exceeds the previous layer's score sᵢ₋₁ by a significance threshold δ.

The iterative refinement continues until performance plateaus—defined as Δs < δ for consecutive iterations—ensuring optimal capability enhancement is achieved. Upon termination, all reinforced Agent Composite Tools 𝒜ᵢ₊₁(tₖ, 𝒜ᵢ) alongside original base tools are registered in ℒ, creating an enriched library containing context-adapted tool variants, multi-layer composites with emergent functionalities, and error-corrected versions minimizing prediction inaccuracies. This output ℒ serves as the fundamental building block repository for Stage 2's hierarchical coordination, enabling the construction of task-optimized tool networks.
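The layered encapsulation with a δ stopping rule can be illustrated with a small Python sketch. This is an assumption-laden toy rather than the paper's code: `amplify_atomic`, `encapsulate`, and `score` are hypothetical names, tools are plain numeric functions instead of chemistry models, the base tool is treated as layer 0, and the loop stops at the first iteration whose gain falls below δ (the text above speaks of consecutive iterations).

```python
from typing import Callable, List, Tuple

Tool = Callable[[float], float]  # toy tool: maps an input to a prediction


def amplify_atomic(
    base: Tool,
    encapsulate: Callable[[Tool, Tool], Tool],
    score: Callable[[Tool], float],
    delta: float = 0.05,
    max_layers: int = 10,
) -> List[Tuple[Tool, float]]:
    """Return [(tool, score), ...] for the base tool and each accepted layer."""
    layers = [(base, score(base))]
    for _ in range(max_layers):
        prev_tool, prev_score = layers[-1]
        candidate = encapsulate(base, prev_tool)  # plays the role of A_{i+1}(t, A_i)
        s = score(candidate)
        if s - prev_score < delta:  # gain below the threshold: plateau, stop
            break
        layers.append((candidate, s))
    return layers


# Toy setup (all hypothetical): the target is f(x) = x and the base tool
# recovers only half of it.
XS = [1.0, 2.0, 3.0]


def base_tool(x: float) -> float:
    return 0.5 * x


def encapsulate(base: Tool, prev: Tool) -> Tool:
    # The wrapper re-applies the base tool to the residual left by `prev`.
    return lambda x: prev(x) + base(x - prev(x))


def score(tool: Tool) -> float:
    # Negative mean absolute error on the validation inputs (higher is better).
    return -sum(abs(tool(x) - x) for x in XS) / len(XS)


layers = amplify_atomic(base_tool, encapsulate, score, delta=0.05)
for depth, (_, s) in enumerate(layers):
    print(f"layer {depth}: score = {s}")
```

Every accepted layer is kept in the returned list, which mirrors the registration of all reinforced composites, alongside the base tool, in the library ℒ.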

Algorithm 1 ChemAmp
- Input: Initial Tool Set 𝒯, Parameter k, Tool Library ℒ ← ∅
- Output: Best Agent Composite Tool 𝒜*
- [Stage 1: Atomic-to-Composite Amplification]
- foreach t ∈ 𝒯 do
  - repeat
    - Build and validate agent composite tool 𝒜ₙ(t, 𝒜ₙ₋₁(t))
  - until no score improvement
  - ℒ ← ℒ ∪ {t, 𝒜₁, ..., 𝒜ₙ}
- endfor
- [Stage 2: Cross-Composite Synergy]
- while global performance improvement do
  - Sort ℒ by performance; pick t₁ and top-k
  - Build and validate {𝒜₁(t₁, t₂), ..., 𝒜ₖ(t₁, tₖ)}
  - ℒ ← ℒ ∪ {t₁, ..., tₖ, 𝒜₁, ..., 𝒜ₖ}
- endwhile
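The Stage 2 loop of Algorithm 1 can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the released implementation: tools are dictionary lookups that may abstain, `compose` is a hypothetical fallback rule standing in for agent-level composition, `accuracy` on a four-item validation set stands in for the task metric, and a `max_rounds` cap is added for safety.

```python
from typing import Callable, Dict, Optional, Tuple

Tool = Callable[[str], Optional[str]]  # toy tool: returns None when it abstains

# Hypothetical validation set (query -> gold answer).
VALID = {"q1": "a1", "q2": "a2", "q3": "a3", "q4": "a4"}


def lookup_tool(known: Dict[str, str]) -> Tool:
    """Toy atomic tool that answers only the queries it knows."""
    return known.get


def compose(primary: Tool, fallback: Tool) -> Tool:
    """Toy composite: use the primary answer, fall back when it abstains."""
    def run(query: str) -> Optional[str]:
        answer = primary(query)
        return answer if answer is not None else fallback(query)
    return run


def accuracy(tool: Tool) -> float:
    return sum(tool(q) == gold for q, gold in VALID.items()) / len(VALID)


def cross_composite_synergy(
    library: Dict[str, Tool],
    k: int = 3,
    max_rounds: int = 5,
) -> Tuple[str, float]:
    """Grow `library` in place; return the best tool name and its score."""
    scores = {name: accuracy(tool) for name, tool in library.items()}
    best = max(scores.values())
    for _ in range(max_rounds):
        # Sort the library by performance; t1 is the mandatory base tool.
        ranked = sorted(scores, key=scores.get, reverse=True)
        t1, partners = ranked[0], ranked[1:k]
        for p in partners:
            name = f"A({t1},{p})"
            if name not in library:
                library[name] = compose(library[t1], library[p])
                scores[name] = accuracy(library[name])
        new_best = max(scores.values())
        if new_best <= best:  # no global improvement: stop
            break
        best = new_best
    winner = max(scores, key=scores.get)
    return winner, scores[winner]


library = {
    "a": lookup_tool({"q1": "a1", "q2": "a2"}),
    "b": lookup_tool({"q2": "a2", "q3": "a3"}),
    "c": lookup_tool({"q4": "a4"}),
}
winner, best_score = cross_composite_synergy(library, k=3)
print(winner, best_score)
```

In this toy run no single pairing covers all four queries, but a second round that stacks two first-round composites does, which is the kind of hierarchical gain the stage is meant to find.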

### 3.2 Stage 2: Cross-Composite Synergy

After Stage 1, we obtain a richer tool library ℒ for hierarchical reinforcement through stacking. This stage focuses on combining and stacking tools from ℒ to further enhance their performance. First, ℒ is sorted by performance and the top-performing tool (top-1) is selected as the mandatory base tool t₁. It is then combined with the remaining top-k tools t_{topk} = {t₂, t₃, ..., tₖ} to form
