Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design

Hugging Face Daily Papers Papers

Summary

This paper introduces AIRA-Compose and AIRA-Design, dual frameworks using AI agents to autonomously discover neural architectures that outperform standard Transformers and scale efficiently.

Toward recursive self-improvement, we investigate LLM agents autonomously designing foundation models beyond standard Transformers. We introduce a dual-framework approach: AIRA-Compose for high-level architecture search, and AIRA-Design for low-level mechanistic implementation. AIRA-Compose uses 11 agents to explore fundamental computational primitives under a 24-hour budget. Agents evaluate million-parameter candidates, extrapolating top designs to 350M, 1B, and 3B scales. This yields 14 architectures across two families: AIRAformers (Transformer-based) and AIRAhybrids (Transformer-Mamba). Pre-trained at 1B scale, these consistently outperform Llama 3.2 and Composer-found baselines. On downstream tasks, AIRAformer-D and AIRAhybrid-D improve accuracy by 2.4% and 3.8% over Llama 3.2. Furthermore, AIRA-Compose finds models with highly efficient scaling frontiers: AIRAformer-C scales 54% and 71% faster than Llama 3.2 and Composer's best Transformer, while AIRAhybrid-C outscales Nemotron-2 by 23% and Composer's best hybrid by 37%. AIRA-Design tasks 20 agents with writing novel attention mechanisms for long-range dependencies and high-performing training scripts. On the Long Range Arena benchmark, agent-designed architectures reach within 2.3% and 2.6% of human state-of-the-art on document matching and text classification. On the Autoresearch benchmark, Greedy Opus 4.5 achieves 0.968 validation bits-per-byte under a fixed time budget, surpassing the published minimum. Together, these frameworks show AI agents can autonomously discover architectures and algorithmic optimizations matching or surpassing hand-designed baselines. This establishes a powerful paradigm for discovering next-generation foundation models, marking a clear step toward recursive self-improvement.
Original Article
View Cached Full Text

Cached at: 05/18/26, 02:23 AM

Paper page - Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design

Source: https://huggingface.co/papers/2605.15871

Abstract

AI agents autonomously design foundation models exceeding standard Transformers through dual frameworks that optimize both architectural search and mechanistic implementation, achieving superior performance and efficiency.

Toward recursive self-improvement, we investigateLLM agentsautonomously designingfoundation modelsbeyond standard Transformers. We introduce a dual-framework approach:AIRA-Composefor high-levelarchitecture search, andAIRA-Designfor low-levelmechanistic implementation.AIRA-Composeuses 11 agents to explore fundamental computational primitives under a 24-hour budget. Agents evaluate million-parameter candidates, extrapolating top designs to 350M, 1B, and 3B scales. This yields 14 architectures across two families: AIRAformers (Transformer-based) and AIRAhybrids (Transformer-Mamba). Pre-trained at 1B scale, these consistently outperform Llama 3.2 and Composer-found baselines. Ondownstream tasks, AIRAformer-D and AIRAhybrid-D improve accuracy by 2.4% and 3.8% over Llama 3.2. Furthermore,AIRA-Composefinds models with highly efficientscaling frontiers: AIRAformer-C scales 54% and 71% faster than Llama 3.2 and Composer’s best Transformer, while AIRAhybrid-C outscales Nemotron-2 by 23% and Composer’s best hybrid by 37%.AIRA-Designtasks 20 agents with writing novelattention mechanismsfor long-range dependencies and high-performing training scripts. On theLong Range Arenabenchmark, agent-designed architectures reach within 2.3% and 2.6% of human state-of-the-art on document matching and text classification. On theAutoresearchbenchmark, Greedy Opus 4.5 achieves 0.968 validationbits-per-byteunder a fixed time budget, surpassing the published minimum. Together, these frameworks show AI agents can autonomously discover architectures and algorithmic optimizations matching or surpassing hand-designed baselines. This establishes a powerful paradigm for discovering next-generationfoundation models, marking a clear step toward recursive self-improvement.

View arXiv pageView PDFAdd to collection

Get this paper in your agent:

hf papers read 2605\.15871

Don’t have the latest CLI?curl \-LsSf https://hf\.co/cli/install\.sh \| bash

Models citing this paper0

No model linking this paper

Cite arxiv.org/abs/2605.15871 in a model README.md to link it from this page.

Datasets citing this paper0

No dataset linking this paper

Cite arxiv.org/abs/2605.15871 in a dataset README.md to link it from this page.

Spaces citing this paper0

No Space linking this paper

Cite arxiv.org/abs/2605.15871 in a Space README.md to link it from this page.

Collections including this paper0

No Collection including this paper

Add this paper to acollectionto link it from this page.

Similar Articles

@Kangwook_Lee: https://x.com/Kangwook_Lee/status/2052925157606568217

X AI KOLs Timeline

The author argues that human-designed structural frameworks for AI agents should be replaced by AI-engineered ones, introducing a Three Regimes Framework to show how this shift unlocks mid-sized model capabilities. Citing projects like Meta Harness, they predict an imminent transition where AI will autonomously optimize its own system architecture.

Neurodata Without Boredom: Benchmarking Agentic AI for Data Reuse

arXiv cs.LG

This paper benchmarks agentic AI systems on the task of loading, understanding, and reformatting fragmented neuroscience data, finding that while agents perform well on subtasks, they rarely achieve fully error-free end-to-end solutions and human oversight remains necessary.

@dair_ai: https://x.com/dair_ai/status/2053495521243799717

X AI KOLs Following

DAIR AI's weekly roundup highlights top research papers including HeavySkill, which improves model performance via internalized parallel reasoning, and Sakana AI's Conductor, which uses RL to optimize agent orchestration. It also covers Meta FAIR's work on self-improving pretraining.