Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design
Summary
This paper introduces AIRA-Compose and AIRA-Design, dual frameworks using AI agents to autonomously discover neural architectures that outperform standard Transformers and scale efficiently.
Cached at: 05/18/26, 02:23 AM
Source: https://huggingface.co/papers/2605.15871
Abstract
AI agents autonomously design foundation models that exceed standard Transformers. They use two frameworks, one optimizing architecture search and the other mechanistic implementation, and achieve superior performance and efficiency.
Toward recursive self-improvement, we investigate LLM agents autonomously designing foundation models beyond standard Transformers. We introduce a dual-framework approach: AIRA-Compose for high-level architecture search, and AIRA-Design for low-level mechanistic implementation.

AIRA-Compose uses 11 agents to explore fundamental computational primitives under a 24-hour budget. Agents evaluate million-parameter candidates, extrapolating top designs to 350M, 1B, and 3B scales. This yields 14 architectures across two families: AIRAformers (Transformer-based) and AIRAhybrids (Transformer-Mamba). Pre-trained at 1B scale, these consistently outperform Llama 3.2 and Composer-found baselines. On downstream tasks, AIRAformer-D and AIRAhybrid-D improve accuracy over Llama 3.2 by 2.4% and 3.8%, respectively. Furthermore, AIRA-Compose finds models with highly efficient scaling frontiers: AIRAformer-C scales 54% faster than Llama 3.2 and 71% faster than Composer's best Transformer, while AIRAhybrid-C outscales Nemotron-2 by 23% and Composer's best hybrid by 37%.

AIRA-Design tasks 20 agents with writing novel attention mechanisms for long-range dependencies and high-performing training scripts. On the Long Range Arena benchmark, agent-designed architectures come within 2.3% and 2.6% of the human state of the art on document matching and text classification, respectively. On the Autoresearch benchmark, Greedy Opus 4.5 achieves 0.968 validation bits-per-byte under a fixed time budget, surpassing the published minimum. Together, these frameworks show that AI agents can autonomously discover architectures and algorithmic optimizations matching or surpassing hand-designed baselines. This establishes a powerful paradigm for discovering next-generation foundation models and marks a clear step toward recursive self-improvement.
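The abstract says agents evaluate million-parameter candidates and extrapolate the top designs to 350M, 1B, and 3B parameters, but it does not say how the extrapolation is done. One common approach, assumed here rather than taken from the paper, is to fit a power law loss ≈ a·N^(-b) to small-model results by least squares in log-log space, then read off predictions at larger N. The data below are synthetic, generated from a known power law only so the fit can be checked. They are not results from the paper, and `fit_power_law` and `extrapolate` are hypothetical helpers.

```python
import math


def fit_power_law(params, losses):
    """Fit loss ~= a * N**(-b) by least squares on (log N, log loss).

    Returns (a, b). Assumes positive losses and at least two distinct N.
    """
    xs = [math.log(n) for n in params]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # slope of the log-log line, equal to -b
    intercept = mean_y - slope * mean_x  # equal to log(a)
    return math.exp(intercept), -slope


def extrapolate(a, b, n_params):
    """Predicted loss at n_params under the fitted power law (hypothetical helper)."""
    return a * n_params ** (-b)


# Synthetic small-scale results from loss = 20 * N**-0.1 (illustrative, not paper data).
small_sizes = [1e6, 2e6, 4e6, 8e6]
small_losses = [20 * n ** -0.1 for n in small_sizes]

a, b = fit_power_law(small_sizes, small_losses)
for target in (350e6, 1e9, 3e9):
    print(f"N={target:.0e}: predicted loss {extrapolate(a, b, target):.3f}")
```

Practical scaling-law fits often add an irreducible-loss term, L(N) = E + a·N^(-b), which requires a nonlinear fit. The pure power law is used here so that the fit reduces to ordinary linear regression.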
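Validation bits-per-byte (bpb), the Autoresearch metric quoted in the abstract, normalizes a language model's cross-entropy by the number of raw UTF-8 bytes rather than by the number of tokens. This lets models with different tokenizers be compared on the same text. Below is a minimal sketch of the conversion. It assumes the per-token negative log-likelihoods are given in nats, the natural-log convention of most training frameworks. The function name is our own and is not the benchmark's code.

```python
import math


def bits_per_byte(token_nll_nats, text):
    """Convert summed token-level negative log-likelihood (nats) to bits per UTF-8 byte.

    token_nll_nats: per-token cross-entropy values covering every token of `text`.
    """
    n_bytes = len(text.encode("utf-8"))
    if n_bytes == 0:
        raise ValueError("text must contain at least one byte")
    total_bits = sum(token_nll_nats) / math.log(2)  # nats -> bits
    return total_bits / n_bytes


# Hypothetical per-token losses for a 3-token encoding of an 11-byte string.
print(bits_per_byte([2.0, 1.5, 3.0], "hello world"))
```

A tokenizer with a larger vocabulary produces fewer, harder-to-predict tokens, so per-token loss alone is not comparable across models. Dividing the total information content by a fixed byte count removes that dependence.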
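The abstract does not spell out what "Greedy" in Greedy Opus 4.5 refers to. A plausible reading, which is our assumption, is a keep-the-best search. Under that reading, the agent repeatedly proposes a modification of its current best candidate and keeps the change only if the validation score improves, stopping when the budget runs out. The sketch below runs that loop on a toy objective. `propose`, `evaluate`, and the (layers, width) configuration are hypothetical stand-ins for an agent's code edits and a real training run, and the budget counts evaluations rather than wall-clock time.

```python
import random


def greedy_search(initial, propose, evaluate, budget):
    """Keep-the-best hill climbing; lower score is better (e.g. validation bpb).

    `budget` counts calls to `evaluate`, standing in for a wall-clock limit.
    """
    best, best_score = initial, evaluate(initial)
    used = 1
    while used < budget:
        candidate = propose(best)
        score = evaluate(candidate)
        used += 1
        if score < best_score:  # accept only strict improvements
            best, best_score = candidate, score
    return best, best_score


# Hypothetical toy search space: (num_layers, hidden_width).
rng = random.Random(0)


def propose(cfg):
    """Perturb both fields by one step (stand-in for an agent's code edit)."""
    layers, width = cfg
    return (max(1, layers + rng.choice([-1, 1])),
            max(64, width + rng.choice([-64, 64])))


def evaluate(cfg):
    """Toy objective with its minimum at 12 layers, width 768 (not a real training run)."""
    layers, width = cfg
    return (layers - 12) ** 2 + ((width - 768) / 64) ** 2


best, score = greedy_search((4, 256), propose, evaluate, budget=500)
print(best, score)
```

Greedy acceptance is cheap but can stall in local optima on real, noisy objectives. Population-based or tree-search variants spend extra evaluations to explore more broadly.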