The article argues that most 'agentic' systems are actually single agents with tools, highlighting the high costs and complexity of multi-agent setups. It outlines three valid multi-agent patterns—orchestrator-worker, pipeline, and peer-to-peer—and provides criteria for deciding when to use them versus a single agent.
The article critiques the overuse of the term 'multi-agent orchestration,' arguing that many implementations are simply single agents using function calls rather than true distributed systems. It highlights practical, production-tested patterns like sequential pipelines and human-in-the-loop workflows as alternatives to complex but ineffective architectures.
The article argues that the next major AI debate should focus on representation and institutional architecture, proposing three layers (Sense, Core, Driver) to address how AI systems capture reality, reason, and act legitimately, rather than just model intelligence.
The article argues that AI subagents should not automatically inherit their parent agent's full permissions, advocating instead for attenuated delegation with explicit scope, tool limits, and audit trails to improve security in multi-agent systems.
The author details their decision to exclude LLMs from generating final fact-check verdicts in favor of a hybrid architecture that uses LLMs for data extraction and a deterministic Python layer for scoring, citing issues with stochastic instability and auditability.
This post outlines a comprehensive 9-layer AI production architecture, emphasizing components like RAG pipelines, security guards, observability, and evaluation to distinguish robust production systems from simple demos.
The article relays Addy Osmani's argument that the performance difference between AI coding agents like Claude Code, Cursor, and Cline stems from their 'harness' (the layer of prompts, tools, and constraints around the model) rather than from the underlying model itself. It details best practices for harness engineering, including hooks, sandboxing, and context management, to bridge the gap between model capability and actual agent performance.
The article discusses Andrej Karpathy's 'LLM Wiki' concept as a paradigm shift from traditional RAG, arguing that maintaining a persistent, evolving knowledge substrate allows for compounding understanding rather than stateless retrieval.
The author expresses frustration with the industry's reliance on prompt engineering and scaling to fix logical reasoning deficits in transformer-based LLMs, arguing that these probabilistic models fundamentally lack the architecture for deterministic logic.
A tutorial blog post explaining LLM Routing — the practice of directing user queries to the most appropriate LLM based on cost, latency, and quality. Covers routing strategies, anatomy of an LLM router, and comparisons with Mixture of Experts.
A detailed breakdown of a 9-layer production AI architecture covering RAG pipeline, agents, prompts, security, evaluation, and observability layers.
Sanja Fidler, VP of AI Research at NVIDIA and head of the company’s spatial-intelligence lab, says the Transformer’s Achilles heel is clear: training costs are sky-high and the hunger for data is bottomless. A new architectural breakthrough is overdue, and next-gen variants are already emerging.
A position paper proposes a “continuity layer” that preserves what models learn over time. It introduces Decomposed Trace Convergence Memory and the ATANT benchmark, reporting 100% isolated and 96% cumulative recall on a 250-story corpus without language models in the loop.
This paper analyzes 935 ablation experiments from 161 publications to show that AI architectural evolution follows the same statistical laws as biological evolution, including heavy-tailed fitness effect distributions and punctuated equilibria dynamics. The findings suggest that evolutionary statistical structure is substrate-independent, determined by fitness landscape topology rather than the mechanism of selection.