The article shares a personal experience using Cursor and Opus 4.7 to generate videos, highlighting the agent's rigorous self-correction process. It then transitions into a technical discussion on 'Agent = Model + Harness,' arguing that engineering systems like ratchets and context management are more critical to AI agent performance than the underlying model alone.
The author describes improving AI agent reliability by replacing a single general-purpose agent with a workflow of four agents specializing in intake, research, action, and review. This shift prioritized system predictability and easier debugging over raw autonomy.
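As a toy illustration of the staged hand-off described above, the sketch below wires four narrow stages into a fixed pipeline. The stage names come from the summary; the `Ticket` fields and stage bodies are hypothetical placeholders, not the author's code.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical four-stage pipeline: each stage is a narrow agent with one job,
# so failures are localized and the hand-offs are easy to inspect.
@dataclass
class Ticket:
    request: str
    findings: str = ""
    action_log: str = ""
    approved: bool = False

def intake(t: Ticket) -> Ticket:
    t.request = t.request.strip()          # normalize/triage the incoming request
    return t

def research(t: Ticket) -> Ticket:
    t.findings = f"context gathered for: {t.request}"   # placeholder for retrieval/LLM call
    return t

def act(t: Ticket) -> Ticket:
    t.action_log = f"executed plan based on: {t.findings}"
    return t

def review(t: Ticket) -> Ticket:
    t.approved = "executed" in t.action_log             # cheap, checkable gate
    return t

PIPELINE: list[Callable[[Ticket], Ticket]] = [intake, research, act, review]

ticket = Ticket(request="  rotate the API keys  ")
for stage in PIPELINE:
    ticket = stage(ticket)                 # predictable, debuggable hand-offs between agents
print(ticket.approved)
```

The point of the fixed pipeline over a single autonomous loop is that each hand-off produces an inspectable intermediate state, which is where the debugging win comes from.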
This paper introduces TIDE, a method that addresses the Rare Token and Contextual Collapse problems in LLMs by injecting token identity into every layer via Embedding Memory. The authors demonstrate theoretical and empirical improvements across language modeling and downstream tasks.
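One possible reading of "injecting token identity into every layer via Embedding Memory" is sketched below: a projection of the original token embedding is re-added to the hidden state at each layer. This interpretation is an assumption on my part; the paper's actual mechanism may differ.

```python
import torch
import torch.nn as nn

class ToyBlockWithEmbeddingMemory(nn.Module):
    """One transformer block plus a hypothetical embedding-memory re-injection.

    Assumption: 'injecting token identity' is read here as adding a learned
    projection of the original token embedding back into each layer's hidden
    state; the real TIDE mechanism may work differently.
    """
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
        self.inject = nn.Linear(d_model, d_model)  # hypothetical embedding-memory projection
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, h, tok_emb):
        # Re-inject the frozen token embedding so rare tokens keep their identity
        # even after many layers of contextual mixing.
        h = h + self.inject(tok_emb)
        a, _ = self.attn(self.norm1(h), self.norm1(h), self.norm1(h))
        h = h + a
        return h + self.ff(self.norm2(h)), tok_emb

emb = nn.Embedding(1000, 64)
ids = torch.randint(0, 1000, (2, 16))
tok = emb(ids)                      # token-identity embeddings, kept around for every layer
h = tok
for blk in [ToyBlockWithEmbeddingMemory() for _ in range(4)]:
    h, tok = blk(h, tok)
```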
A social media post discusses the technical implication of applying RoPE rotation directly to KV caches, noting that it leaks positional information into the value matrix V.
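To make the implication concrete, here is a minimal PyTorch sketch of standard RoPE applied to queries and keys only; the comment marks where rotating V as well would push position-dependent signal into the value pathway. The rotate-half layout and the 10000 base are the common convention, not details taken from the post.

```python
import torch

def rotate_half(x):
    # Split the last dim in half and rotate: (x1, x2) -> (-x2, x1)
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(x, cos, sin):
    # Standard RoPE: element-wise rotation by position-dependent angles.
    return x * cos + rotate_half(x) * sin

# toy shapes: (batch, heads, seq, head_dim)
b, h, t, d = 1, 2, 8, 16
q = torch.randn(b, h, t, d)
k = torch.randn(b, h, t, d)
v = torch.randn(b, h, t, d)

# position-dependent cos/sin tables (conventional 10000 base)
pos = torch.arange(t).float()
inv_freq = 1.0 / (10000 ** (torch.arange(0, d, 2).float() / d))
angles = torch.einsum("t,f->tf", pos, inv_freq)
cos = torch.cat((angles.cos(), angles.cos()), dim=-1)
sin = torch.cat((angles.sin(), angles.sin()), dim=-1)

q_rot = apply_rope(q, cos, sin)   # rotate queries: positions enter the attention scores
k_rot = apply_rope(k, cos, sin)   # rotate keys (this is what typically sits in the cache)
# v stays unrotated: if apply_rope(v, cos, sin) were applied as well, the attention
# output would mix position-dependent rotations into the value pathway itself,
# i.e. positional information would leak into V.
out = torch.softmax(q_rot @ k_rot.transpose(-1, -2) / d**0.5, dim=-1) @ v
```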
This interactive tool visualizes the mathematical underpinnings of transformer models through dataflow graphs, covering architectures from GPT-2 to Qwen 3.6 and various attention mechanisms.
StageMem proposes a lifecycle-managed memory framework for language models that organizes memory into transient, working, and durable stages with explicit confidence and strength metrics. It treats memory as a stateful process rather than a static store, which lets it manage retention and forgetting under bounded capacity.
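To make the lifecycle idea concrete, here is a small illustrative sketch of what staged memory items with confidence and strength fields might look like. All class names, field names, thresholds, and decay rates are hypothetical, not taken from the StageMem paper.

```python
from dataclasses import dataclass, field
from enum import Enum
import time

class Stage(Enum):
    TRANSIENT = "transient"
    WORKING = "working"
    DURABLE = "durable"

@dataclass
class MemoryItem:
    content: str
    stage: Stage = Stage.TRANSIENT
    confidence: float = 0.5   # hypothetical: how much the system trusts the item
    strength: float = 0.5     # hypothetical: retention strength, decays if not reinforced
    last_access: float = field(default_factory=time.time)

class StagedMemory:
    def __init__(self, capacity=100):
        self.capacity = capacity
        self.items: list[MemoryItem] = []

    def reinforce(self, item: MemoryItem):
        # Repeated use raises strength and can promote the item to a more durable stage.
        item.strength = min(1.0, item.strength + 0.2)
        item.last_access = time.time()
        if item.stage is Stage.TRANSIENT and item.strength > 0.6:
            item.stage = Stage.WORKING
        elif item.stage is Stage.WORKING and item.strength > 0.9:
            item.stage = Stage.DURABLE

    def decay_and_evict(self):
        # Unreinforced items lose strength; weak transient items are forgotten first,
        # keeping the store within bounded capacity.
        for it in self.items:
            it.strength = max(0.0, it.strength - 0.05)
        self.items = [it for it in self.items
                      if not (it.stage is Stage.TRANSIENT and it.strength < 0.1)]
        self.items = sorted(self.items, key=lambda it: it.strength,
                            reverse=True)[: self.capacity]
```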
Stanford University offers a 1.5-hour lecture covering the fundamental concepts and design principles of large language model architecture.
A Hugging Face blog post explains the Mixture of Experts (MoE) architecture in Transformers, covering the shift from dense to sparse models, weight-loading optimizations, expert parallelism, and training techniques for MoE-based language models.
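For readers new to the topic, the core routing idea the post builds on can be sketched in a few lines of PyTorch. This toy top-k router is illustrative only and is not the Transformers-library implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Minimal top-k Mixture-of-Experts layer (illustrative sketch)."""
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)        # routing logits per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                                   # x: (tokens, d_model)
        logits = self.router(x)
        weights, idx = torch.topk(F.softmax(logits, dim=-1), self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                       # tokens routed to expert e at slot k
                if mask.any():
                    out[mask] += weights[mask, k : k + 1] * expert(x[mask])
        return out

x = torch.randn(10, 64)
y = TinyMoE()(x)        # each token only activates 2 of the 8 experts
```

The sparsity is the whole point: the parameter count grows with the number of experts, but each token only pays the compute cost of its top-k experts.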