MegaTrain enables full-precision training of 100B+ LLMs on a single GPU by treating VRAM as a transient stateless cache, inverting the memory hierarchy.
OPD-Evolver proposes a self-evolving agent framework using slow-fast co-evolution and on-policy self-distillation to enhance memory management and policy learning, outperforming existing methods like ReasoningBank and Skill0 across multi-domain benchmarks.
The author describes a pattern in which worker agents emit structured memory events instead of writing directly to shared memory. A Memory Curator validates, deduplicates, and routes these events to the appropriate scopes, with the aim of preventing memory pollution in multi-agent systems. The author compares this approach with existing frameworks and solicits community feedback.
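The curator pattern summarized above might look roughly like the following minimal sketch. It is not the author's implementation: the names `MemoryEvent`, `MemoryCurator`, `submit`, and the scope names in `VALID_SCOPES` are all hypothetical, and deduplication here is a simple normalized-text match chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical scope names; the summary does not list the actual scopes.
VALID_SCOPES = {"task", "agent", "shared"}


@dataclass(frozen=True)
class MemoryEvent:
    """Structured event a worker emits instead of writing to memory directly."""
    agent_id: str
    scope: str
    content: str


@dataclass
class MemoryCurator:
    """Validates, deduplicates, and routes events to per-scope stores."""
    stores: Dict[str, List[MemoryEvent]] = field(
        default_factory=lambda: {s: [] for s in sorted(VALID_SCOPES)}
    )
    _seen: Set[Tuple[str, str]] = field(default_factory=set)

    def submit(self, event: MemoryEvent) -> str:
        # Validation: reject unknown scopes and empty content.
        normalized = " ".join(event.content.lower().split())
        if event.scope not in VALID_SCOPES or not normalized:
            return "rejected"
        # Deduplication (assumed rule): identical normalized text
        # within the same scope is dropped.
        key = (event.scope, normalized)
        if key in self._seen:
            return "duplicate"
        self._seen.add(key)
        # Routing: append to the store for the requested scope.
        self.stores[event.scope].append(event)
        return "accepted"


curator = MemoryCurator()
print(curator.submit(MemoryEvent("worker-1", "shared", "API limit is 100/min")))
print(curator.submit(MemoryEvent("worker-2", "shared", "api limit is 100/min")))
```

Because workers only ever call `submit`, every write passes through a single checkpoint, which is where the pattern's protection against memory pollution would come from.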
The article discusses how the KV cache is evolving into a memory hierarchy for LLM inference, a shift aimed at optimizing memory management during decoding.
TTKV introduces a temporal-tiered KV cache that mimics human memory to cut 128K-context LLM inference latency by 76% and double throughput while reducing cross-tier traffic by 5.94×.
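As a rough illustration of what a tiered KV cache involves, the toy sketch below keeps recently used token positions in a small fast tier, demotes older ones to a slow tier, and counts cross-tier moves. It is not TTKV's design, which the summary does not detail: the class `TwoTierKVCache`, the recency-based demotion rule, and the `(float, float)` stand-in for key/value tensors are all assumptions made for the example.

```python
from collections import OrderedDict
from typing import Dict, Optional, Tuple

KV = Tuple[float, float]  # stand-in for a (key, value) tensor pair


class TwoTierKVCache:
    """Toy recency-tiered cache: a small fast tier backed by a slow tier."""

    def __init__(self, fast_capacity: int) -> None:
        self.fast: "OrderedDict[int, KV]" = OrderedDict()  # recent positions
        self.slow: Dict[int, KV] = {}                       # older positions
        self.fast_capacity = fast_capacity
        self.cross_tier_moves = 0  # demotions plus promotions

    def put(self, pos: int, kv: KV) -> None:
        # New entries always land in the fast tier.
        self.slow.pop(pos, None)
        self.fast[pos] = kv
        self.fast.move_to_end(pos)
        self._demote_overflow()

    def get(self, pos: int) -> Optional[KV]:
        if pos in self.fast:
            self.fast.move_to_end(pos)  # refresh recency
            return self.fast[pos]
        if pos in self.slow:
            kv = self.slow.pop(pos)
            self.cross_tier_moves += 1  # promotion: slow -> fast
            self.fast[pos] = kv
            self._demote_overflow()
            return kv
        return None

    def _demote_overflow(self) -> None:
        # Move the least recently used entries to the slow tier.
        while len(self.fast) > self.fast_capacity:
            old_pos, old_kv = self.fast.popitem(last=False)
            self.slow[old_pos] = old_kv
            self.cross_tier_moves += 1  # demotion: fast -> slow


cache = TwoTierKVCache(fast_capacity=2)
for position in range(3):
    cache.put(position, (float(position), float(position)))
cache.get(0)  # promotes position 0 and demotes position 1
print(cache.cross_tier_moves)
```

The `cross_tier_moves` counter is the kind of quantity a tiering policy would try to minimize, since each move stands for a transfer between memory tiers.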