Tag
Based on the SGLang Omni team's internal decision-making article, this post introduces the operating principles of LLM inference systems in an accessible way, starting from basic concepts such as autoregressive decoding, KV cache, and continuous batching.
A glossary from TechCrunch that defines common AI terms such as AGI, AI agents, API endpoints, and chain of thought, updated regularly as the field evolves.
The KV cache stores previously computed key and value vectors during autoregressive generation, so the model does not have to recompute them for the entire sequence at each step. This significantly speeds up inference at the cost of increased memory usage.
Google DeepMind chip engineer Reiner Pope delivers a comprehensive whiteboard explanation of how chips work, covering topics from logic gates to systolic arrays and the human brain, in a free YouTube video.
A free interactive tool called Transformer Explainer runs a live GPT-2 model in the browser, visualizing the internal workings of Transformers with a Sankey diagram and live inference.
A plain-language guide explaining what AI agents are, how they differ from chatbots, their autonomous decision-making loop, and why they are reshaping industries in 2026.
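The KV cache entry above describes a trade of memory for compute. A minimal sketch of that idea, using a toy single-head attention in plain Python, may make it concrete. Everything here is an illustrative assumption rather than code from any of the listed articles: `project` is a hypothetical stand-in for learned key/value projections, and `KVCache`, `decode_with_cache`, and `decode_without_cache` are names made up for this example.

```python
import math


def attend(q, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


def project(x, scale):
    # Hypothetical stand-in for a learned projection matrix: elementwise scaling.
    return [scale * xi for xi in x]


class KVCache:
    """Holds the key and value vectors of every token processed so far."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def decode_with_cache(tokens):
    # Each step projects only the newest token and reuses the cached vectors.
    cache = KVCache()
    outputs = []
    for x in tokens:
        cache.append(project(x, 0.5), project(x, 2.0))
        outputs.append(attend(x, cache.keys, cache.values))
    return outputs, cache


def decode_without_cache(tokens):
    # Each step re-projects the whole prefix: quadratic projection work overall.
    outputs = []
    for t in range(len(tokens)):
        prefix = tokens[: t + 1]
        keys = [project(x, 0.5) for x in prefix]
        values = [project(x, 2.0) for x in prefix]
        outputs.append(attend(tokens[t], keys, values))
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cached_outputs, cache = decode_with_cache(tokens)
uncached_outputs = decode_without_cache(tokens)
```

Both functions produce the same attention outputs. The cached version performs one key and one value projection per token, while the uncached version repeats them for the whole prefix at every step; the price is that `cache.keys` and `cache.values` grow by one entry per generated token.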