explainer

#explainer

@grapeot: How do LLM inference systems actually work? The SGLang Omni team recently published a rare article that lays out the complete decision-making chain of a top inference-systems team. Following the original text, I put together an accessible explainer, starting from autoregressive decoding, KV cache, continuous batching...

X AI KOLs Timeline · 4d ago

Drawing on an article in which the SGLang Omni team lays out its decision-making, this post explains how LLM inference systems work in accessible terms, starting from basic concepts such as autoregressive decoding, KV cache, and continuous batching.

#explainer

So you’ve heard these AI terms and nodded along; let’s fix that

TechCrunch AI · 5d ago

A glossary from TechCrunch that defines common AI terms such as AGI, AI agents, API endpoints, and chain of thought, updated regularly as the field evolves.

#explainer

@TheTuringPost: Why KV cache is one of the main reasons LLMs are fast? KV cache is what connects attention mechanism with generation st…

X AI KOLs Timeline · 2026-05-25

KV cache stores previously computed key and value vectors during autoregressive generation, allowing models to avoid recomputing the entire sequence at each step, significantly speeding up inference at the cost of increased memory usage.

#explainer

@Nona_xai: Google DeepMind chip engineer Reiner Pope just explained on a whiteboard what no one had ever explained to you before: …

X AI KOLs Timeline · 2026-05-23

In a free YouTube video, Google DeepMind chip engineer Reiner Pope gives a whiteboard explanation of how chips work, ranging from logic gates to systolic arrays and the human brain.

#explainer

@AlphaSignalAI: This free interactive explainer just exposed how GPT actually works. Most people treat Transformers like magic. You typ…

X AI KOLs Timeline · 2026-05-17

A free interactive tool called Transformer Explainer runs a live GPT-2 model in the browser, visualizing the internal workings of Transformers with a Sankey diagram and live inference.

#explainer

What Is an AI Agent? The Plain-Language Guide to the Technology Reshaping Every Industry in 2026

Reddit r/ArtificialInteligence · 2026-05-08

A plain-language guide explaining what AI agents are, how they differ from chatbots, their autonomous decision-making loop, and why they are reshaping industries in 2026.

