Tag
An open, in-progress handbook explaining LLM inference internals, including the GPU memory hierarchy, KV cache, batching, and popular inference engines such as vLLM and TensorRT-LLM.
A tweet recommends the Language AI Handbook, a free online resource that covers LLM components from classical NLP to modern transformers, quantization, RL, and safety.