gpu-internals


An open handbook on LLM inference at scale (GPU internals, KV cache, batching, vLLM/SGLang/TensorRT-LLM) [P]

Reddit r/MachineLearning · 3d ago

An open, in-progress handbook explaining LLM inference internals, including the GPU memory hierarchy, KV cache, batching, and popular inference engines such as vLLM, SGLang, and TensorRT-LLM.
