Tag
A blog post explaining continuous batching, a technique for improving LLM serving throughput by adding new requests to a running batch as soon as earlier ones finish, so the GPU stays busy instead of idling until the slowest request in the batch completes.
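
To make the scheduling idea concrete, here is a minimal toy simulation that counts decode steps under two policies. It is a sketch under simplifying assumptions, not how any particular serving system is implemented: every decode step is assumed to cost the same, prefill and KV-cache memory limits are ignored, and the function names `static_batching_steps` and `continuous_batching_steps` are hypothetical.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Decode steps when each batch runs until its longest request finishes."""
    total = 0
    for i in range(0, len(lengths), batch_size):
        # The whole batch occupies the GPU until its slowest member is done.
        total += max(lengths[i:i + batch_size])
    return total


def continuous_batching_steps(lengths, batch_size):
    """Decode steps when freed slots are refilled before the next step."""
    waiting = deque(lengths)   # tokens still to generate, per queued request
    running = []               # tokens still to generate, per in-flight request
    steps = 0
    while waiting or running:
        # Admit queued requests into any free slots.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        # One decode step: every in-flight request emits one token.
        running = [remaining - 1 for remaining in running]
        steps += 1
        # Finished requests leave the batch, freeing their slots.
        running = [remaining for remaining in running if remaining > 0]
    return steps


# Assumed workload: output lengths (in tokens) for eight requests.
lengths = [2, 8, 3, 8, 2, 2, 3, 3]
print(static_batching_steps(lengths, batch_size=4))      # 11
print(continuous_batching_steps(lengths, batch_size=4))  # 8
```

In this toy workload the static policy spends 11 steps because short requests wait on the two 8-token requests in their batch, while the continuous policy finishes in 8 steps by refilling slots as they free up. Real gains depend on the length distribution and on costs this sketch leaves out.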