continuous-batching

#continuous-batching

@pallavishekhar_: Continuous Batching in LLMs Read here: https://outcomeschool.com/blog/continuous-batching-in-llms…

X AI KOLs Timeline · 6h ago

A blog post explaining continuous batching, a technique for improving LLM serving throughput by dynamically adding new requests to a batch as old ones finish, keeping the GPU busy and reducing idle time.
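The scheduling idea in this summary can be sketched as a toy simulation. This is not taken from the linked post: the function names, the slot count, and the request lengths are hypothetical, and each "step" stands in for one decode iteration that produces one token per active request.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Static batching: each batch runs until its longest request finishes."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps


def continuous_batching_steps(lengths, batch_size):
    """Continuous batching: refill free slots from the queue before every step."""
    waiting = deque(lengths)  # tokens still to generate for each queued request
    running = []              # tokens still to generate for each active request
    steps = 0
    while waiting or running:
        # Admit queued requests into any free slots.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        # One decode step produces one token for every active request.
        running = [remaining - 1 for remaining in running]
        steps += 1
        # Finished requests leave the batch, freeing their slots.
        running = [remaining for remaining in running if remaining > 0]
    return steps


# Hypothetical output lengths (in tokens) for six requests, two slots.
lengths = [2, 8, 3, 7, 1, 4]
static_steps = static_batching_steps(lengths, batch_size=2)
continuous_steps = continuous_batching_steps(lengths, batch_size=2)
```

In the static version a short request holds its slot idle until the longest request in its batch finishes; in the continuous version the slot is reused on the next step, so the same workload needs fewer decode steps.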

@neural_avb: Very cool intro to LLM serving, basics of inference, and VLLM (paged attention, continuous batching etc) Highly recomme…

X AI KOLs Timeline · 5d ago

Recommends an introduction to LLM serving, inference basics, and vLLM, covering paged attention and continuous batching.

@SergioPaniego: continuous batching just landed in TRL for GRPO at 64 generations it runs faster and uses less VRAM than plain generate…

X AI KOLs Following · 2026-06-19

Continuous batching has been added to TRL for GRPO, increasing generation speed and reducing VRAM usage without needing vLLM. The tweet explains how it works and when to use it.

@steeve: another 5 days later, zml/llmd runs fully on Metal, serving 8 simultaneous requests at full bf16 zml/llmd is our LLM se…

X AI KOLs Following · 2026-06-13

zml/llmd now runs fully on Apple's Metal API, serving 8 simultaneous requests at full bf16 precision, with continuous batching and other modern features.

@grapeot: How does the LLM inference system actually work? The SGLang Omni team recently published a rare article that lays out the complete decision-making chain of a top inference system team. I followed the original text and organized a popular science post, starting from autoregressive decoding, KV cache, continuous batching...

X AI KOLs Timeline · 2026-05-30

Based on the SGLang Omni team's internal decision-making article, this post introduces the operating principles of LLM inference systems in an accessible way, starting from basic concepts such as autoregressive decoding, KV cache, and continuous batching.

[OSS] dlmserve - first serving engine for diffusion language models

Reddit r/LocalLLaMA · 2026-05-26

dlmserve is the first open-source serving engine for diffusion language models, providing an OpenAI-compatible API, continuous batching, and 2.5x throughput over Hugging Face, all within 12GB VRAM.

@jundotkim: oMLX 0.3.9rc1 released. Highlights: - Low-memory Macs stay stable instead of getting killed by the OS - DFlash bumped t…

X AI KOLs Timeline · 2026-05-19

oMLX 0.3.9rc1, an LLM inference server optimized for Apple Silicon Macs, adds low-memory stability, chunked prefill, multi-tasking admin chat, and more.

Unlocking asynchronicity in continuous batching

Hugging Face Blog · 2026-05-14

This article explains how to implement asynchronous continuous batching for LLM inference, overlapping CPU batch preparation with GPU computation to maximize utilization and reduce idle time.
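The overlap described here can be illustrated with a minimal sketch, which is not the article's implementation. `prepare_batch` and `run_model` are hypothetical stand-ins for the CPU-side batch preparation and the GPU forward pass, and the sketch assumes the next batch can be prepared without waiting for the current step's outputs, a dependency that a real serving engine has to handle.

```python
from concurrent.futures import ThreadPoolExecutor


def prepare_batch(step):
    """Hypothetical CPU-side work: scheduling and building the step's inputs."""
    return [step * 10 + i for i in range(3)]


def run_model(batch):
    """Hypothetical stand-in for the GPU forward pass."""
    return sum(batch)


def run_overlapped(num_steps):
    """Prepare batch N+1 on a worker thread while batch N is being computed."""
    outputs = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(prepare_batch, 0)
        for step in range(num_steps):
            batch = pending.result()  # wait for this step's inputs
            if step + 1 < num_steps:
                # Start preparing the next batch before computing this one.
                pending = pool.submit(prepare_batch, step + 1)
            outputs.append(run_model(batch))
    return outputs


results = run_overlapped(3)
```

The sketch shows only the scheduling pattern: the results match a sequential loop, and any speedup depends on the preparation and compute stages actually running concurrently, for example when the compute call releases the interpreter while the accelerator works.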
