Tag
Researchers propose Prefill-as-a-Service (PrfaaS), a system that offloads long-context prefill to remote compute-dense clusters and streams KVCache over commodity Ethernet, enabling independent scaling and 32-54% higher throughput for a 1T-parameter hybrid model.