Tag
Proposes building an open-source, lightweight semantic cache for LLMs that runs at the CDN edge using Rust/WASM to reduce latency and API costs, and seeks community feedback on the architecture and the validity of the use case.
This paper proposes Semantic Cache Distillation (SCD), a loss-constrained framework that replaces raw KV cache transmission with compact semantic codes, achieving up to a 2.65x speedup in time to first token (TTFT) while keeping generation quality within 5% F1 of the oracle.