prefix-caching

Tag

Cards List
#prefix-caching

@QingQ77: A terminal AI coding agent designed specifically for DeepSeek API prefix caching mechanism, maintaining ultra-low token costs in long sessions through a cache-first architecture. https://github.com/esengine/DeepSeek-Reasonix… Reaso…

X AI KOLs Timeline · 14h ago Cached

Reasonix is a terminal AI coding agent designed specifically for DeepSeek API prefix caching mechanism, achieving ultra-low token costs in long sessions through a cache-first architecture. In testing, 435 million input tokens cost only about $12, with a cache hit rate of 99.82%.

0 favorites 0 likes
#prefix-caching

Sparse Prefix Caching for Hybrid and Recurrent LLM Serving

arXiv cs.LG · yesterday Cached

This paper introduces sparse prefix caching for hybrid and recurrent LLMs, which stores recurrent states at a limited set of checkpoint positions to avoid dense caching while minimizing recomputation. The method outperforms standard heuristics on real-world data, especially when requests share substantial but non-identical prefixes.

0 favorites 1 likes
← Back to home

Submit Feedback