expert-caching

Multi-Tier MoE Caching

Reddit r/LocalLLaMA · yesterday

Discusses multi-tier caching strategies for Mixture-of-Experts (MoE) models that aim to speed up inference by keeping the most frequently activated experts on the GPU. References existing implementations such as PowerInfer and llama.cpp branches.
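
The idea of keeping frequently activated experts on the GPU can be illustrated with a toy sketch. This is not code from the linked post, PowerInfer, or llama.cpp; the class `TieredExpertCache`, its two tiers, and the frequency-based promotion rule are all assumptions made for illustration, and the "gpu" and "cpu" tiers are simulated with plain Python sets rather than real device memory.

```python
from collections import Counter


class TieredExpertCache:
    """Hypothetical two-tier cache: a small fast 'gpu' tier and a slow 'cpu' tier."""

    def __init__(self, num_experts, gpu_slots):
        self.gpu_slots = gpu_slots
        self.hits = Counter()                # activation count per expert id
        self.gpu = set()                     # experts resident in the fast tier
        self.cpu = set(range(num_experts))   # every expert starts in the slow tier
        self.gpu_hits = 0
        self.gpu_misses = 0

    def access(self, expert_id):
        """Record one activation and return the tier that served it."""
        self.hits[expert_id] += 1
        if expert_id in self.gpu:
            self.gpu_hits += 1
            return "gpu"
        self.gpu_misses += 1
        self._maybe_promote(expert_id)
        return "cpu"

    def _maybe_promote(self, expert_id):
        # Fill free fast-tier slots first.
        if len(self.gpu) < self.gpu_slots:
            self._move_to_gpu(expert_id)
            return
        # Otherwise evict the least frequently used resident expert,
        # but only if the newcomer has strictly more activations.
        coldest = min(self.gpu, key=lambda e: (self.hits[e], e))
        if self.hits[expert_id] > self.hits[coldest]:
            self.gpu.remove(coldest)
            self.cpu.add(coldest)
            self._move_to_gpu(expert_id)

    def _move_to_gpu(self, expert_id):
        self.cpu.discard(expert_id)
        self.gpu.add(expert_id)


# Assumed routing trace: expert ids chosen by the router, one per step.
cache = TieredExpertCache(num_experts=8, gpu_slots=2)
for expert_id in [0, 1, 0, 2, 0, 2, 2, 3, 2, 0]:
    cache.access(expert_id)

print(sorted(cache.gpu), cache.gpu_hits, cache.gpu_misses)
```

In a real system a miss would trigger a weight transfer between tiers, and the promotion policy would likely account for transfer cost and per-layer routing statistics; the sketch only shows how activation counts can decide which experts stay in the fast tier.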
