Tag
Discusses the trade-off between dense and Mixture-of-Experts (MoE) models for local AI, noting that high-RAM users have limited MoE options beyond Qwen 3.5 122B, and questioning if large GPU is the only viable path.
IceCache introduces Dynamic Continuous Indexing to keep GPU memory usage constant during long LLM generations with minimal accuracy loss.