Stratum: System-Hardware Co-Design with 3D-Stackable DRAM for Efficient MoE

Hacker News Top Papers

Summary

Introduces Stratum, a system-hardware co-design that uses 3D-stackable DRAM to accelerate Mixture-of-Experts (MoE) models efficiently.

Similar Articles

Are the rich RAM/poor GPU people wrong here?

Reddit r/LocalLLaMA

Discusses the trade-off between dense and Mixture-of-Experts (MoE) models for local AI, noting that high-RAM users have few MoE options beyond Qwen 3.5 122B, and asking whether a large GPU is the only viable path.

Multi Tier MoE Caching

Reddit r/LocalLLaMA

Discusses multi-tier caching strategies for MoE models to improve inference speed by keeping frequently activated experts on GPU, referencing existing implementations like PowerInfer and llama.cpp branches.

An Efficient Heterogeneous Co-Design for Fine-Tuning on a Single GPU

Papers with Code Trending

SlideFormer introduces a heterogeneous co-design for full-parameter LLM fine-tuning on a single GPU, leveraging GPU/CPU/RAM/NVMe with a layer-sliding engine and optimized Triton kernels, enabling fine-tuning of 123B+ models on a single RTX 4090 with significant throughput improvements.