This paper presents EMA, a model adaptation system that reduces training and labeling costs for learning-based systems while improving their performance in evolving environments.
This paper introduces DisagMoE, a system for MoE training that optimizes computation-communication overlap by disaggregating attention and FFN layers across GPU groups. Implemented on Megatron-LM, it achieves up to 1.8x speedup on H800 clusters by addressing inter-node communication bottlenecks.
NVIDIA and Unsloth have published a technical guide detailing three low-level optimizations that can accelerate LLM fine-tuning by up to 25%: packed-sequence caching, double-buffered checkpointing, and optimized MoE routing. The guide provides deep systems-level explanations and benchmarks aimed at ML engineers and developers.
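For readers unfamiliar with the first of those techniques, the sketch below is a generic, simplified illustration of sequence packing in plain Python, not the guide's caching scheme: short tokenized examples are packed into fixed-length buffers, and cumulative-length offsets are recorded so a variable-length attention kernel could keep packed sequences from attending to each other. All names here are illustrative assumptions.

```python
# Illustrative sketch of sequence packing (not the NVIDIA/Unsloth implementation).
# Short tokenized examples are greedily packed into fixed-size buffers; the
# cumulative-length offsets ("cu_seqlens") are what a varlen attention kernel
# would use to isolate the packed sequences from one another.
from typing import List, Tuple

def pack_sequences(seqs: List[List[int]], max_len: int, pad_id: int = 0
                   ) -> List[Tuple[List[int], List[int]]]:
    """Greedy first-fit packing. Returns (padded_tokens, cu_seqlens) per buffer."""
    buffers: List[Tuple[List[int], List[int]]] = []
    tokens: List[int] = []
    cu_seqlens: List[int] = [0]
    for seq in seqs:
        if len(seq) > max_len:
            seq = seq[:max_len]  # truncate oversized examples
        if len(tokens) + len(seq) > max_len:
            # flush the current buffer, padding it out to max_len
            buffers.append((tokens + [pad_id] * (max_len - len(tokens)), cu_seqlens))
            tokens, cu_seqlens = [], [0]
        tokens.extend(seq)
        cu_seqlens.append(len(tokens))
    if tokens:
        buffers.append((tokens + [pad_id] * (max_len - len(tokens)), cu_seqlens))
    return buffers

if __name__ == "__main__":
    examples = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]
    for padded, offsets in pack_sequences(examples, max_len=8):
        print(padded, offsets)  # e.g. [1, 2, 3, 4, 5, 0, 0, 0] [0, 3, 5]
```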
The author seeks alternatives to Oracle Cloud for hosting a 24/7 OpenClaw instance on an 'Always Free' tier, weighing options such as Google Cloud e2-micro and Fly.io and asking for optimization tips to stay within 1 GB of RAM.
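On the 1 GB constraint specifically, one generic, provider-agnostic tactic is to cap the process yourself before the kernel's OOM killer intervenes. The minimal Linux-only watchdog below is a hedged sketch of that idea; the `openclaw` command and the 700 MB threshold are placeholder assumptions, not anything taken from the thread or from OpenClaw itself.

```python
# Hedged sketch of a tiny memory watchdog for a 1 GB VM (Linux-only).
# The "openclaw" command below is a placeholder; substitute whatever actually
# starts your instance. This is not an official OpenClaw feature.
import subprocess
import time

CMD = ["openclaw", "gateway"]   # placeholder command, adjust to your setup
RSS_LIMIT_MB = 700              # leave headroom below the 1 GB cap
CHECK_INTERVAL_S = 30

def rss_mb(pid: int) -> float:
    """Resident set size of a process in MB, read from /proc (Linux)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024  # kB -> MB
    return 0.0

def main() -> None:
    while True:
        proc = subprocess.Popen(CMD)
        try:
            while proc.poll() is None:
                time.sleep(CHECK_INTERVAL_S)
                if rss_mb(proc.pid) > RSS_LIMIT_MB:
                    proc.terminate()  # restart cleanly before the OOM killer strikes
                    proc.wait(timeout=30)
        except subprocess.TimeoutExpired:
            proc.kill()
        time.sleep(5)  # brief backoff before restarting

if __name__ == "__main__":
    main()
```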
Lightseek releases TokenSpeed, a high-performance LLM inference engine optimized for agentic workloads, featuring compiler-backed parallelism and advanced kernel optimizations that have been adopted by vLLM.
GTweak is an open-source Windows system optimization and privacy tool that lets users disable telemetry, updates, and unnecessary services, as well as activate Windows.