Tag
This paper presents EMA, a system that adapts the models in learning-based systems to evolving environments, reducing training and labeling costs while improving system performance.
This paper introduces DisagMoE, a system for MoE training that optimizes computation-communication overlap by disaggregating attention and FFN layers across GPU groups. Implemented on Megatron-LM, it achieves up to 1.8x speedup on H800 clusters by addressing inter-node communication bottlenecks.
NVIDIA and Unsloth have published a technical guide detailing three low-level optimizations (packed-sequence caching, double-buffered checkpointing, and optimized MoE routing) that can accelerate LLM fine-tuning by up to 25%. The guide provides deep systems-level explanations and benchmarks aimed at ML engineers and developers.
The author seeks alternatives to Oracle Cloud for hosting a 24/7 OpenClaw instance on an 'Always Free' tier, discusses options such as Google Cloud e2-micro and Fly.io, and asks for optimization tips to run within 1 GB of RAM.
Lightseek releases TokenSpeed, a high-performance LLM inference engine optimized for agentic workloads, featuring compiler-backed parallelism and advanced kernel optimizations that have been adopted by vLLM.
AReaL is a fully asynchronous reinforcement learning system for LLM reasoning, achieving up to 2.57x training speedup over synchronous systems while maintaining or improving performance. It decouples generation and training to improve GPU utilization and includes optimizations like staleness-enhanced PPO.
GTweak is an open-source Windows system optimization and privacy tool that lets users disable telemetry, updates, and unnecessary services. It can also activate Windows.