This paper introduces CHERRY, a set of techniques for compute-efficient language models (selective token supervision, depth compression via recurrent unrolling, and a mixture of compressed experts) that achieves significant efficiency gains on a Korean foundation model.
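Selective token supervision generally means computing the training loss on only a subset of tokens. The summary does not say how CHERRY picks that subset. The sketch below assumes a per-token selection score, for example excess loss relative to a reference model. It keeps the top `keep_ratio` fraction of tokens by that score. The function name `selective_token_loss` and its parameters are hypothetical.

```python
from typing import Sequence


def selective_token_loss(
    token_losses: Sequence[float],
    scores: Sequence[float],
    keep_ratio: float,
) -> float:
    """Mean loss over the highest-scoring `keep_ratio` fraction of tokens.

    Tokens outside the selected set receive no supervision. The scoring rule is an
    assumption for illustration, not CHERRY's documented criterion.
    """
    if len(token_losses) != len(scores) or len(token_losses) == 0:
        raise ValueError("need one score per token and at least one token")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, int(len(token_losses) * keep_ratio))
    # Indices of the k tokens with the largest selection score
    selected = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(token_losses[i] for i in selected) / k


# Hypothetical per-token losses and selection scores
print(selective_token_loss([1.0, 2.0, 3.0, 4.0], [0.1, 0.9, 0.5, 0.2], keep_ratio=0.5))  # 2.5
```

Only the selected tokens contribute gradient, which is where the compute and data-efficiency argument comes from.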
This paper systematically studies the damage caused by exact document repetition during language model pretraining. It shows that repeating a moderately sized subset a moderate number of times does the most harm to performance, and that repetition can waste up to 33% of compute (as measured by compute-equivalent loss).
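One plausible reading of "wasted compute" works as follows. Find the compute a repetition-free run needs to reach the same loss, then take the fraction of the actual compute that exceeds it. The sketch assumes a reference loss-versus-compute curve from clean runs and interpolates it linearly in log-compute. The helper names and all numbers are hypothetical, not taken from the paper.

```python
import math
from typing import Sequence


def compute_equivalent(loss: float, ref_compute: Sequence[float], ref_loss: Sequence[float]) -> float:
    """Compute a no-repetition run would need to reach `loss`.

    Uses log-linear interpolation along a reference curve whose loss decreases as
    compute increases.
    """
    points = list(zip(ref_compute, ref_loss))
    for (c0, l0), (c1, l1) in zip(points, points[1:]):
        if l1 <= loss <= l0:
            t = (l0 - loss) / (l0 - l1)  # fraction of the way from point 0 to point 1
            return math.exp(math.log(c0) + t * (math.log(c1) - math.log(c0)))
    raise ValueError("loss lies outside the reference curve")


def wasted_fraction(actual_compute: float, loss: float,
                    ref_compute: Sequence[float], ref_loss: Sequence[float]) -> float:
    """Share of `actual_compute` that bought no loss improvement over clean data."""
    return 1.0 - compute_equivalent(loss, ref_compute, ref_loss) / actual_compute


# Made-up reference curve and run, for illustration only
ref_c = [1e18, 1e19, 1e20]
ref_l = [3.0, 2.6, 2.3]
print(wasted_fraction(2e19, 2.6, ref_c, ref_l))  # ≈ 0.5
```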
ZeroGPU is a compute-efficient layer designed for AI inference, aiming to optimize GPU usage and reduce costs.
LayerRoute is a lightweight adapter that selectively skips transformer blocks at inference time depending on the input type. It uses gated routing and LoRA adaptation to save compute while maintaining or improving model quality, and achieves a 12.91% skip differential on agentic language models.
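Gated block skipping can be sketched as a residual block whose update is scaled by a learned gate. When the gate falls below a threshold, the block is not executed at all. The sketch makes several assumptions: the gate is computed from a mean-pooled input, and the names `gated_block`, `gate_w`, `gate_b` and `threshold` are hypothetical. LayerRoute's actual router and its LoRA component are not modeled here.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Block = Callable[[List[Vector]], List[Vector]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_block(hidden: List[Vector], block: Block, gate_w: Vector,
                gate_b: float, threshold: float = 0.5) -> Tuple[List[Vector], bool]:
    """Apply `block` as a gated residual update, or skip it when the gate is closed.

    Returns the new hidden states and whether the block was skipped.
    """
    dim = len(hidden[0])
    # Mean-pool the sequence as a cheap summary of the input
    pooled = [sum(tok[i] for tok in hidden) / len(hidden) for i in range(dim)]
    gate = sigmoid(sum(w * p for w, p in zip(gate_w, pooled)) + gate_b)
    if gate < threshold:
        # Skipped: the residual stream passes through and the block costs nothing
        return hidden, True
    update = block(hidden)
    new_hidden = [[h + gate * u for h, u in zip(h_tok, u_tok)]
                  for h_tok, u_tok in zip(hidden, update)]
    return new_hidden, False


def double(xs: List[Vector]) -> List[Vector]:
    """Toy stand-in for a transformer block."""
    return [[2.0 * v for v in x] for x in xs]


x = [[1.0, 0.0], [1.0, 0.0]]
_, skipped = gated_block(x, double, gate_w=[-10.0, 0.0], gate_b=0.0)
print(skipped)  # True
```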
LVSA introduces a training-free sparse attention mechanism for video diffusion models, reducing compute by up to 3.17x and enabling generation beyond the training horizon without loss of quality.
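The summary does not describe LVSA's sparsity pattern. As a generic illustration of how an attention mask translates into compute savings, the sketch below assumes a frame-local window: each token attends only to tokens at most `window` frames away. Mask density then directly sets the ratio of dense to sparse query-key scores. The function names and parameters are hypothetical.

```python
from typing import List


def frame_window_mask(num_frames: int, tokens_per_frame: int, window: int) -> List[List[bool]]:
    """Return a mask where mask[q][k] is True if query token q may attend to key token k.

    That holds when the two tokens' frames are at most `window` frames apart.
    """
    n = num_frames * tokens_per_frame
    frame_of = [i // tokens_per_frame for i in range(n)]
    return [[abs(frame_of[q] - frame_of[k]) <= window for k in range(n)] for q in range(n)]


def dense_to_sparse_ratio(mask: List[List[bool]]) -> float:
    """How many times fewer query-key scores the mask computes than dense attention."""
    total = sum(len(row) for row in mask)
    kept = sum(sum(row) for row in mask)
    return total / kept


mask = frame_window_mask(num_frames=8, tokens_per_frame=2, window=1)
print(dense_to_sparse_ratio(mask))  # 256 / 88 ≈ 2.91
```

Under this assumed pattern, each query sees at most (2 * window + 1) * tokens_per_frame keys however long the video is. That bounded span is one generic reason local attention can run past the training length. It is not a claim about how LVSA achieves this.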
This paper introduces LBW-Guard, a bounded autonomous training control governance layer that operates above the AdamW optimizer to monitor telemetry and apply bounded control during training, demonstrating improved perplexity and training speed under stress conditions.
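"Bounded control" can be illustrated with a controller that reads one telemetry signal and adjusts the learning rate. In this sketch the signal is the gradient norm. Every adjustment is clamped to a fixed range and rate-limited per step. The class name, its fields and the choice of gradient norm as the signal are all assumptions; the summary does not describe LBW-Guard's actual telemetry or control law. The returned value would be written into the AdamW optimizer's learning rate each step.

```python
from dataclasses import dataclass


@dataclass
class BoundedLRController:
    """Hypothetical bounded controller sitting above an optimizer's learning rate."""
    base_lr: float
    min_scale: float = 0.5        # lower bound on the LR multiplier
    max_scale: float = 1.0        # upper bound on the LR multiplier
    max_step: float = 0.1         # largest change to the multiplier per update
    target_grad_norm: float = 1.0
    scale: float = 1.0

    def update(self, grad_norm: float) -> float:
        # Proposed multiplier shrinks when the gradient norm exceeds the target
        proposed = min(1.0, self.target_grad_norm / max(grad_norm, 1e-12))
        # Rate-limit the change, then clamp to the allowed range
        delta = max(-self.max_step, min(self.max_step, proposed - self.scale))
        self.scale = max(self.min_scale, min(self.max_scale, self.scale + delta))
        return self.base_lr * self.scale


controller = BoundedLRController(base_lr=1e-3)
print(round(controller.update(grad_norm=10.0), 6))  # 0.0009
```

However extreme the telemetry, the learning rate stays within `[min_scale, max_scale] * base_lr`. That is the property "bounded" refers to in this sketch.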
Discusses the potential for AI-rendered video to be far more compute-efficient than traditional rendering, using Big Hero 6's millions of render hours as a benchmark.
A method that dynamically allocates more compute to hard problems lets Qwen-35B-A3B approach the performance of GPT-5.4-xHigh on the HLE (Humanity's Last Exam) benchmark.
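A simple form of difficulty-aware allocation splits a fixed sampling budget across problems in proportion to an estimated difficulty, with a guaranteed minimum per problem. The difficulty estimate here (for example 1 minus the pass rate on a few probe samples) and the function `allocate_budget` are hypothetical. The summary does not specify how the method measures difficulty or distributes compute.

```python
from typing import List, Sequence


def allocate_budget(difficulty: Sequence[float], total: int, min_per_problem: int = 1) -> List[int]:
    """Split `total` samples across problems in proportion to difficulty.

    Every problem receives at least `min_per_problem` samples.
    """
    n = len(difficulty)
    remaining = total - min_per_problem * n
    if remaining < 0:
        raise ValueError("total budget is smaller than the per-problem minimum")
    weight_sum = sum(difficulty)
    # Proportional share of the remaining budget (uniform if every weight is zero)
    shares = [remaining * d / weight_sum if weight_sum > 0 else remaining / n
              for d in difficulty]
    alloc = [min_per_problem + int(s) for s in shares]
    # Hand out units lost to rounding down, largest fractional part first
    leftover = max(0, total - sum(alloc))
    by_fraction = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in by_fraction[:leftover]:
        alloc[i] += 1
    return alloc


# Hypothetical difficulty estimates for three problems
print(allocate_budget([0.0, 0.5, 1.0], total=12))  # [1, 4, 7]
```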
A new optimization technique for open-source RL training engines introduces prompt caching during training, achieving up to 7.5x speedup on long-prompt, short-response workloads by reducing redundant compute.
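A back-of-the-envelope model shows why long prompts with short responses benefit most. Suppose each prompt is paired with `group_size` sampled responses, as in group-based RL. Without caching, the shared prompt is processed once per response; with caching, it is processed once per group. The sketch counts tokens only and ignores attention's superlinear cost and engine overheads. The function name, the group structure and the example sizes are assumptions, not details of the technique described.

```python
def prompt_cache_speedup(prompt_len: int, response_len: int, group_size: int) -> float:
    """Token-count speedup from encoding a shared prompt once per group.

    The baseline re-encodes the prompt once for every response in the group.
    """
    without_cache = group_size * (prompt_len + response_len)
    with_cache = prompt_len + group_size * response_len
    return without_cache / with_cache


# Hypothetical long-prompt, short-response workload
print(round(prompt_cache_speedup(prompt_len=8000, response_len=200, group_size=8), 2))  # 6.83
```

In this model the speedup approaches `group_size` as the prompt dominates. With short prompts or long responses it stays near 1.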
Foundational empirical study demonstrating power-law scaling relationships between language model performance and model size, dataset size, and compute budget, with implications for optimal training allocation and sample efficiency.
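The power-law relationships can be written generically as below. Here $N$ is the model size, $D$ the dataset size and $C$ the compute budget. The scale constants $N_c, D_c, C_c$ and exponents $\alpha_N, \alpha_D, \alpha_C$ are fitted values that the summary does not give, and the notation is chosen here for illustration. Each law describes the regime where the other two quantities are not the bottleneck.

```latex
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) = \left(\frac{C_c}{C}\right)^{\alpha_C}

% Straight line on log-log axes, and the effect of doubling model size:
\log L(N) = \alpha_N \left(\log N_c - \log N\right), \qquad
\frac{L(2N)}{L(N)} = 2^{-\alpha_N}
```

Because each doubling of a resource multiplies the loss by the same factor, the exponents determine how a fixed compute budget is best divided between model size and data.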