compute-efficiency

#compute-efficiency

CHERRY: Compressed Hierarchical Experts with Recurrent Representational Yield

arXiv cs.CL · 4d ago

This paper introduces CHERRY, a set of techniques for compute-efficient language models including selective token supervision, depth compression via recurrent unrolling, and a mixture of compressed experts, achieving significant efficiency gains on a Korean foundation model.

Internal Data Repetition Destroys Language Models

arXiv cs.LG · 2026-06-25

This paper systematically studies the damage caused by exact document repetition during language model pretraining, showing that repeating a moderately sized subset a moderate number of times maximally harms performance, and that repetition can waste up to 33% of compute (as measured by compute-equivalent loss).

ZeroGPU

Product Hunt · 2026-06-05

ZeroGPU is a compute-efficient layer for AI inference that aims to optimize GPU usage and reduce costs.

LayerRoute: Input-Conditioned Adaptive Layer Skipping via LoRA Fine-Tuning for Agentic Language Models

Hugging Face Daily Papers · 2026-06-01

LayerRoute is a lightweight adapter that selectively skips transformer blocks during inference based on input type, achieving compute savings while maintaining or improving model quality through gated routing and LoRA adaptation. It achieves a 12.91% skip differential on agentic language models.

LVSA: Training-Free Sparse Attention for Long Video Diffusion

Hugging Face Daily Papers · 2026-05-29

LVSA introduces a training-free sparse attention mechanism for video diffusion models, reducing compute by up to 3.17x while enabling generation beyond training horizons without quality loss.

Learn-by-Wire Training Control Governance: Bounded Autonomous Training Under Stress for Stability and Efficiency

arXiv cs.AI · 2026-05-20

This paper introduces LBW-Guard, a bounded autonomous training control governance layer that operates above the AdamW optimizer to monitor telemetry and apply bounded control during training, demonstrating improved perplexity and training speed under stress conditions.

When AI rendered video is ready, it will be wildly more compute efficient than the >1 million+ render hours of a movie like Big Hero 6

Reddit r/singularity · 2026-05-16

Discusses the potential for AI-rendered video to be far more compute-efficient than traditional rendering, using the more than 1 million render hours of Big Hero 6 as a benchmark.

Dynamically allocating compute budget to hard set of problems and evolving the sections with Qwen-35B-A3B gets you near GPT-5.4-xHigh on HLE

Reddit r/LocalLLaMA · 2026-05-15

A method that dynamically allocates compute budget to hard problems using Qwen-35B-A3B achieves performance near GPT-5.4-xHigh on the HLE benchmark.

prompt caching, but for rl training - 7.5x speedup on long-prompt/short-response workloads

Reddit r/LocalLLaMA · 2026-05-11

A new optimization technique for open-source RL training engines introduces prompt caching during training, achieving up to 7.5x speedup on long-prompt, short-response workloads by reducing redundant compute.

Scaling laws for neural language models

OpenAI Blog · 2020-01-23

Foundational empirical study demonstrating power-law scaling relationships between language model performance and model size, dataset size, and compute budget, with implications for optimal training allocation and sample efficiency.
