A method that dynamically allocates compute budget to hard problems, applied to Qwen-35B-A3B, achieves performance approaching GPT-5.4-xHigh on the HLE benchmark.
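As a rough illustration of the idea only (not the method's actual algorithm), a difficulty-aware allocator might probe each problem with a few cheap samples and then route the remaining sampling budget toward problems with low observed pass rates. The function names (`generate`, `is_correct`) and budget numbers below are hypothetical placeholders.

```python
# Hypothetical sketch of difficulty-aware compute allocation; the callables
# `generate` and `is_correct` stand in for an unspecified sampler and verifier.
from typing import Callable, Dict, List

def allocate_budget(
    problems: List[str],
    generate: Callable[[str, int], List[str]],   # (prompt, n_samples) -> candidate answers
    is_correct: Callable[[str, str], bool],      # verifier / reward check
    probe_samples: int = 4,
    total_budget: int = 256,
) -> Dict[str, int]:
    """Spend a small probe budget per problem, then give the leftover
    samples to the problems with the lowest observed pass rate."""
    pass_rates = {}
    for p in problems:
        candidates = generate(p, probe_samples)
        solved = sum(is_correct(p, c) for c in candidates)
        pass_rates[p] = solved / probe_samples

    remaining = max(total_budget - probe_samples * len(problems), 0)
    # Hardness weight: problems that look unsolved get more of the leftover budget.
    hardness = {p: 1.0 - r for p, r in pass_rates.items()}
    total_hardness = sum(hardness.values()) or 1.0
    extra = {p: round(remaining * h / total_hardness) for p, h in hardness.items()}
    return {p: probe_samples + extra[p] for p in problems}
```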
A new optimization technique for open-source RL training engines introduces prompt caching during training, achieving up to a 7.5x speedup on long-prompt, short-response workloads by reducing redundant compute.
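A minimal sketch of what prompt caching during rollout generation could look like, assuming a hypothetical engine with separate `prefill` and `decode` steps (the real engine API and cache-eviction policy are not specified in the summary). The point is that RL workloads sample many short responses from the same long prompt, so reusing the prompt's prefill state avoids recomputing it for every sample.

```python
# Illustrative prompt-level cache for RL rollouts; `engine.prefill` and
# `engine.decode` are assumed, hypothetical methods, not a real library API.
import hashlib

class PromptCache:
    def __init__(self, engine):
        self.engine = engine
        self._cache = {}  # prompt hash -> cached prefill (KV) state

    def rollout(self, prompt: str, max_new_tokens: int) -> str:
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key not in self._cache:
            # Prefill once per unique prompt; with long prompts this step
            # dominates compute, so caching it is where the speedup comes from.
            self._cache[key] = self.engine.prefill(prompt)
        kv_state = self._cache[key]
        # Every additional sampled response reuses the cached prompt state
        # and only pays for the (short) decode phase.
        return self.engine.decode(kv_state, max_new_tokens)
```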
A foundational empirical study demonstrates power-law scaling relationships between language model performance and model size, dataset size, and compute budget, with implications for optimal allocation of training compute and for sample efficiency.
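Assuming this refers to the standard formulation of neural scaling laws (in the style of Kaplan et al., 2020), the fitted relationships take a power-law form in non-embedding parameters N, dataset tokens D, and compute C; the constants N_c, D_c, C_c and exponents below are empirically fitted and shown here only schematically.

```latex
% Schematic power-law form of the scaling relationships:
% N = model parameters, D = dataset size in tokens, C = training compute.
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) = \left(\frac{C_c}{C}\right)^{\alpha_C}
```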