training-optimization

#training-optimization

Mythos can speed up training code 52x (compared to a human's 4x in 4-8 hours)

Reddit r/singularity · 2026-06-04

Anthropic's Mythos system reportedly achieved a 52x speedup when optimizing training code, compared with the 4x speedup a human reached in 4-8 hours on the same task. The absolute multiples depend heavily on how well optimized the starting code is. On a like-for-like basis, improvements across models from the past year range from roughly 3x to 52x.

Accelerating Long-Tail Generation in Synchronous RLHF Training via Adaptive Tensor Parallelism

arXiv cs.AI · 2026-05-26

This paper proposes PAT, an adaptive tensor parallelism (TP) method that dynamically reconfigures TP during the generation stage of synchronous RLHF training. The goal is to mitigate long-tail generation bottlenecks, where a few unusually long responses keep the generation stage running after most sequences have finished. On LLaMA3.1-8B and Qwen3-14B, PAT reduces generation latency by up to 34.6% and end-to-end iteration latency by up to 27.2%.
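
The summary does not describe how PAT chooses its TP configuration. As a rough illustration of the general idea only, the Python sketch below uses a hypothetical heuristic (`choose_tp_degree` and its rule are our own assumptions, not the paper's algorithm). As sequences finish and the tail shrinks, GPUs that would otherwise sit idle are folded into wider TP groups so the remaining long sequences decode faster. It ignores memory limits, resharding cost, and batching, all of which a real system has to handle.

```python
def choose_tp_degree(num_gpus: int, active_seqs: int, max_tp: int = 8) -> int:
    """Toy heuristic (hypothetical, not PAT's algorithm): pick the widest
    power-of-two TP degree that still leaves at least one data-parallel
    replica per unfinished sequence."""
    if num_gpus <= 0 or active_seqs <= 0:
        raise ValueError("num_gpus and active_seqs must be positive")
    tp = 1
    # Double TP while it stays within max_tp, divides the GPU count evenly,
    # and the resulting replica count still covers every active sequence.
    while (
        tp * 2 <= max_tp
        and num_gpus % (tp * 2) == 0
        and num_gpus // (tp * 2) >= active_seqs
    ):
        tp *= 2
    return tp


if __name__ == "__main__":
    # As the generation tail shrinks, the same 8 GPUs regroup into wider TP groups.
    for remaining in (64, 16, 4, 2, 1):
        print(remaining, "active -> TP =", choose_tp_degree(8, remaining))
```

With 8 GPUs, this toy rule keeps TP at 1 while many sequences are active and widens it to 2, 4, and finally 8 as only 4, 2, and 1 sequences remain.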

Prompt caching, but for RL training - 7.5x speedup on long-prompt/short-response workloads

Reddit r/LocalLLaMA · 2026-05-11

A new optimization for open-source RL training engines adds prompt caching to the training loop. By cutting redundant compute on repeated prompts, it achieves up to a 7.5x speedup on workloads with long prompts and short responses.
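
To see why the gain concentrates on long-prompt/short-response workloads, here is a back-of-the-envelope cost model in Python. It assumes (our assumption, not stated in the post) that the redundancy comes from several sampled responses sharing one prompt, as in group-based RL algorithms, and that compute scales linearly with tokens processed, ignoring attention's quadratic term. The function names and example sizes are illustrative only.

```python
def tokens_processed(prompt_len: int, response_len: int, group_size: int,
                     cache_prompt: bool) -> int:
    """Tokens run through the model for one prompt and its group of responses."""
    if cache_prompt:
        # Prompt computed once, then reused by every response in the group.
        return prompt_len + group_size * response_len
    # Without caching, each response re-processes the full prompt.
    return group_size * (prompt_len + response_len)


def estimated_speedup(prompt_len: int, response_len: int, group_size: int) -> float:
    """Ratio of uncached to cached work; approaches group_size as response_len -> 0."""
    uncached = tokens_processed(prompt_len, response_len, group_size, cache_prompt=False)
    cached = tokens_processed(prompt_len, response_len, group_size, cache_prompt=True)
    return uncached / cached


if __name__ == "__main__":
    print(f"{estimated_speedup(4000, 200, 8):.2f}")  # long prompt, short response → 6.00
    print(f"{estimated_speedup(200, 4000, 8):.2f}")  # short prompt, long response → 1.04
```

Under these assumptions, the speedup approaches the group size as responses get shorter and falls toward 1x when response tokens dominate.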

PRX Part 3 — Training a Text-to-Image Model in 24h!

Hugging Face Blog · 2026-03-03

Photoroom's PRX Part 3 shows how to train a text-to-image model in 24 hours by combining architectural and training optimizations, including perceptual losses, token routing with TREAD, and the Muon optimizer.
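
As we understand it, the core idea of TREAD token routing is that during training, a random subset of tokens passes through a contiguous block of layers while the remaining tokens bypass it and are reinserted unchanged afterwards, so those middle layers process fewer tokens. The pure-Python toy below illustrates only that routing pattern. `tread_forward`, the add-one stand-in "layers", and all parameter names are hypothetical, and this is not Photoroom's implementation.

```python
import random
from typing import Callable, List, Sequence

Token = List[float]
Layer = Callable[[List[Token]], List[Token]]


def tread_forward(
    tokens: Sequence[Token],
    layers: Sequence[Layer],
    route_start: int,
    route_end: int,
    keep_ratio: float,
    rng: random.Random,
) -> List[Token]:
    """Toy TREAD-style token routing (hypothetical sketch).

    A random subset of tokens runs through layers[route_start:route_end];
    the other tokens bypass those layers and rejoin at their original positions.
    """
    x = list(tokens)
    for layer in layers[:route_start]:
        x = layer(x)
    # Randomly choose which tokens go through the routed block.
    n_keep = max(1, int(len(x) * keep_ratio))
    keep_idx = sorted(rng.sample(range(len(x)), n_keep))
    kept = [x[i] for i in keep_idx]
    for layer in layers[route_start:route_end]:
        kept = layer(kept)  # only n_keep tokens pay for these layers
    # Reinsert processed tokens; bypassed tokens keep their earlier values.
    for i, t in zip(keep_idx, kept):
        x[i] = t
    for layer in layers[route_end:]:
        x = layer(x)
    return x


def add_one(toks: List[Token]) -> List[Token]:
    """Stand-in 'layer' that adds 1 to every feature of every token."""
    return [[v + 1.0 for v in t] for t in toks]


if __name__ == "__main__":
    tokens = [[0.0] for _ in range(8)]
    out = tread_forward(tokens, [add_one] * 4, route_start=1, route_end=3,
                        keep_ratio=0.5, rng=random.Random(0))
    # Routed tokens went through all 4 layers; bypassed ones skipped 2 of them.
    print(sorted(v[0] for v in out))  # → [2.0, 2.0, 2.0, 2.0, 4.0, 4.0, 4.0, 4.0]
```

In this toy setup, half the tokens skip the two middle layers, so those layers do half the work. The compute savings in a real model depend on the keep ratio and on how many layers the routed block spans.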
