@a1zhang: RLM arXiv paper update: depth>1 results, more comparisons, more training, and more error analysis! We add depth=2/3 exp…

X AI KOLs Following Papers

Summary

This update to the RLM arXiv paper adds depth>1 experiments with recursive RLM calls, showing significant performance gains on OOLONG-Pairs and other benchmarks, along with new comparisons to OpenCode and Claude Code, additional training results on MRCRv2, and an expanded error analysis.

RLM arXiv paper update: depth>1 results, more comparisons, more training, and more error analysis! We add depth=2/3 experiments, where the RLM now has access to recursive RLM calls. This is also a feature of the open source `rlm` repo as well. We observe significant performance gains on OOLONG-Pairs and gains on all other benchmarks! We also include various OpenCode and Claude Code comparisons now per popular request. We add a length generalization experiment on MRCRv2 to show more promising training results, add a small prompting case study on OOLONG, and update the error analysis section to discuss the effect of syntax errors, decomposition mistakes, and general observations from the RLM trajectories. The appendix is now also updated with several new experiments and plots!
Original Article

Similar Articles

Reinforcing Recursive Language Models (18 minute read)

TLDR AI

The article explores reinforcement learning fine-tuning of small (4B) recursive language models (RLMs) to perform evidence selection from scientific documents, showing that RL-trained 4B models match Claude Sonnet 4.6 performance at a fraction of the size and cost.

DeepSeek V4 paper full version is out, FP4 QAT details and stability tricks [D]

Reddit r/MachineLearning

DeepSeek released the full V4 paper detailing FP4 quantization-aware training, MoE training stability tricks (anticipatory routing and SwiGLU clamping), and a generative reward model for RLHF, achieving dramatic efficiency gains—V4-Flash uses only 10% of V3.2's FLOPs and 7% of its KV cache at 1M context length.