@a1zhang: RLM arXiv paper update: depth>1 results, more comparisons, more training, and more error analysis! We add depth=2/3 exp…

X AI KOLs Following 05/12/26, 03:04 PM Papers

arxiv-paper rlm recursive-language-model depth-experiments performance-gains error-analysis open-source

Summary

This update to the RLM arXiv paper adds depth>1 experiments with recursive RLM calls, showing significant performance gains on OOLONG-Pairs and other benchmarks, along with new comparisons to OpenCode and Claude Code, additional training results on MRCRv2, and an expanded error analysis.

RLM arXiv paper update: depth>1 results, more comparisons, more training, and more error analysis! We add depth=2/3 experiments, where the RLM now has access to recursive RLM calls. This is also a feature of the open source `rlm` repo as well. We observe significant performance gains on OOLONG-Pairs and gains on all other benchmarks! We also include various OpenCode and Claude Code comparisons now per popular request. We add a length generalization experiment on MRCRv2 to show more promising training results, add a small prompting case study on OOLONG, and update the error analysis section to discuss the effect of syntax errors, decomposition mistakes, and general observations from the RLM trajectories. The appendix is now also updated with several new experiments and plots!

Original Article

Similar Articles

@TDataScience: Follow along @neural_avb's all-in-one deep dive to learn "what recursive language models (RLMs) are, why they are winni…

X AI KOLs Following

An educational deep dive into recursive language models (RLMs), explaining what they are, why they are winning long-context benchmarks, and how they differ from existing agentic harness designs like ReAct or CodeAct, using a simple case study.

Reinforcing Recursive Language Models (18 minute read)

TLDR AI

The article explores reinforcement learning fine-tuning of small (4B) recursive language models (RLMs) to perform evidence selection from scientific documents, showing that RL-trained 4B models match Claude Sonnet 4.6 performance at a fraction of the size and cost.

Developing open source LLM from ground up from pretrain - rlhf(PPO/GRPO)

Reddit r/LocalLLaMA

A developer shares progress on training a 7B parameter open source LLM from scratch using a DeepSeek architecture optimized for low VRAM, with the goal of democratizing AI development and eventually surpassing large proprietary models.

DeepSeek V4 paper full version is out, FP4 QAT details and stability tricks [D]

Reddit r/MachineLearning

DeepSeek released the full V4 paper detailing FP4 quantization-aware training, MoE training stability tricks (anticipatory routing and SwiGLU clamping), and a generative reward model for RLHF, achieving dramatic efficiency gains—V4-Flash uses only 10% of V3.2's FLOPs and 7% of its KV cache at 1M context length.

We benchmarked 18 LLMs on OCR (7k+ calls) — cheaper/old models oftentimes win. Full dataset + framework open-sourced. [R]