@a1zhang: RLM arXiv paper update: depth>1 results, more comparisons, more training, and more error analysis! We add depth=2/3 exp…
Summary
This update to the RLM arXiv paper adds depth>1 experiments with recursive RLM calls, showing significant performance gains on OOLONG-Pairs and other benchmarks, along with new comparisons to OpenCode and Claude Code, additional training results on MRCRv2, and an expanded error analysis.
Similar Articles
@TDataScience: Follow along @neural_avb's all-in-one deep dive to learn "what recursive language models (RLMs) are, why they are winni…
An educational deep dive into recursive language models (RLMs), explaining what they are, why they are winning long-context benchmarks, and how they differ from existing agentic harness designs like ReAct or CodeAct, using a simple case study.
Reinforcing Recursive Language Models (18 minute read)
The article explores reinforcement learning fine-tuning of small (4B) recursive language models (RLMs) to perform evidence selection from scientific documents, showing that RL-trained 4B models match Claude Sonnet 4.6 performance at a fraction of the size and cost.
Developing open source LLM from ground up from pretrain - rlhf(PPO/GRPO)
A developer shares progress on training a 7B parameter open source LLM from scratch using a DeepSeek architecture optimized for low VRAM, with the goal of democratizing AI development and eventually surpassing large proprietary models.
DeepSeek V4 paper full version is out, FP4 QAT details and stability tricks [D]
DeepSeek released the full V4 paper detailing FP4 quantization-aware training, MoE training stability tricks (anticipatory routing and SwiGLU clamping), and a generative reward model for RLHF, achieving dramatic efficiency gains—V4-Flash uses only 10% of V3.2's FLOPs and 7% of its KV cache at 1M context length.
We benchmarked 18 LLMs on OCR (7k+ calls) — cheaper/old models oftentimes win. Full dataset + framework open-sourced. [R]
A comprehensive benchmark of 18 LLMs on OCR tasks (7k+ calls) reveals that cheaper and older models often match premium accuracy at a fraction of the cost, with full dataset and framework open-sourced.