Tag
The article explores reinforcement-learning fine-tuning of small (4B-parameter) recursive language models (RLMs) for evidence selection from scientific documents, showing that the RL-trained 4B models match Claude Sonnet 4.6 on this task at a fraction of the size and cost.
AdaGATE is a training-free evidence controller for multi-hop RAG that uses entity-centric gap tracking, micro-query generation, and utility-based selection to improve robustness under noisy retrieval, achieving state-of-the-art evidence F1 with fewer input tokens.