This paper introduces EP-HUBO, a quantum-inspired method that treats evidence selection in chain-of-thought reasoning as a combinatorial optimization problem. It significantly improves performance on legal reasoning benchmarks such as MMLU-Pro law and LEXam by allowing minority-but-correct hypotheses to override noisy majorities.
The article explores reinforcement-learning fine-tuning of small (4B-parameter) recursive language models (RLMs) for evidence selection from scientific documents, showing that RL-trained 4B models match the performance of Claude Sonnet 4.6 at a fraction of the size and cost.
AdaGATE is a training-free evidence controller for multi-hop RAG that uses entity-centric gap tracking, micro-query generation, and utility-based selection to improve robustness under noisy retrieval, achieving state-of-the-art evidence F1 with fewer input tokens.