Context-Aware RL for Agentic and Multimodal LLMs

Hugging Face Daily Papers

Summary

Introduces ContextRL, a reinforcement learning approach that teaches LLMs to identify which context supports an answer, achieving gains on agentic and multimodal benchmarks.

Large language models (LLMs) often fail when answering requires identifying a small but decisive piece of evidence within a long or complex context, such as a single line in a tool trace or a subtle detail in an image. We propose ContextRL, a context-aware reinforcement learning (RL) method that improves long-horizon reasoning and multimodal performance through an indirect auxiliary objective. Instead of supervising only the final answer, ContextRL presents the model with a query, an answer, and two highly similar contexts, and rewards it for selecting the context that supports the query-answer pair, thereby encouraging fine-grained grounding. We construct contrastive context data in two domains: for coding agents, trajectories serve as contexts, yielding 1K pairs built via condition filtering; for multimodal reasoning, images serve as contexts, yielding 7K pairs built via generative editing and similarity search. ContextRL achieves average gains of +2.2% over standard GRPO on 5 long-horizon benchmarks, and +1.8% across 12 diverse visual question answering benchmarks. To disentangle the effect of the proposed objective from that of additional data, we compare against data-augmentation baselines that repurpose the same contrastive contexts as standard query-context-answer examples. These baselines provide little to no improvement, showing that the gains arise from the proposed context-selection objective rather than from the contrastive data alone.
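
The context-selection objective described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `ContrastivePair`, `build_selection_example`, and `selection_reward` are hypothetical, and the A/B prompt wording and the binary 0/1 reward are assumptions made for illustration, since the abstract does not specify either.

```python
import random
from dataclasses import dataclass


@dataclass
class ContrastivePair:
    """One contrastive item (hypothetical structure, not from the paper)."""
    query: str
    answer: str
    supporting_context: str  # context that supports the query-answer pair
    distractor_context: str  # highly similar context that does not


def build_selection_example(pair: ContrastivePair, rng: random.Random):
    """Build a two-way selection prompt and return it with the correct label.

    The order of the two contexts is randomized so that position alone
    carries no information about which context is the supporting one.
    """
    contexts = [pair.supporting_context, pair.distractor_context]
    correct_index = 0
    if rng.random() < 0.5:
        contexts.reverse()
        correct_index = 1
    prompt = (
        f"Question: {pair.query}\n"
        f"Answer: {pair.answer}\n"
        f"Context A: {contexts[0]}\n"
        f"Context B: {contexts[1]}\n"
        "Which context supports this answer? Reply with A or B."
    )
    return prompt, "AB"[correct_index]


def selection_reward(model_reply: str, correct_label: str) -> float:
    """Assumed binary reward: 1.0 only if the reply is exactly the right label.

    Parsing is deliberately strict (whitespace and a trailing period are
    tolerated, anything else scores 0.0); a real setup would likely need a
    more robust answer parser.
    """
    reply = model_reply.strip().strip(".").upper()
    return 1.0 if reply == correct_label else 0.0


if __name__ == "__main__":
    pair = ContrastivePair(
        query="Which command made the test suite fail?",
        answer="pip install -r requirements.txt",
        supporting_context="step 3: pip install -r requirements.txt -> exit 1",
        distractor_context="step 3: pip install -r requirements.txt -> exit 0",
    )
    prompt, label = build_selection_example(pair, random.Random(0))
    print(prompt)
    print("reward for a correct reply:", selection_reward(label, label))
```

In an RL loop, a score like this would presumably be passed to the policy-gradient update (GRPO in the paper's comparison) as the reward for the sampled reply. How ContextRL combines it with any final-answer reward is not stated in this summary, so that step is left out of the sketch.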

Cached at: 06/20/26, 02:26 PM

Source: https://huggingface.co/papers/2606.17053

👉 LLMs often fail not because the answer is impossible, but because they miss the one decisive clue hidden in a long trace or image.

🔥 We introduce ContextRL: RL that teaches models to identify which context actually supports an answer.

✅ +2.2% on 5 agentic benchmarks
✅ +1.8% across 12 VQA benchmarks
✅ Works for coding agents & multimodal reasoning
✅ Same contrastive data, but a better objective, not data augmentation

🧠 The key idea: don’t only reward the final answer. Reward the model for grounding it in the right evidence.

Similar Articles

Learning Agent-Compatible Context Management for Long-Horizon Tasks

arXiv cs.AI

Introduces AdaCoM, an external LLM-based context manager for frozen agents, using reinforcement learning to improve long-horizon task performance by preserving task constraints and pruning stale content, with experiments on web search and deep research benchmarks.

From History to State: Constant-Context Skill Learning for LLM Agents

arXiv cs.AI

This paper introduces 'constant-context skill learning,' a framework that moves procedural knowledge from prompts into model weights to reduce token usage and improve privacy for LLM agents. The method achieves strong performance on benchmarks like ALFWorld and WebShop while significantly reducing inference costs.