Context-Aware RL for Agentic and Multimodal LLMs

Hugging Face Daily Papers 06/15/26, 12:00 AM Papers

reinforcement-learning context-aware agentic multimodal large-language-models reasoning

Summary

Introduces ContextRL, a reinforcement learning approach that teaches LLMs to identify which context supports an answer, achieving average gains of +2.2% over standard GRPO on 5 long-horizon agentic benchmarks and +1.8% across 12 visual question answering benchmarks.

Large language models (LLMs) often fail when answering requires identifying a small but decisive piece of evidence within a long or complex context, such as a single line in a tool trace or a subtle detail in an image. We propose ContextRL, a context-aware reinforcement learning (RL) method that improves long-horizon reasoning and multimodal performance through an indirect auxiliary objective. Instead of supervising only the final answer, ContextRL presents the model with a query, an answer, and two highly similar contexts, and rewards it for selecting the context that supports the query-answer pair, thereby encouraging fine-grained grounding. We construct contrastive context data in two domains: for coding agents, trajectories serve as contexts, yielding 1K pairs built via condition filtering; for multimodal reasoning, images serve as contexts, yielding 7K pairs built via generative editing and similarity search. ContextRL achieves average gains of +2.2% over standard GRPO on 5 long-horizon benchmarks, and +1.8% across 12 diverse visual question answering benchmarks. To disentangle the effect of the proposed objective from that of additional data, we compare against data-augmentation baselines that repurpose the same contrastive contexts as standard query-context-answer examples. These baselines provide little to no improvement, showing that the gains arise from the proposed context-selection objective rather than from the contrastive data alone.
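To make the context-selection objective concrete, here is a minimal sketch of how one contrastive pair might be turned into a selection prompt with a binary reward, next to the data-augmentation baseline format. The abstract does not give the prompt template, the reward shape, or the answer parsing, so everything below is an assumption for illustration: the names `ContrastivePair`, `build_selection_prompt`, `selection_reward`, and `to_standard_example` are hypothetical, the A/B labelling and the 0/1 reward are guesses, and plain strings stand in for trajectories or images.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ContrastivePair:
    """Hypothetical container for one contrastive example."""
    query: str
    answer: str
    supporting_context: str   # context that supports the (query, answer) pair
    distractor_context: str   # highly similar context that does not


def build_selection_prompt(pair: ContrastivePair, rng: random.Random) -> tuple[str, str]:
    """Return (prompt, gold_label); the two contexts are shown in random order."""
    swap = rng.random() < 0.5  # randomize position so the label carries no signal
    if swap:
        first, second, gold = pair.distractor_context, pair.supporting_context, "B"
    else:
        first, second, gold = pair.supporting_context, pair.distractor_context, "A"
    prompt = (
        f"Query: {pair.query}\n"
        f"Answer: {pair.answer}\n\n"
        f"Context A:\n{first}\n\n"
        f"Context B:\n{second}\n\n"
        "Which context supports the query-answer pair? Reply with A or B."
    )
    return prompt, gold


def selection_reward(model_output: str, gold: str) -> float:
    """Assumed binary reward: 1.0 if the model names the supporting context."""
    choice = model_output.strip().upper()
    return 1.0 if choice == gold else 0.0


def to_standard_example(pair: ContrastivePair) -> tuple[str, str]:
    """Assumed baseline format: a plain query-context-answer example.

    Returns (prompt, target); only the final answer is supervised here.
    """
    prompt = f"Context:\n{pair.supporting_context}\n\nQuery: {pair.query}"
    return prompt, pair.answer


if __name__ == "__main__":
    demo = ContrastivePair(
        query="Which test failed?",
        answer="test_parse",
        supporting_context="... FAILED test_parse ...",
        distractor_context="... PASSED test_parse ...",
    )
    prompt, gold = build_selection_prompt(demo, random.Random(0))
    print(prompt)
    print("reward for correct choice:", selection_reward(gold, gold))
```

In an RL loop, a scalar such as `selection_reward` would be fed to the policy-gradient update (GRPO in the paper's comparison) in place of, or alongside, a final-answer reward. How the two are combined is not stated in the abstract.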

Original Article

Cached at: 06/20/26, 02:26 PM

Paper page - Context-Aware RL for Agentic and Multimodal LLMs

Source: https://huggingface.co/papers/2606.17053

👉 LLMs often fail not because the answer is impossible, but because they miss the one decisive clue hidden in a long trace or image.

🔥 We introduce ContextRL: RL that teaches models to identify which context actually supports an answer.

✅ +2.2% on 5 agentic benchmarks
✅ +1.8% across 12 VQA benchmarks
✅ Works for coding agents & multimodal reasoning
✅ Same contrastive data, but a better objective, not data augmentation

🧠 The key idea: don’t only reward the final answer. Reward the model for grounding it in the right evidence.

Similar Articles

Learning Agent-Compatible Context Management for Long-Horizon Tasks

GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment

From History to State: Constant-Context Skill Learning for LLM Agents

From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning

Decision-Aware Memory Cards: Counterfactual-Inspired Context Selection and Compression for Tool-Using LLM Agents
