Joint Agent Memory and Exploration Learning via Novelty Signals
Summary
This paper introduces JAMEL, a framework that jointly trains agentic memory and exploration policies using novelty signals, enabling efficient exploration in open-ended environments with reduced computational costs.
Cached at: 06/02/26, 03:37 PM
Source: https://huggingface.co/papers/2606.01528
Abstract
The Joint Agent Memory and Exploration Learning (JAMEL) framework trains memory and exploration policies together through novelty-driven interaction, enabling effective exploration in open-ended environments with reduced computational costs.
In open-ended environments, exploration is fundamental for autonomous agents, yet current language model agents struggle with this. Effective exploration requires memory, but retaining raw interaction histories is computationally expensive over long trajectories. While latent memory offers a solution to compress interaction histories, its training lacks reliable supervisory signals. We introduce Joint Agent Memory and Exploration Learning (JAMEL), a framework that trains agentic memory and exploration policy together through novelty-driven interaction. We observe that memory and exploration form a mutually dependent loop: sustained exploration requires memory to distinguish exhausted behaviors from unseen ones, while novelty-seeking interaction provides the supervision needed to make memory useful for future exploration. By utilizing deterministic and persistent novelty signals such as code coverage in the GUI domain, we provide natural, annotation-free supervision for the memory module. Empirical evaluations demonstrate that JAMEL successfully generalizes to unseen environments. Its exploration capability outperforms open-weight baselines and rivals the exploration depth of a closed-source model while reducing token consumption. Our code and model are open-sourced at https://github.com/MobileLLM/JAMEL.
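As a rough illustration of what a deterministic and persistent novelty signal could look like, the sketch below rewards an agent only for code units it covers for the first time in a run. This is not the authors' implementation: the class name `CoverageNovelty`, the function `run_episode`, and the string identifiers for covered code units are all hypothetical, and how JAMEL actually collects coverage or feeds it to the memory module is not described in the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class CoverageNovelty:
    """Hypothetical tracker of code units covered so far in one exploration run."""

    seen: set[str] = field(default_factory=set)

    def reward(self, covered_now: set[str]) -> int:
        """Count units covered for the first time, then remember them.

        Deterministic: the same coverage history always yields the same reward.
        Persistent: a unit stops being novel once seen and never becomes novel again.
        """
        new_units = covered_now - self.seen
        self.seen |= new_units
        return len(new_units)


def run_episode(steps: list[set[str]]) -> list[int]:
    """Return the novelty reward for each step's coverage report (assumed input format)."""
    tracker = CoverageNovelty()
    return [tracker.reward(covered) for covered in steps]


# Three steps: two new units, a repeat of an old unit, then one old and one new unit.
rewards = run_episode([{"a.f", "a.g"}, {"a.f"}, {"a.g", "b.h"}])
print(rewards)  # [2, 0, 1]
```

Repeating an already covered action earns nothing under this scheme, which is why an agent would need some form of memory to tell exhausted behaviors from unseen ones, as the abstract argues.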
Similar Articles
Learning to Explore: Scaling Agentic Reasoning via Exploration-Aware Policy Optimization
This paper proposes an exploration-aware reinforcement learning framework that enables LLM agents to adaptively explore only when uncertainty is high, improving performance on text-based and GUI-based benchmarks.
AEM: Adaptive Entropy Modulation for Multi-Turn Agentic Reinforcement Learning
This paper introduces AEM, a supervision-free method for agentic reinforcement learning that adapts entropy dynamics at the response level to improve exploration-exploitation trade-offs. It demonstrates performance gains on benchmarks like ALFWorld and SWE-bench by aligning uncertainty estimation with action granularity.
Some considerations on learning to explore via meta-reinforcement learning
OpenAI researchers introduce E-MAML and E-RL², two meta-reinforcement learning algorithms designed to improve exploration in tasks where discovering optimal policies requires significant exploration. The work demonstrates these algorithms' effectiveness on novel environments including Krazy World and maze tasks.
Look Before You Leap: Autonomous Exploration for LLM Agents
This paper identifies autonomous exploration as a critical capability for LLM agents and proposes the Explore-then-Act paradigm, which decouples information gathering from task execution to improve adaptability and real-world performance. It also introduces Exploration Checkpoint Coverage as a verifiable metric for evaluating exploration breadth.
Learning to Learn from Multimodal Experience
This paper introduces AutoMMemo, a framework that enables multimodal agents to automatically design memory mechanisms (expressible as executable memo programs) for learning from multimodal interaction trajectories, outperforming no-memory and fixed-memory baselines on GUI/Web navigation and visual reasoning benchmarks.