long-horizon-agents

Proactive Memory for Long-Horizon Agents (16 minute read)

TLDR AI · yesterday

This paper introduces a proactive memory agent that operates alongside a standard action agent to selectively inject memory-grounded reminders during long-horizon tasks, mitigating behavioral state decay. Experiments on Terminal-Bench and τ²-Bench show significant improvements in pass@1, and the approach is demonstrated with both weak and strong action agents.

From Noisy Traces to Root Causes: Structural Trajectory Analysis and Causal Extraction for Agent Optimization

arXiv cs.CL · 5d ago

Introduces STRACE, a framework that performs structural trajectory analysis and causal extraction to construct high signal-to-noise optimization contexts for improving long-horizon agents, outperforming baselines on a formal verification task.

Ask the World Before Acting: Budgeted Environment Probing for World-Model Calibration

arXiv cs.AI · 2026-07-01

The paper introduces EnvProbe, a budgeted environment probing operator that allows long-horizon language agents to selectively query the environment for specific belief fields before acting, reducing world-model error by efficiently calibrating beliefs with limited interactions.

AgentOdyssey: Open-Ended Long-Horizon Text Game Generation for Test-Time Continual Learning Agents

arXiv cs.CL · 2026-06-25

Introduces AgentOdyssey, a procedural text game generation framework designed to evaluate agents on test-time continual learning abilities including exploration, episodic memory, world knowledge acquisition, skill learning, and long-horizon planning. The framework highlights significant gaps between current agents and human performance.

Beyond Compaction: Structured Context Eviction for Long-Horizon Agents

arXiv cs.CL · 2026-06-11

Introduces Context Window Lifecycle (CWL), a structured context eviction scheme for long-horizon LLM agents that maintains an effectively unbounded working horizon by evicting content based on a dependency graph, avoiding the limitations of summarization-based compaction and recency truncation.
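
As a rough illustration of what dependency-aware eviction could look like, the Python sketch below drops the oldest context items that no remaining item depends on until the context fits a token budget. This is not the paper's CWL algorithm: the `ContextItem` class, the `evict` function, the token counts, and the oldest-first policy are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ContextItem:
    """Hypothetical unit of context: a message, tool output, or file view."""
    item_id: str
    tokens: int
    depends_on: set[str] = field(default_factory=set)


def evict(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Drop the oldest items nothing in context depends on until under budget.

    `items` is assumed to be ordered oldest to newest.
    """
    live = list(items)
    total = sum(item.tokens for item in live)
    while total > budget:
        # Ids that some item still in context declares a dependency on.
        needed = set().union(*(item.depends_on for item in live))
        victim = next((item for item in live if item.item_id not in needed), None)
        if victim is None:
            break  # everything left is still required by something else
        live.remove(victim)
        total -= victim.tokens
    return live


if __name__ == "__main__":
    history = [
        ContextItem("a", 100),
        ContextItem("b", 50, {"a"}),
        ContextItem("c", 30),
        ContextItem("d", 20, {"b"}),
    ]
    print([item.item_id for item in evict(history, budget=180)])
```

In this toy history, item `c` is the only one that nothing else depends on apart from the newest item, so it is evicted first. A recency-only policy would have dropped `a` instead, even though `b` still depends on it.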

@dair_ai: Outstanding paper on long-horizon agents. (bookmark it) Similar to humans, how do you make agents persist on a difficul…

X AI KOLs Following · 2026-06-04

AutoLab is a new benchmark evaluating 17 frontier models on 36 expert-curated long-horizon tasks (system optimization, model development, CUDA kernels, puzzles), finding that persistence—not initial attempt quality—is the dominant predictor of success. Claude-opus-4.6 led all categories, while most other models terminated prematurely or exhausted budgets with minimal progress.

Temporal Order Matters for Agentic Memory: Segment Trees for Long-Horizon Agents

arXiv cs.CL · 2026-06-04

Researchers from the University of Toronto and the Vector Institute propose Segment Tree Memory (SegTreeMem), a memory architecture for long-horizon conversational agents that preserves temporal order by using a hierarchical segment tree for both online construction and retrieval. Experiments across three datasets show nearly 20% improvement in LLM-judge accuracy over non-temporal tree baselines.
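
To make the segment-tree idea concrete, here is a minimal Python sketch of a classic segment tree built over conversation turns in temporal order. It is not SegTreeMem itself: the `Node`, `build`, and `query` names are hypothetical, the tree is built in one batch rather than online, and string concatenation stands in for whatever summarization a real system would use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    lo: int  # index of the first turn covered (inclusive)
    hi: int  # index one past the last turn covered (exclusive)
    summary: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(turns: list[str], lo: int = 0, hi: Optional[int] = None) -> Node:
    """Build a segment tree whose leaves are turns in temporal order."""
    if hi is None:
        if not turns:
            raise ValueError("need at least one turn")
        hi = len(turns)
    if hi - lo == 1:
        return Node(lo, hi, turns[lo])
    mid = (lo + hi) // 2
    left, right = build(turns, lo, mid), build(turns, mid, hi)
    # Placeholder summary: concatenation keeps the children's temporal order.
    return Node(lo, hi, f"{left.summary} | {right.summary}", left, right)


def query(node: Node, lo: int, hi: int) -> list[Node]:
    """Return the fewest nodes that exactly cover turns [lo, hi), oldest first."""
    if hi <= node.lo or node.hi <= lo:
        return []  # no overlap with the requested time range
    if lo <= node.lo and node.hi <= hi:
        return [node]  # fully inside the range: reuse this node's summary
    return query(node.left, lo, hi) + query(node.right, lo, hi)


if __name__ == "__main__":
    root = build(["t0", "t1", "t2", "t3", "t4"])
    for n in query(root, 2, 5):
        print((n.lo, n.hi), n.summary)
```

Because every node covers a contiguous time span, a range query returns its results already in chronological order and touches O(log n) nodes.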

S3Mem: Structured Spatiotemporal Scene-Event Memory for Long-Horizon Interactive Question Answering

arXiv cs.CL · 2026-05-29

S3Mem is a structured spatiotemporal scene-event memory framework for long-horizon interactive question answering. It uses anchor-sensitive retrieval and a token-budget-aware evidence interface to outperform standard RAG in multiple environments.

@_akhaliq: LongMINT Evaluating Memory under Multi-Target Interference in Long-Horizon Agent Systems

X AI KOLs Following · 2026-05-21

LongMINT is a benchmark for evaluating memory under multi-target interference in long-horizon agent systems.

@blc_16: If you want to understand why RL struggles with long-horizon agent tasks, this is a good explanation. The core issue is…

X AI KOLs Timeline · 2026-05-10

The post explains that reinforcement learning struggles with long-horizon agent tasks because rewards are sparse, and highlights GEPA, a method that uses trajectory-level textual reflection to preserve richer feedback signals for optimization.

Memanto: Typed Semantic Memory with Information-Theoretic Retrieval for Long-Horizon Agents

Papers with Code Trending · 2026-04-23

Memanto introduces a typed semantic memory system using a schema, conflict resolution, and Moorcheh's information-theoretic retrieval engine, achieving state-of-the-art results on LongMemEval and LoCoMo benchmarks with zero ingestion cost and sub-90ms latency.

@omarsar0: Pay attention to this one, AI devs. This is particularly interesting if you work with long-horizon terminal agents that…

X AI KOLs Following · 2026-04-22

TACO is a self-evolving framework that automatically discovers and refines context compression rules for long-horizon terminal agents.
