Zhipu AI founder Tang Jie predicts that the biggest breakthrough in large models this year will be long-horizon tasks, in which AI continuously solves complex problems in real environments; he cites three technical pillars and Anthropic's progress in autonomous training.
This paper introduces Agent-BRACE, a method that decouples LLM agents into belief state and policy models to handle long-horizon tasks in partially observable environments. By verbalizing state uncertainty, it achieves significant performance improvements over baselines while maintaining constant context window size.
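The belief-state/policy decoupling described above can be sketched as two separate LLM calls, where only a compact verbalized belief (not the full history) is passed to the policy. This is a hypothetical illustration under assumed names, not the Agent-BRACE implementation; `call_llm` is a stub standing in for a real model client.

```python
def call_llm(prompt: str) -> str:
    """Stub for an LLM call; replace with a real client."""
    return "belief: the key is probably in drawer 2 (uncertain)"

class BeliefStateAgent:
    """Illustrative agent that keeps a fixed-size verbalized belief."""

    def __init__(self) -> None:
        self.belief = "no information yet"  # verbalized belief state

    def update_belief(self, observation: str, last_action: str) -> None:
        # Belief model: fold the newest transition into the old belief,
        # verbalizing uncertainty explicitly ("probably", "unknown", ...).
        prompt = (
            f"Previous belief: {self.belief}\n"
            f"Last action: {last_action}\n"
            f"New observation: {observation}\n"
            "Update the belief in under 100 words, stating uncertainty."
        )
        self.belief = call_llm(prompt)

    def act(self, observation: str) -> str:
        # Policy model sees only the compact belief plus the current
        # observation, never the full history, so the prompt stays
        # roughly constant-size regardless of episode length.
        prompt = f"Belief: {self.belief}\nObservation: {observation}\nNext action:"
        return call_llm(prompt)
```

Because the belief is re-summarized at every step, episode length no longer grows the context fed to the policy model.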
The article discusses the anticipated breakthrough in long-horizon AI tasks and autonomous agents, suggesting a shift from 'one-person' to 'zero-person' companies. It highlights technical pillars such as memory, continual learning, and self-judging as key to realizing fully self-evolving AI systems that could redefine AGI and the operating system.
This paper introduces ReFlect, a training-free harness system that wraps LLMs with deterministic error detection and recovery logic to improve performance on complex, long-horizon reasoning tasks.
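A training-free harness of the kind described can be pictured as deterministic checks wrapped around each model call, with detected errors fed back for a bounded retry. The sketch below is an assumption-laden illustration (the function names and the JSON-format check are hypothetical, not taken from the ReFlect paper):

```python
import json

def call_llm(prompt: str) -> str:
    """Stub for an LLM call; replace with a real client."""
    return '{"answer": 42}'

def checked_step(task: str, max_retries: int = 3) -> dict:
    """Run one step with deterministic validation and recovery."""
    feedback = ""
    for _ in range(max_retries):
        out = call_llm(task + feedback)
        try:
            parsed = json.loads(out)  # deterministic format check
            if "answer" not in parsed:
                raise KeyError("missing 'answer' field")
            return parsed  # passed all checks
        except (json.JSONDecodeError, KeyError) as err:
            # Recovery: verbalize the detected error back into the prompt.
            feedback = f"\nPrevious output invalid ({err}); return valid JSON."
    raise RuntimeError("recovery failed after retries")
```

The key property is that detection and recovery are ordinary deterministic code, so no model weights need to change.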
This paper introduces BEACON, a milestone-guided policy learning framework designed to improve credit assignment and sample efficiency for long-horizon language agents. It demonstrates significant performance improvements over GRPO and GiGPO on benchmarks like ALFWorld, WebShop, and ScienceWorld.
FS-Researcher introduces a file-system-based dual-agent framework that enables LLM agents to conduct deep research beyond context window limits by using persistent external memory as a shared workspace. The framework achieves state-of-the-art results on research benchmarks and demonstrates effective test-time scaling through computation allocation to evidence collection.
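Persistent file-based memory as a shared workspace can be sketched in a few lines: findings are written to disk so later steps reload them instead of carrying the whole research trace in the context window. Paths and helper names below are illustrative, not from the FS-Researcher framework:

```python
from pathlib import Path

WORKSPACE = Path("workspace")  # hypothetical shared workspace directory

def save_note(topic: str, text: str) -> Path:
    """Append an evidence note to the topic's file on disk."""
    WORKSPACE.mkdir(exist_ok=True)
    path = WORKSPACE / f"{topic}.md"
    with path.open("a", encoding="utf-8") as f:
        f.write(text.rstrip() + "\n")
    return path

def load_notes(topic: str) -> str:
    """Reload saved notes for prompt construction; empty if none exist."""
    path = WORKSPACE / f"{topic}.md"
    return path.read_text(encoding="utf-8") if path.exists() else ""
```

Because the notes persist outside the model, any number of agent turns (or a second agent) can read and extend the same workspace.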