commonsense-reasoning

STALE: Can LLM Agents Know When Their Memories Are No Longer Valid?

Hugging Face Daily Papers · 2026-05-07

This paper identifies a critical failure mode in which LLM agents do not update personalized memories when new evidence conflicts with prior beliefs. It introduces the STALE benchmark and a three-dimensional probing framework, shows that even the best models achieve only 55.2% accuracy, and proposes CUPMem as a prototype for robust memory revision.
