Tested how long small models hold a fact across a conversation. The memory failure mode is a real problem for agents, and it's not what I expected.

Reddit r/AI_Agents News

Summary

A developer tested how small edge models (LFM2.5, Gemma variants) retain a single fact across conversation turns, finding that models often confidently deny knowing information that remains in context, posing a trust issue for agent architectures and suggesting a trade-off between memory and format discipline.

If you're building agents on small or on-device models, this one's relevant: I measured how long three edge models hold a single fact as the conversation grows, and the way they fail is worse for agents than plain forgetting. Setup was simple on purpose: inject one fact, pile on N turns of unrelated filler, ask for the fact. Three runs per depth, shuffled filler each time. The failure mode: when an agent loses the fact, it doesn't guess wrong. It asserts it could never have known, "I don't have access to your personal information." But the fact is still sitting in context. For an agent that's supposed to carry user state across a session, this means it won't just drop a constraint, it'll confidently tell the user the constraint was never given. That breaks trust and it's painful to trace, because nothing actually errored. The numbers, short version: * LFM2.5 (1.5B active MoE): longest memory, degrades gradually. * Gemma 4 E2B (\~2B): solid then a sudden cliff around 8-10 turns. * Gemma 4 E4B (\~4B): shortest memory of the three, breaks at 5 turns, but the strongest at instruction-following and keeping tool-call formats intact. That last split is the interesting tension for agent builders. The model best at not breaking your tool schema was the worst at remembering what the user said. If memory and format-discipline really do trade off, you may want one model driving structured tool calls and a separate mechanism (retrieval, refreshed system state) holding the facts, rather than expecting one small model to do both. Writeup with the full chart, per-depth breakdown, and the reproducible harness. Link in the comments below. Curious if anyone running agent frameworks has hit the "you never told me" refusal in the wild, and how you worked around it.
Original Article

Similar Articles

STALE: Can LLM Agents Know When Their Memories Are No Longer Valid?

Hugging Face Daily Papers

This paper identifies a critical failure mode in LLM agents where they fail to update personalized memories when new evidence conflicts with prior beliefs. It introduces the STALE benchmark and a three-dimensional probing framework, revealing that even the best models achieve only 55.2% accuracy, and proposes CUPMem as a prototype for robust memory revision.

Useful memories become faulty when continuously updated by LLMs (30 minute read)

TLDR AI

This research demonstrates that continuously updating LLM agent memories through distillation and consolidation loops causes performance regression, even when trained on ground-truth solutions. The study finds that episodic-only retention outperforms text-based consolidation, highlighting significant flaws in current self-improvement paradigms.