Tag
This paper introduces AutoMMemo, a framework that enables multimodal agents to automatically design memory mechanisms (expressible as executable memo programs) for learning from multimodal interaction trajectories, outperforming no-memory and fixed-memory baselines on GUI/Web navigation and visual reasoning benchmarks.
The paper introduces δ-mem, a lightweight memory mechanism that enhances large language models by augmenting a frozen attention backbone with a compact associative memory state. It demonstrates improved performance on memory-heavy benchmarks with minimal computational overhead.