Tag
Proposes a cognitively grounded multi-factor value function for agentic memory in LLM agents, learning interpretable weights to decide what to encode, forget, and retrieve under memory constraints. Significantly improves retention of gold evidence compared with similarity-only and recency-based baselines.
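The summary does not say which factors the value function combines or how its weights are learned. As a rough illustration only, the sketch below assumes four hypothetical factors (relevance, recency, importance, and access frequency), hand-picked fixed weights in place of learned ones, and a simple policy: every item is admitted, and when capacity is exceeded the lowest-value item is forgotten. None of these names or choices come from the paper.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    text: str
    relevance: float    # assumption: precomputed relevance to the current task, in [0, 1]
    importance: float   # assumption: salience score, in [0, 1]
    last_access: float  # time step of the last write or read
    access_count: int = 0


# Hypothetical fixed weights (sum to 1); the paper learns its weights, which are not reproduced here.
WEIGHTS = {"relevance": 0.5, "recency": 0.2, "importance": 0.2, "frequency": 0.1}


def memory_value(item: MemoryItem, now: float, half_life: float = 10.0) -> float:
    """Weighted sum of interpretable factors, each scaled to [0, 1]."""
    recency = 0.5 ** ((now - item.last_access) / half_life)  # exponential decay with age
    frequency = item.access_count / (1 + item.access_count)  # saturates toward 1
    return (
        WEIGHTS["relevance"] * item.relevance
        + WEIGHTS["recency"] * recency
        + WEIGHTS["importance"] * item.importance
        + WEIGHTS["frequency"] * frequency
    )


class BoundedMemory:
    """Capacity-limited store: admit every item, forget the lowest-value one on overflow."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: list[MemoryItem] = []

    def encode(self, item: MemoryItem, now: float) -> None:
        item.last_access = now
        self.items.append(item)
        if len(self.items) > self.capacity:
            # Forget: evict whichever item (possibly the new one) currently scores lowest.
            worst = min(range(len(self.items)), key=lambda i: memory_value(self.items[i], now))
            self.items.pop(worst)

    def retrieve(self, now: float, k: int = 1) -> list[MemoryItem]:
        # Retrieve: return the k highest-value items and record the access.
        ranked = sorted(self.items, key=lambda m: memory_value(m, now), reverse=True)
        top = ranked[:k]
        for m in top:
            m.access_count += 1
            m.last_access = now
        return top


mem = BoundedMemory(capacity=2)
mem.encode(MemoryItem("user's name is Ada", 0.9, 0.9, 0), now=0)
mem.encode(MemoryItem("small talk about weather", 0.1, 0.1, 0), now=1)
mem.encode(MemoryItem("meeting moved to Friday", 0.8, 0.7, 0), now=2)
print([m.text for m in mem.items])  # the low-value small-talk item has been forgotten
```

Because the value is a plain weighted sum, each weight can be read directly as how much one factor contributes to keeping or surfacing a memory, which is the kind of interpretability the summary highlights.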
M^3Eval is a comprehensive evaluation framework and benchmark for probing memory capabilities in multi-modal models, grounded in cognitive psychology. Experiments reveal consistent weaknesses in memory maintenance, interference patterns, and spatial-temporal grounding.
A research paper shows that although AI can solve CAPTCHAs as well as humans, differences in interaction behavior still reliably distinguish bots from people; on this basis, the authors propose a 'Process Turing Test'.
An experimental study shows that inserting first- or second-person pronouns into headlines has mixed effects on their memorability for human readers, and that LLMs asked to make such revisions often produce inaccurate or unnatural headlines.