LongMINT is a benchmark for evaluating memory under multi-target interference in long-horizon agent systems.
The MEME benchmark evaluates AI memory systems across multiple entities and evolving conditions, revealing significant challenges in dependency reasoning that persist even with advanced retrieval techniques.