Challenges with long-term memory and reliability in personal agents

Reddit r/ArtificialInteligence News

Summary

The author shares challenges in building a personal health agent for ongoing use, focusing on long-term memory management and reliability issues including hallucination when synthesizing data from multiple sources over time.

I’ve been building Kim, a personal health agent meant for ongoing use. Instead of handling one-off queries, the goal is to let it answer questions and surface insights from a user’s health data over time (wearables, labs, symptoms, habits, etc.). Two challenges have been especially difficult:

1. Long-term memory management: Maintaining useful context across weeks or months is hard. Simple vector retrieval starts to degrade once there are months of personal data. I’ve been experimenting with what to persist, how to summarize or forget older information, and how to handle conflicting signals across data sources. Even with better embeddings, retrieval quality and relevance remain inconsistent for longitudinal personal data.

2. Reliability and hallucination: Even when grounded in the user’s actual data, the agent still hallucinates or overgeneralizes, especially when synthesizing information across multiple sources or time periods. I’ve tried different grounding techniques and structured outputs, but getting consistent reliability on messy, incomplete, or subjective personal data is still difficult. Evaluation is also tricky since there’s often no clear ground truth.

Curious how others building personal or long-running agents are handling memory architectures and reducing hallucination with noisy real-world data.
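To make the "what to persist or forget" question concrete, here is a minimal sketch of one common option: weighting each retrieved memory's similarity score by an exponential recency decay, so older observations fade unless they are strongly relevant. This is not the post author's method. The `MemoryItem` type, the `half_life_days` parameter, the precomputed `similarity` values, and the sample records are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    text: str
    age_days: float    # time since the observation was recorded
    similarity: float  # assumed precomputed query similarity in [0, 1]


def decayed_score(item: MemoryItem, half_life_days: float = 30.0) -> float:
    """Similarity multiplied by a decay that halves every half_life_days."""
    decay = 0.5 ** (item.age_days / half_life_days)
    return item.similarity * decay


def top_k(items: list[MemoryItem], k: int = 2,
          half_life_days: float = 30.0) -> list[MemoryItem]:
    """Return the k items with the highest recency-weighted score."""
    ranked = sorted(items, key=lambda m: decayed_score(m, half_life_days),
                    reverse=True)
    return ranked[:k]


# Made-up records, not real user data.
memories = [
    MemoryItem("Resting heart rate averaged 58 bpm", age_days=90, similarity=0.90),
    MemoryItem("Resting heart rate averaged 64 bpm", age_days=3, similarity=0.85),
    MemoryItem("Slept 5 hours, felt tired", age_days=1, similarity=0.30),
]

for m in top_k(memories):
    print(round(decayed_score(m), 3), m.text)
```

With a 30-day half-life, the 90-day-old reading drops out of the top two even though it has the highest raw similarity. That is the trade-off in miniature: a short half-life favors current state but can hide long-term baselines, so a real system would likely need per-data-type half-lives or a separate summarized baseline.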

Similar Articles

Synthesis and Evaluation of Long-term History-aware Medical Dialogue

arXiv cs.CL

This paper introduces a framework for synthesizing long-term medical dialogue datasets using LLMs, and creates MediLongChat with three benchmark tasks to evaluate healthcare agents' memory and reasoning capabilities. Experiments show that even state-of-the-art LLMs struggle with these tasks.