Tag
This paper proposes PUMA, a framework for LLM personalization in multi-turn conversations that models latent user states and uses the Free Energy Principle to select dialogue actions, improving long-horizon outcomes on healthcare counseling benchmarks.
This paper proposes a free-energy framework for distinguishing capability elicitation from capability creation in large language model post-training, arguing that supervised fine-tuning and reinforcement learning often reweight existing behaviors rather than create new ones.