Tag
This paper introduces Fine-grained Fragment Retrieval (FFR), a new task for locating semantically coherent multi-modal fragments (text and images) within long-form dialogues. The authors propose F2RVLM, a generation-based retrieval model trained with reinforcement learning, and FFRS, a two-stage retrieval system, along with a new dataset, MLDR, for evaluation.
This paper proposes a unified framework for memory access and selection in long-context dialogue systems, using Bayes factors to quantify the utility of historical turns for modeling changing user preferences. Experiments show it outperforms embedding-based retrieval on preference-intensive tasks.
This paper proposes PUMA, a framework for LLM personalization in multi-turn conversations that models latent user states and uses the Free Energy Principle to select dialogue actions, improving long-horizon outcomes on healthcare counseling benchmarks.
This paper proposes FF-BPSN, a forward-focused bidirectional pseudo-siamese network that uses two transformer decoders for dialogue path planning in target-oriented proactive dialogues, achieving state-of-the-art results on the DuRecDial benchmarks.
This paper proposes SKG-Eval, a quasi-deterministic evaluation framework for multi-turn dialogue that uses incremental semantic knowledge graphs to detect cross-turn inconsistencies, contradictions, and topic drift, achieving higher correlation with human judgments.
This paper introduces Inquisitive Conversational Agents (ICAs) for proactive information extraction in legal dialogue and proposes a Dual Hierarchical Reinforcement Learning framework that learns when and how to ask probing questions, evaluated on U.S. Supreme Court oral arguments.
This paper proposes a method to enhance target-guided proactive dialogue systems by jointly modeling user profiles and domain knowledge as conversational scenarios and employing intent-keyword bridging to predict future dialogue turns.
The article highlights a research update describing an interaction model that can track cognitive states such as thinking, yielding, and self-correction during storytelling without a built-in dialogue management system.
Researchers from KTH Royal Institute of Technology propose a two-stage framework that fine-tunes LLMs on dialogue transcripts and uses contrastive learning to create joint embeddings for aligning backchannel signals with conversational context, demonstrating improved context-backchannel retrieval compared to previous methods.
STRIDE-ED is a strategy-grounded reasoning framework for empathetic dialogue systems that uses structured multi-stage reasoning combined with a data refinement pipeline and two-stage training (supervised fine-tuning + multi-objective RL) to improve emotional understanding and response generation. The framework demonstrates consistent improvements across open-source LLMs on both automatic metrics and human evaluations.
Context-Agent is a novel framework that models multi-turn dialogue history as dynamic tree structures rather than flat sequences, better capturing the hierarchical and branching nature of natural conversation. The paper introduces the NTM benchmark for evaluating non-linear dialogue scenarios and demonstrates improved task completion rates and token efficiency across various LLMs.
CoLabScience introduces a proactive LLM assistant for biomedical research that autonomously intervenes in scientific discussions using PULI (Positive-Unlabeled Learning-to-Intervene), a novel reinforcement learning framework that determines when and how to contribute context-aware insights. The work includes BSDD, a new benchmark dataset of simulated research dialogues with intervention points derived from PubMed articles.
This paper proposes multi-strategy utterance generation methods for Emotional Support Conversations (ESC), where each utterance can contain multiple strategy-response pairs. Two generation approaches (All-in-One and One-by-One) enhanced with cognitive reasoning via reinforcement learning are evaluated on the ESConv dataset, demonstrating improved supportive quality and dialogue success.