Tag
ElevenLabs introduces the ability to call your Hermes Agent, enabling voice-based interaction with AI agents through its platform.
SaliMory is a framework that trains a single language model to manage cognitively structured memory (user facts, preferences, and working memory) for conversational agents, using hierarchical stage-wise process rewards and reward-decomposed contrastive refinement. It reduces memory-attributed failures by one-third, outperforms the state of the art by over 10% in end-to-end accuracy, and more than doubles the Good Personalization rate.
This paper proposes Structure-Aware RAG (SA-RAG), which uses tables as an intermediate structured representation to reduce noise in retrieval-augmented generation for conversational agents. It combines quality-aware metadata generation with two table generation methods and outperforms existing baselines on noisy real-world datasets.
This paper presents a multimodal emotion recognition module for proactive conversational agents, using facial recognition and linguistic analysis. A user study with 20 participants reveals a 'poker face' effect where visual cues are unreliable, while linguistic analysis proves more accurate; the study also shows agents can elicit emotions through conversational adaptation.
When2Speak is a synthetic dataset and pipeline for training LLMs to decide when to speak in multi-party conversations. Fine-tuning on this dataset significantly improves turn-taking, with reinforcement learning reducing missed interventions from 50% to ~20%.
Hugging Face introduces EcomRLVE-GYM, a framework providing eight verifiable environments for training reinforcement learning agents on complex e-commerce tasks. The tool features adaptive difficulty curricula and algorithmic rewards to improve task completion in shopping assistants, demonstrated by training a Qwen 3 8B model.