Tag
Swanbench-Speech is a comprehensive benchmark for evaluating long-form speech generation across diverse scenarios. It scores models with multi-dimensional metrics covering acoustics, semantics, and expressiveness, and its results reveal limitations of current models.
A robotics engineer from Hugging Face proposes mapping human facial expressions onto non-humanoid robots, making them more expressive while avoiding the uncanny valley. The engineer plans to use the resulting expression data to train autonomous body language.
WavAlign introduces a modality-aware, adaptive post-training method for end-to-end spoken dialogue models. It uses constrained preference updates and explicit anchoring to improve both semantic quality and speech expressiveness.