Tag
This paper proposes a practical evaluation method for long-form simultaneous speech-to-speech translation that uses ASR, forced alignment, and sentence embedding alignment to compute latency and quality metrics on continuous speech, overcoming limitations of prior approaches.
This paper introduces NaturalFlow, a fluency-aware optimization framework that reduces disruptive pauses in simultaneous speech-to-speech translation by leveraging model-internal signals, achieving a balance between low latency and natural speech flow.