Tag
Proposes Listen-Write-Speak (LWS), a text-first tri-channel paradigm that allows a single autoregressive LLM to continuously listen, write visible text, and speak in real-time, enabling full-duplex speech interaction without architectural modifications.
OmniFlatten is a novel GPT-based model that enables real-time, full-duplex spoken dialogue through a multi-stage post-training technique that integrates speech and text without altering the original architecture.