Tag
BayLing-Duplex is a native full-duplex speech language model that enables a single autoregressive LLM to manage turn-taking and interruptions without external VAD modules, achieving high success rates and improved response quality over prior models.
Orthrus is a dual-architecture framework that combines autoregressive LLMs with diffusion models for fast parallel token generation while maintaining exact inference fidelity via shared KV caches and consensus mechanisms, achieving up to 7.8x speedup.
Proposes Listen-Write-Speak (LWS), a text-first tri-channel paradigm that allows a single autoregressive LLM to continuously listen, write visible text, and speak in real-time, enabling full-duplex speech interaction without architectural modifications.