autoregressive-llm


BayLing-Duplex: Native Full-Duplex Speech Dialogue with a Single Autoregressive LLM

arXiv cs.CL · 2d ago

BayLing-Duplex is a native full-duplex speech language model that enables a single autoregressive LLM to manage turn-taking and interruptions without external VAD modules, achieving high success rates and improved response quality over prior models.



Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion

Hugging Face Daily Papers · 2026-05-12

Orthrus is a dual-architecture framework that combines autoregressive LLMs with diffusion models for fast parallel token generation while maintaining exact inference fidelity via shared KV caches and consensus mechanisms, achieving up to 7.8x speedup.



Liberating LLM Capabilities in Full-Duplex Speech Models

Hugging Face Daily Papers · 2026-05-04

Proposes Listen-Write-Speak (LWS), a text-first tri-channel paradigm that allows a single autoregressive LLM to continuously listen, write visible text, and speak in real time, enabling full-duplex speech interaction without architectural modifications.

