MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction

Hugging Face Daily Papers, 04/30/26, 12:00 AM

Summary

MiniCPM-o 4.5 is a 9B-parameter multimodal model built around Omni-Flow, a framework for real-time full-duplex interaction in which the model perceives and responds simultaneously and can also act proactively. It achieves state-of-the-art open-source performance at its scale, approaching Gemini 2.5 Flash in vision-language capabilities, and runs on edge devices with less than 12GB of RAM.

Recent progress in multimodal large language models (MLLMs) has moved AI capabilities from static offline data processing to real-time streaming interaction, yet these models remain far from human-level multimodal interaction. The key bottleneck is no longer modality coverage or latency alone, but the interaction paradigm itself. First, perception and response are still separated into alternating phases, which prevents models from incorporating new inputs to adjust their output during generation. Second, most current models remain reactive, responding only to explicit user requests instead of acting proactively in an evolving multimodal environment. We present MiniCPM-o 4.5, our latest effort towards human-like multimodal interaction, which narrows these gaps through real-time full-duplex omni-modal interaction. It can see, listen, and speak simultaneously in real time, while also exhibiting proactive behaviors, such as issuing reminders or comments, based on its continuous understanding of the live scene. The key technique behind MiniCPM-o 4.5 is Omni-Flow, a unified streaming framework that aligns omni-modal inputs and outputs along a shared temporal axis. This formulation converts conventional turn-based interaction into a full-duplex, time-aligned process, enabling simultaneous perception and response and allowing proactive behavior to arise within the same framework. With a total of 9B parameters, MiniCPM-o 4.5 approaches Gemini 2.5 Flash in vision-language capabilities, delivering state-of-the-art open-source performance at its scale. It also surpasses Qwen3-Omni-30B-A3B in omni-modal understanding and delivers better speech generation, with significantly higher computational efficiency. Driven by its efficient architecture design and inference optimization, the model can perform real-time full-duplex omni-modal interaction on edge devices using less than 12GB of RAM.
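The abstract does not say at what precision the weights are stored on the edge device. As a rough sanity check of the sub-12GB figure, the sketch below counts weight storage alone for 9B parameters at a few common precisions. The precision table, the use of decimal gigabytes, and the decision to ignore activations, KV cache, encoder buffers and runtime overhead are all assumptions, so real memory use would be higher than these numbers.

```python
from typing import Dict, Tuple

# Back-of-envelope weight-memory estimate for a 9B-parameter model.
# Assumption: only weights are counted; activations, KV cache and runtime
# overhead are ignored, so actual RAM use is higher.

PARAMS = 9e9       # "a total of 9B parameters" (from the abstract)
BUDGET_GB = 12.0   # "less than 12GB of RAM" (from the abstract)

# Bytes per parameter for common storage formats. Illustrative only:
# the page does not state which precision MiniCPM-o 4.5 uses on device.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Weight storage in decimal gigabytes (1 GB = 10**9 bytes)."""
    return params * bytes_per_param / 1e9


def report(params: float = PARAMS,
           budget_gb: float = BUDGET_GB) -> Dict[str, Tuple[float, bool]]:
    """Map each format to (weight GB, whether weights alone fit the budget)."""
    rows = {}
    for fmt, bpp in BYTES_PER_PARAM.items():
        gb = weight_gb(params, bpp)
        rows[fmt] = (gb, gb < budget_gb)
    return rows


if __name__ == "__main__":
    for fmt, (gb, ok) in report().items():
        verdict = "under" if ok else "over"
        print(f"{fmt:>5}: {gb:5.1f} GB of weights, {verdict} the {BUDGET_GB:.0f} GB budget")
```

Under these assumptions the weights alone take 36 GB at fp32, 18 GB at bf16, 9 GB at int8 and 4.5 GB at int4. Since 16-bit weights would already exceed 12 GB, the edge figure suggests roughly 8-bit or lower weight precision (or some form of offloading), though the page does not say which technique is used.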


Cached at: 05/08/26, 07:57 AM


Source: https://huggingface.co/papers/2604.27393 Authors:

Abstract

MiniCPM-o 4.5 enables real-time full-duplex multimodal interaction through Omni-Flow, a unified streaming framework that aligns inputs and outputs temporally for simultaneous perception and response.
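This page does not describe how Omni-Flow is implemented. To illustrate what placing inputs and outputs on one shared time axis means, the hypothetical sketch below discretizes time into ticks: at every tick the model reads that tick's (stand-in) video frame and user audio and emits at most one speech chunk, so it never stops perceiving while it talks. `Tick`, `ToyDuplexModel`, `run`, the tick granularity and the hand-written rules that stand in for learned behavior are all illustrative assumptions, not the paper's method.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tick:
    """One slice of the shared timeline (hypothetical unit, e.g. ~1 s)."""
    t: int      # position on the shared time axis
    frame: str  # stand-in for what the camera shows during this tick
    audio: str  # stand-in for user speech heard during this tick ("" = silence)


@dataclass
class ToyDuplexModel:
    """Rule-based stand-in for a learned model: it consumes one input tick
    and emits at most one speech chunk in that same tick."""
    pending: List[str] = field(default_factory=list)  # planned, not yet spoken
    reminded: bool = False  # issue the proactive reminder only once

    def step(self, tick: Tick) -> Optional[str]:
        if tick.audio:
            # New user speech arrives mid-generation: drop the unfinished
            # reply and re-plan instead of waiting for a turn boundary.
            self.pending = ["sure,", "about", "the", tick.audio]
        elif tick.frame == "pot boiling" and not self.pending and not self.reminded:
            # Proactive behavior: no request, but the scene warrants a remark.
            self.reminded = True
            self.pending = ["heads", "up:", "pot", "boiling"]
        # Emit the next planned chunk, or stay silent (None) this tick.
        return self.pending.pop(0) if self.pending else None


def run(model: ToyDuplexModel, ticks: List[Tick]) -> List[Optional[str]]:
    """Exactly one output slot per input tick: input and output share one clock."""
    return [model.step(tick) for tick in ticks]


if __name__ == "__main__":
    script = [("kitchen", "recipe"), ("kitchen", ""), ("kitchen", "timer"),
              ("kitchen", "")] + [("pot boiling", "")] * 4
    ticks = [Tick(t, frame, audio) for t, (frame, audio) in enumerate(script)]
    for tick, out in zip(ticks, run(ToyDuplexModel(), ticks)):
        print(tick.t, tick.frame, repr(tick.audio), "->", out)
```

In the demo, the reply about the recipe is cut off when "timer" arrives at tick 2, and the reminder starts at tick 6 without any user request. A turn-based loop would instead wait for each user turn to finish before generating and would never speak unprompted; in a real full-duplex model these decisions would be learned rather than hand-coded.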



Get this paper in your agent:

hf papers read 2604.27393

Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash



Similar Articles

@rohanpaul_ai: Just a few days back, Thinking Machines Lab (TML), showcased a way of making AI interaction continuous instead of turn-…

@AdinaYakup: MiniCPM V4.6 a 1B MLLM that actually runs on your phone, just released by @OpenBMB 1B - Apache2.0 Runs on iOS, Android,…

MiniCPM-V 4.6

MiniCPM4: Ultra-Efficient LLMs on End Devices

MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe
