MiniCPM4: Ultra-Efficient LLMs on End Devices
Summary
MiniCPM4 is a highly efficient large language model designed for end devices, achieving strong performance with 0.5B and 8B parameter versions through innovations in sparse attention, data filtering, training algorithms, and inference systems.
Source: https://huggingface.co/papers/2506.07900
Published on Jun 9, 2025
#3 Paper of the day
Abstract
MiniCPM4, a highly efficient large language model for end-side devices, achieves superior performance using innovations in sparse attention, pre-training datasets, training algorithms, and inference systems.
This paper introduces MiniCPM4, a highly efficient large language model (LLM) designed explicitly for end-side devices. We achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems. For model architecture, we propose InfLLM v2, a trainable sparse attention mechanism that accelerates both the prefilling and decoding phases of long-context processing. For training data, we propose UltraClean, an efficient and accurate pre-training data filtering and generation strategy, and UltraChat v2, a comprehensive supervised fine-tuning dataset. Together, these datasets let the model reach satisfactory performance with only 8 trillion training tokens. For training algorithms, we propose ModelTunnel v2 for efficient pre-training strategy search, and we improve existing post-training methods by introducing chunk-wise rollout for load-balanced reinforcement learning and BitCPM, a data-efficient ternary LLM. For inference systems, we propose CPM.cu, which integrates sparse attention, model quantization, and speculative sampling to achieve efficient prefilling and decoding. To meet diverse on-device requirements, MiniCPM4 is available in two versions, with 0.5B and 8B parameters, respectively. Extensive evaluation results show that MiniCPM4 outperforms open-source models of similar size across multiple benchmarks, highlighting both its efficiency and its effectiveness. Notably, MiniCPM4-8B demonstrates significant speed improvements over Qwen3-8B when processing long sequences. Through further adaptation, MiniCPM4 powers diverse applications, including trustworthy survey generation and tool use with the Model Context Protocol (MCP), showcasing its broad usability.
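The abstract describes InfLLM v2 only as a trainable sparse attention mechanism that speeds up long-context prefilling and decoding; it gives no algorithmic detail. To illustrate why attending to a selected subset of the context saves work, here is a minimal sketch of generic top-k block-sparse attention for a single query. It assumes each key block is scored by the query's dot product with the block's mean-pooled key; the function name `block_sparse_attention`, the mean-pooling rule, and the `block_size`/`top_k` parameters are hypothetical choices for illustration, not InfLLM v2's published method.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def block_sparse_attention(q, keys, values, block_size, top_k):
    """Hypothetical sketch: one query attends only to its top_k key blocks.

    Each block is summarised by the mean of its keys; this scoring rule is an
    assumption for illustration, not InfLLM v2's published method.
    Returns (output_vector, indices_of_attended_tokens).
    """
    d = len(q)
    n = len(keys)
    scale = 1.0 / math.sqrt(d)

    # Partition token positions into contiguous blocks of block_size.
    blocks = [list(range(s, min(s + block_size, n))) for s in range(0, n, block_size)]

    # Cheap relevance score per block: q . mean(keys in block).
    block_scores = []
    for idx in blocks:
        mean_key = [sum(keys[i][j] for i in idx) / len(idx) for j in range(d)]
        block_scores.append(dot(q, mean_key))

    # Keep the top_k highest-scoring blocks; every other token is skipped.
    ranked = sorted(range(len(blocks)), key=lambda b: block_scores[b], reverse=True)
    selected = sorted(i for b in ranked[:top_k] for i in blocks[b])

    # Standard scaled dot-product attention restricted to the selected tokens.
    weights = softmax([dot(q, keys[i]) * scale for i in selected])
    dim_v = len(values[0])
    out = [sum(w * values[i][j] for w, i in zip(weights, selected)) for j in range(dim_v)]
    return out, selected


# Toy example: 6 tokens in 3 blocks of 2; the query points along the second axis.
keys = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, 0.0], [-0.9, -0.1]]
values = [[float(i)] for i in range(6)]
q = [0.0, 1.0]

out, used = block_sparse_attention(q, keys, values, block_size=2, top_k=1)
print(used)  # [2, 3]
```

With `top_k` equal to the number of blocks, the function reduces to ordinary dense attention. With a smaller `top_k`, the softmax touches at most `top_k * block_size` tokens instead of all `n`, while block scoring costs one comparison per block; in a sketch like this, that is where savings on long sequences would come from.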
Models citing this paper: 20
#### openbmb/MiniCPM4.1-8B • Text Generation • 8B • Updated Oct 24, 2025 • 79.5k downloads • 389 likes
#### openbmb/MiniCPM5-1B • Text Generation • 1B • Updated about 14 hours ago • 2.41k downloads • 294 likes
#### openbmb/MiniCPM4-8B • Text Generation • 8B • Updated Oct 24, 2025 • 25.6k downloads • 284 likes
#### openbmb/MiniCPM5-1B-GGUF • Text Generation • 1B • Updated 1 day ago • 1.66k downloads • 81 likes
Datasets citing this paper: 1
#### openbmb/Ultra-FineWeb • Viewer • Updated Dec 10, 2025 • 1.29B rows • 52.2k downloads • 343 likes
Spaces citing this paper: 12
Collections including this paper: 14
Similar Articles
MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe
MiniCPM-V 4.5 is an 8B multimodal large language model that achieves high efficiency and strong performance through a unified 3D-Resampler architecture, a novel data strategy, and a hybrid reinforcement learning approach. The model reportedly surpasses larger proprietary and open-source models on benchmarks while significantly reducing GPU memory usage and inference time.
MiniCPM-V 4.6
MiniCPM-V 4.6 is an ultra-efficient 1.3B vision-language model optimized for mobile devices.
@AdinaYakup: MiniCPM V4.6 a 1B MLLM that actually runs on your phone, just released by @OpenBMB 1B - Apache2.0 Runs on iOS, Android,…
OpenBMB has released MiniCPM V4.6, a 1B-parameter multimodal large language model optimized for mobile devices under the Apache 2.0 license. It features mixed visual token compression and claims approximately 1.5x faster throughput than Qwen3.5 0.8B while running natively on iOS, Android, and HarmonyOS.
OpenBMB releases MiniCPM5-1B LLM, currently one of the most powerful LLMs for its size (17.9 on the Artificial Analysis Intelligence Index)
OpenBMB releases MiniCPM5-1B, a leading 1B open-weights LLM that achieves the highest Artificial Analysis Intelligence Index score (17.9) in its size class, surpassing larger models such as Qwen3.5 2B.
@FeitengLi: OpenBMB open-sources MiniCPM-V 4.6, 1.3B parameters (SigLIP2-400M + Qwen3.5-0.8B), 262k context, visual encoding FLOPs 50%+ less than previous generation. Token cost for the same task is lower than Qwen3.5-0…
OpenBMB releases MiniCPM-V 4.6, a 1.3B-parameter multimodal LLM with 262k context and significantly reduced visual encoding FLOPs, achieving strong benchmark performance and broad inference framework support.