A custom C++ inference engine for MiniCPM-V 4.6 was developed for the Orange Pi AIPro (Ascend 310B NPU). Its optimized AscendC kernels for matmul and causal-conv1d give a 2x speedup over the stock framework, reaching 5.90 tokens/s.
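For readers unfamiliar with the causal-conv1d operation named above, the following is a minimal reference sketch in plain C++ for a single channel. It is not the AscendC kernel from the engine and makes no claim about how that kernel is written. The function name `causal_conv1d_ref` and the tap ordering (the last weight multiplies the current sample, with implicit zero padding on the left) are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference implementation (illustrative only, unoptimized).
// Computes y[t] = sum_{k=0}^{K-1} w[k] * x[t - (K-1) + k],
// treating x[i] as 0 for i < 0, so y[t] never depends on future samples.
std::vector<float> causal_conv1d_ref(const std::vector<float>& x,
                                     const std::vector<float>& w) {
    assert(!w.empty());  // at least one filter tap is required
    const std::size_t K = w.size();
    std::vector<float> y(x.size(), 0.0f);
    for (std::size_t t = 0; t < x.size(); ++t) {
        float acc = 0.0f;
        for (std::size_t k = 0; k < K; ++k) {
            // How far back in time this tap looks (0 means the current sample).
            const std::size_t shift = K - 1 - k;
            // Taps that would read before the start of the sequence
            // contribute zero (implicit left padding).
            if (shift > t) continue;
            acc += w[k] * x[t - shift];
        }
        y[t] = acc;
    }
    return y;
}
```

With the input `{1, 2, 3, 4}` and the two-tap filter `{0.5, 0.5}`, each output is the average of the current and previous sample, and the first output sees only the current sample. A scalar loop like this is mainly useful as a correctness baseline when checking a hand-written accelerator kernel.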
OpenBMB thanks @_akhaliq for contributing a Hugging Face demo for MiniCPM-V 4.6 that uses Gradio server for flexible frontend customization.