This paper presents JoyAI-VL-Interaction, an open-source 8B-scale vision-language model that operates continuously in real time, deciding autonomously when to respond or delegate. The release includes a complete deployable system and a training recipe, and the model outperforms Doubao and Gemini in human evaluations.