Tag
This paper explores using a Vision-Language Model (VLM) to detect attention loss in educational videos by combining gaze data with video content, but finds that VLM approaches do not outperform traditional machine learning baselines.