Sentiment Analysis of German Sign Language Fairy Tales

arXiv cs.CL Papers

Summary

This paper presents a dataset and an XGBoost-based model for sentiment analysis of German Sign Language (DGS) fairy tales. The model predicts negative, neutral, or positive sentiment from face and body motion features extracted with MediaPipe and reaches an average balanced accuracy of 63.1%. The feature analysis indicates that body motion (hips, elbows, shoulders) matters alongside facial motion (eyebrows, mouth) for conveying sentiment in sign language.
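For readers unfamiliar with the metric, the sketch below shows how balanced accuracy is commonly computed, assuming the standard definition (the unweighted mean of per-class recall). The function name and the toy labels are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    correct = Counter()  # correctly predicted samples per true class
    total = Counter()    # all samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy, made-up labels: an imbalanced set where the classifier always says "neutral".
y_true = ["neutral"] * 6 + ["negative"] * 2 + ["positive"] * 2
y_pred = ["neutral"] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)
print(f"accuracy={plain_accuracy:.3f}, balanced accuracy={score:.3f}")
```

On this toy data, plain accuracy rewards always predicting the majority class, while balanced accuracy drops to chance level for three classes (1/3), which is the baseline a three-class score such as 0.631 is usually read against.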

arXiv:2604.16138v1. Abstract: We present a dataset and a model for sentiment analysis of German Sign Language (DGS) fairy tales. First, we perform sentiment analysis for three levels of valence (negative, neutral, positive) on text segments of German fairy tales using four large language models (LLMs) and majority voting, reaching an inter-annotator agreement (Krippendorff's alpha) of 0.781. Second, we extract face and body motion features from each corresponding DGS video segment using MediaPipe. Finally, we train an explainable model (based on XGBoost) to predict negative, neutral, or positive sentiment from video features. Results show an average balanced accuracy of 0.631. A thorough analysis of the most important features reveals that, in addition to eyebrow and mouth motion on the face, the motion of the hips, elbows, and shoulders also contributes considerably to discriminating the conveyed sentiment, indicating an equal importance of face and body for sentiment communication in sign language.
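The labeling step described in the abstract (four LLMs plus majority voting) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name is hypothetical, the votes are made up, and the abstract does not say how ties between four voters are resolved, so returning a caller-supplied `tie_value` is purely an assumption.

```python
from collections import Counter


def majority_vote(labels, tie_value=None):
    """Return the most frequent label, or tie_value if the top count is shared.

    Tie handling is an assumption; the abstract does not specify it.
    """
    counts = Counter(labels).most_common()  # sorted by count, descending
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return tie_value
    return counts[0][0]


# Made-up votes from four hypothetical annotator models for three text segments.
segment_votes = [
    ["negative", "negative", "neutral", "negative"],   # clear majority
    ["positive", "neutral", "positive", "negative"],   # 2-1-1 plurality
    ["negative", "negative", "positive", "positive"],  # 2-2 tie
]
final_labels = [majority_vote(votes) for votes in segment_votes]
print(final_labels)
```

With an even number of voters, a 2-2 split is possible, so any real pipeline needs an explicit rule for such segments (for example discarding them or falling back to one model).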

Cached at: 04/20/26, 08:30 AM

# Sentiment Analysis of German Sign Language Fairy Tales
Source: https://arxiv.org/abs/2604.16138
View PDF (https://arxiv.org/pdf/2604.16138)


## Submission history

From: Fabrizio Nunnari

**[v1]** Fri, 17 Apr 2026 15:10:59 UTC (1,161 KB)

Similar Articles

Emotion Recognition in Sign Language Conversation

arXiv cs.CL

This paper introduces the eJSL Dialog dataset for emotion recognition in sign language conversations, addressing the lack of conversational context in existing datasets. Benchmarking shows a domain gap when applying generic multimodal models, highlighting the need for context-aware visual extractors for sign language.

Direct Translation between Sign Languages

arXiv cs.CL

This paper introduces a direct sign-to-sign translation model that bypasses intermediate text by using back-translation to create synthetic parallel sign language data, achieving significant improvements in speed and accuracy over cascade methods for ASL, CSL, and DGS.