acoustic-embeddings

Multimodal Speaker Identification in Classroom Environments

arXiv cs.CL · 2026-06-15

This paper evaluates a multimodal framework for speaker identification in K-12 classrooms by combining acoustic embeddings (ECAPA-TDNN) with LLM-derived semantic context from transcripts, improving accuracy from 39% to 50.3% overall and from 64.9% to 76.9% for longer utterances.
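The summary does not say how the acoustic and semantic signals are combined. The sketch below assumes one common option, score-level (late) fusion, and is not the paper's method. Cosine similarities between an utterance embedding and enrolled speaker embeddings are turned into a distribution over speakers. That distribution is then combined log-linearly with per-speaker probabilities derived from the transcript. The toy 2-D vectors stand in for ECAPA-TDNN embeddings. `text_probs` stands in for hypothetical LLM-derived speaker probabilities. The function names, the weight `alpha`, and the `temperature` are all illustrative assumptions.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def softmax(scores: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax; a lower temperature sharpens the distribution."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_identify(
    utt_emb: Sequence[float],
    enrolled: Sequence[Sequence[float]],
    text_probs: Sequence[float],
    alpha: float = 0.5,        # hypothetical weight on the acoustic evidence
    temperature: float = 0.1,  # hypothetical softmax temperature for cosine scores
    eps: float = 1e-12,        # avoids log(0)
) -> int:
    """Return the index of the enrolled speaker with the highest fused score.

    Assumed late fusion: alpha * log p_acoustic + (1 - alpha) * log p_text.
    """
    p_acoustic = softmax([cosine(utt_emb, e) for e in enrolled], temperature)
    fused = [
        alpha * math.log(pa + eps) + (1 - alpha) * math.log(pt + eps)
        for pa, pt in zip(p_acoustic, text_probs)
    ]
    return max(range(len(fused)), key=fused.__getitem__)


# Toy example: acoustics favour speaker 0, while the transcript context favours speaker 1.
enrolled = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
utterance = [1.0, 0.1]
text_probs = [0.1, 0.8, 0.1]
print(fuse_and_identify(utterance, enrolled, text_probs, alpha=1.0))  # acoustic only
print(fuse_and_identify(utterance, enrolled, text_probs, alpha=0.0))  # text only
```

Setting `alpha` to 1.0 or 0.0 recovers a single-modality classifier. Intermediate values let the semantic context override weak acoustic evidence. That trade-off matters most for short utterances, where embeddings tend to be less reliable.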
