PRISM is a novel framework for cross-subject EEG emotion recognition that combines prioritized channel importance weighting via a lightweight expert ensemble with semi-supervised domain adaptation using confidence-filtered pseudo-labels, achieving state-of-the-art results on the DEAP, DREAMER, and SEED datasets.
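The summary does not state PRISM's exact confidence criterion. As a generic illustration only, one common form of confidence filtering keeps an unlabeled target-domain prediction as a pseudo-label when its maximum softmax probability reaches a threshold. The function names and the 0.9 threshold below are hypothetical, not taken from the paper.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_filtered_pseudo_labels(
    target_logits: List[List[float]], threshold: float = 0.9
) -> List[Tuple[int, int]]:
    """Hypothetical helper: return (sample_index, pseudo_label) pairs for
    unlabeled target samples whose top class probability >= threshold."""
    selected = []
    for i, logits in enumerate(target_logits):
        probs = softmax(logits)
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            selected.append((i, label))
    return selected


# Usage: the second sample is too uncertain and is left unlabeled.
logits = [[4.0, 0.0, 0.0], [1.0, 1.1, 0.9], [0.0, 0.0, 5.0]]
print(confidence_filtered_pseudo_labels(logits, threshold=0.9))  # [(0, 0), (2, 2)]
```

A higher threshold trades pseudo-label coverage for precision, which limits the noise injected into the adaptation step.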
This paper introduces a hybrid framework for sentence-level emotion annotation of song lyrics that optimizes human-LLM collaboration by predicting human-LLM misalignment, addressing the subjectivity and scalability challenges of lyric emotion recognition.
This paper evaluates twelve recent text encoders on how well they encode affective cues drawn from three psychological theories of emotion, finding that instruction-aware open-weight encoders match or exceed proprietary ones at the word level, while task-tuned embeddings perform best at the sentence level.
This paper investigates using a Transformer-based generative model to learn emotional body motions from motion-capture data of Japanese actors, generating motions conditioned on discrete emotion labels. Evaluations show the generated motions improve emotion recognition when used for data augmentation and enable smooth transitions between emotion intensities.
This paper presents NEST-V1, a proof-of-concept multimodal framework for generating emotion-conditioned Nepali Sign Language avatars from spoken input, achieving 81.1% ASR accuracy and 79.21% emotion recognition accuracy on a dataset of 600 audio samples from 50 speakers.
This paper evaluates deep learning models (LSTM, TCN, Transformer) on the WESAD dataset for multimodal emotion recognition from physiological signals, showing that an ensemble achieves 98.91% accuracy.
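The summary does not say how the ensemble combines its members. Purely as an assumption for illustration, a simple soft-voting scheme averages the class probabilities produced by each model (for example an LSTM, a TCN, and a Transformer) and predicts the class with the highest mean probability. The function name and the probabilities below are hypothetical.

```python
from typing import List


def soft_vote(model_probs: List[List[List[float]]]) -> List[int]:
    """Hypothetical soft-voting ensemble.

    model_probs[m][i][c] is the probability that model m assigns to
    class c for sample i. Returns the argmax of the averaged probabilities.
    """
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    preds = []
    for i in range(n_samples):
        n_classes = len(model_probs[0][i])
        # Average each class probability across all ensemble members.
        avg = [
            sum(model_probs[m][i][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        preds.append(max(range(n_classes), key=avg.__getitem__))
    return preds


# Usage with three made-up models, two samples, three classes.
lstm = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
tcn = [[0.4, 0.5, 0.1], [0.1, 0.3, 0.6]]
transformer = [[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]]
print(soft_vote([lstm, tcn, transformer]))  # [0, 2]
```

Averaging probabilities rather than hard votes lets a confident member outweigh two uncertain ones.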
RECTOR is a self-supervised framework that learns joint region-channel-temporal representations from EEG/sEEG signals for affective and cognitive state classification, achieving state-of-the-art results on emotion recognition and task-engagement benchmarks.
PRISM is a multi-agent framework that decouples speech perception, response generation, and speech synthesis to improve empathetic spoken dialogue by integrating prosodic cues with LLM reasoning and external knowledge tools.
SHALA-LLM is a reinforcement learning framework that enables LLMs to learn directly from annotator label distributions and to dynamically prioritize highly ambiguous samples during alignment, improving agreement with human label distributions and classification performance.
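The summary does not describe SHALA-LLM's reward or sampling rule. As a hypothetical illustration of the two ingredients it names, the sketch below turns raw annotator votes into an empirical label distribution and ranks samples by the Shannon entropy of that distribution, so the most ambiguous samples come first. All names and labels are made up, and entropy is only one possible ambiguity measure.

```python
import math
from collections import Counter
from typing import Dict, List


def annotator_distribution(votes: List[str]) -> Dict[str, float]:
    # Empirical distribution: fraction of annotators choosing each label.
    counts = Counter(votes)
    n = len(votes)
    return {label: c / n for label, c in counts.items()}


def entropy(dist: Dict[str, float]) -> float:
    # Shannon entropy in bits; higher means annotators disagree more.
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def rank_by_ambiguity(samples: Dict[str, List[str]]) -> List[str]:
    # Hypothetical prioritization: most ambiguous sample IDs first.
    return sorted(
        samples,
        key=lambda s: entropy(annotator_distribution(samples[s])),
        reverse=True,
    )


samples = {
    "a": ["joy", "joy", "joy", "joy"],            # unanimous, entropy 0
    "b": ["joy", "anger", "joy", "sadness"],      # entropy 1.5 bits
    "c": ["joy", "joy", "anger", "anger"],        # entropy 1.0 bit
}
print(rank_by_ambiguity(samples))  # ['b', 'c', 'a']
```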
This paper introduces the eJSL Dialog dataset for emotion recognition in sign language conversations, addressing the lack of conversational context in existing datasets. Benchmarking shows a domain gap when generic multimodal models are applied, highlighting the need for context-aware visual feature extractors tailored to sign language.
This paper presents a multimodal emotion recognition module for proactive conversational agents that combines facial expression recognition with linguistic analysis. A user study with 20 participants reveals a 'poker face' effect in which visual cues are unreliable, whereas linguistic analysis proves more accurate. The study also shows that agents can elicit emotions through conversational adaptation.
This paper proposes a plug-and-play module that uses self-paced curriculum learning to improve modality balance in multimodal conversational emotion recognition, achieving consistent F1-score improvements on the IEMOCAP and MELD datasets.
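The summary does not explain how the module applies self-paced learning to modality balance. For orientation only, the sketch below shows the standard hard-threshold form of self-paced learning: an item is included (weight 1) when its current loss is below a pace threshold, and the threshold grows each epoch so harder items are admitted over time. Whether the items are samples, modalities, or something else in this paper is not stated; the function names and schedule values are hypothetical.

```python
from typing import List


def self_paced_weights(losses: List[float], lam: float) -> List[int]:
    """Hard self-paced weighting: weight 1 for 'easy' items whose loss is
    below the pace threshold lam, weight 0 (skipped) otherwise."""
    return [1 if loss < lam else 0 for loss in losses]


def pace_schedule(lam0: float, growth: float, epochs: int) -> List[float]:
    """Threshold for each epoch; growth > 1 gradually admits harder items."""
    return [lam0 * growth ** e for e in range(epochs)]


losses = [0.2, 0.8, 1.5]
for lam in pace_schedule(lam0=1.0, growth=2.0, epochs=2):
    # Epoch 0 (lam=1.0) skips the hardest item; epoch 1 (lam=2.0) uses all.
    print(lam, self_paced_weights(losses, lam))
```

The weighted training loss is then the sum of per-item losses multiplied by these weights, so skipped items contribute nothing until the threshold passes them.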
This paper proposes a lightweight framework using sticky factorial HDP-HMMs to model conversational emotion as latent regimes from multimodal valence-arousal trajectories, aiming for interpretable and computationally efficient emotional state tracking.
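The summary does not give the model's equations, and how the factorial variant extends the basic model is not described here. For reference, the standard (non-factorial) sticky HDP-HMM places the prior below on each latent regime's transition distribution $\pi_j$, where $\gamma$ and $\alpha$ are concentration parameters, $\delta_j$ is a point mass on regime $j$, $z_t$ is the regime at time $t$, and $\kappa \ge 0$ adds extra mass to self-transitions, which encourages persistent emotional regimes. Setting $\kappa = 0$ recovers the ordinary HDP-HMM.

```latex
\begin{aligned}
\beta &\sim \mathrm{GEM}(\gamma), \\
\pi_j \mid \beta &\sim \mathrm{DP}\!\left(\alpha + \kappa,\ \frac{\alpha\beta + \kappa\,\delta_j}{\alpha + \kappa}\right), \\
z_t \mid z_{t-1} &\sim \pi_{z_{t-1}}, \\
\mathbb{E}\left[\pi_{jj} \mid \beta\right] &= \frac{\alpha\beta_j + \kappa}{\alpha + \kappa}.
\end{aligned}
```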
This article introduces EmoS, a high-fidelity multimodal benchmark designed for fine-grained streaming emotional understanding, addressing limitations in ecological validity and labeling reliability found in existing datasets.
This paper examines how large language models express social emotions relative to human cultural norms, finding systematic misalignment: across European American and Latin American cultural personas, LLMs show patterns of engaging versus disengaging emotion expressivity that are inconsistent with human responses.