GroupAffect-4: A Multimodal Dataset of Four-Person Collaborative Interaction
Summary
This paper introduces GroupAffect-4, a multimodal dataset of 40 participants in 10 four-person groups performing collaborative tasks. It includes aligned physiology, eye-tracking, audio, self-report, and personality data, along with benchmark targets for within-person, between-person, and group-level analysis.
Similar Articles
Early methods for studying affective use and emotional well-being on ChatGPT
OpenAI and MIT Media Lab researchers conducted two parallel studies of how emotional engagement with ChatGPT affects user well-being: an analysis of 40 million conversations and a randomized controlled trial of 1,000 participants. The studies examine impacts on loneliness, social interaction, and problematic AI use.
BEHAVE: A Hybrid AI Framework for Real-Time Modeling of Collective Human Dynamics
This arXiv preprint presents BEHAVE, a hybrid AI framework for real-time modeling of collective human dynamics.
Omni-Persona: Systematic Benchmarking and Improving Omnimodal Personalization
This paper introduces Omni-Persona, the first comprehensive benchmark for omnimodal personalization across text, image, and audio, featuring a Persona Modality Graph and a new Calibrated Accuracy metric to evaluate grounding behaviors.
Evaluating multimodal emotion recognition in proactive conversational agents: A user study
This paper presents a multimodal emotion recognition module for proactive conversational agents, using facial recognition and linguistic analysis. A user study with 20 participants reveals a 'poker face' effect where visual cues are unreliable, while linguistic analysis proves more accurate; the study also shows agents can elicit emotions through conversational adaptation.
DraDDP: A Multimodal Multi-Party Dialogue Discourse Parsing Dataset
This paper introduces DraDDP, the first publicly available English multimodal dataset for multi-party dialogue discourse parsing, built from American TV dramas with 495 segments, 6,374 utterances, and 9.1 hours of video. Benchmarks show multimodal information improves parsing of dialogue structures and relation types.