masked-modeling

AnyMo: Scaling Any-Modality Conditional Motion Generation with Masked Modeling

Hugging Face Daily Papers · 2026-05-28

AnyMo is a unified multimodal framework for human motion generation. It pairs a Residual FSQ-based motion tokenizer with a scalable masked-modeling transformer and introduces the OmniHuMo dataset, which contains over 5,000 hours of motion data, to support high-quality synthesis under arbitrary modality combinations.

AudioMosaic: Contrastive Masked Audio Representation Learning

arXiv cs.LG · 2026-05-15

AudioMosaic introduces a contrastive learning-based audio encoder that uses structured time-frequency masking on spectrogram patches for efficient large-batch training, achieving state-of-the-art performance on audio benchmarks and improving audio-language models.

CSI-JEPA: Towards Foundation Representations for Ubiquitous Sensing with Minimal Supervision

arXiv cs.LG · 2026-05-15

CSI-JEPA is a self-supervised framework for learning reusable representations from unlabeled Wi-Fi channel state information, enabling label-efficient multi-task sensing. It achieves up to 98% label savings and outperforms supervised models.
