Tag
MuSViT is the first foundation vision model for sheet music, pre-trained on millions of pages via Masked Autoencoders, achieving superior performance in score recognition and symbol detection tasks.
NERVE proposes a network-aware bilinear tokenization method for self-supervised learning on brain functional connectivity matrices using masked autoencoders, improving representation learning across developmental cohorts.