Tag
Proposes CF-JEPA, a mask-free self-supervised learning framework for time series that uses multi-horizon forward prediction from random crops and exploits asymmetry between online and target encoders for improved performance on classification, forecasting, and anomaly detection.
The paper proposes a hybrid pre-training objective combining JEPA latent-space prediction with MLM reconstruction for language models, showing improved embedding uniformity and semantic-lexical balance.
This paper develops a measure-theoretic framework analyzing when contrastive learning recovers meaningful latent geometry, introducing a 'diversity condition' on positive-pair sampling and a support-corrected InfoNCE variant. Experiments validate that sampling diversity and architectural inductive bias interact critically in contrastive representation learning.
Researchers at MIT Lincoln Laboratory propose 'principle-driven foundation models' that encode signal-theoretic physical principles (Fourier decomposition, energy conservation, symmetry) instead of learning statistical correlations from large paired datasets. Trained exclusively on RF data, their 1.99M-parameter frozen encoder achieves 77.7% average accuracy across 15 diverse tasks spanning audio, images, text, and video without any fine-tuning on target domains.
This paper introduces Regret Pre-training, a self-supervised framework that uses a dual-view architecture to incorporate future context into causal language model training, improving performance on downstream tasks by up to 18 percentage points without adding parameters.
NEPA is a new method for visual self-supervised learning and generative pretraining that predicts the next embedding autoregressively, and has been added to a benchmark for evaluation.
The paper identifies a misalignment between the softmax-based InfoNCE loss and the normalized embedding setting in modern contrastive learning. It proposes WEINCE, a simple modification that blends softmax logits with an endpoint shortfall correction using extreme value theory, yielding consistent improvements across vision benchmarks.
This thread presents a theoretical result showing that predicting abstract latent representations (as in JEPA and data2vec) instead of raw tokens can exponentially reduce the data gap between AI and human learning.
This paper introduces BrainSimSiam, a lightweight self-supervised framework using siamese networks to learn robust fMRI representations from positive-only pairs, achieving strong performance on downstream tasks even with limited data.
Proposes DIVE, a compression adapter for embedding dimensionality reduction that uses self-limiting gradient updates and head-wise NT-Xent contrastive loss to prevent overfitting on small datasets, outperforming existing methods on BEIR benchmarks.
This paper adapts instance-discrimination self-supervised learning to link prediction in graphs, proposing new models L-GRACE and L-BGRL that operate on link representations and improve performance, especially on unattributed graphs.
VCR is a self-supervised framework that learns robust representations from incomplete wearable signals using orthogonal tokenization and missing-aware mixture-of-experts, improving performance under modality missingness.
Introduces Alice, a closed-loop system that learns executable world models online under prior misalignment by treating failed candidate updates as structural signal, achieving improved performance on a variant of Baba Is You with semantically remapped labels.
Crys-JEPA introduces a joint embedding predictive architecture for crystals that learns an energy-aware latent space, achieving significant improvements in stability and novelty for de novo crystal discovery.
AudioMosaic introduces a contrastive learning-based audio encoder that uses structured time-frequency masking on spectrogram patches for efficient large-batch training, achieving state-of-the-art performance on audio benchmarks and improving audio-language models.
CSI-JEPA is a self-supervised framework for learning reusable representations from unlabeled Wi-Fi channel state information, enabling label-efficient multi-task sensing. It achieves up to 98% label savings and outperforms supervised models.
This paper introduces a unified geometric framework showing that weighted InfoNCE objectives can be interpreted as Distance Geometry Problems, providing exact characterizations of optimal embeddings for supervised and weakly supervised contrastive learning methods and revealing when such embeddings are geometrically realizable, degenerate, or inconsistent.
NERVE proposes a network-aware bilinear tokenization method for self-supervised learning on brain functional connectivity matrices using masked autoencoders, improving representation learning across developmental cohorts.
This paper introduces HEPA, a self-supervised architecture for predicting rare critical events in time series using a Joint-Embedding Predictive Architecture (JEPA) pretraining strategy. It demonstrates superior performance across multiple domains with significantly less labeled data and fewer tuned parameters than leading models.
A GitHub repository providing minimal, standalone PyTorch reimplementations of JEPA family models (I-JEPA, V-JEPA, V-JEPA 2, C-JEPA) for educational purposes, including tutorials and visualization tools.