self-supervised-learning

#self-supervised-learning

3D Masked Autoencoders are Robust Learners of Volumetric and Multimodal Cellular Representations for Microscopy

arXiv cs.LG · yesterday

This paper presents 3D masked autoencoders for volumetric microscopy data, demonstrating that 3D modeling outperforms 2D max-projection and slice-based variants on downstream single-cell tasks, with cross-modal alignment to a protein language model further improving performance.

Layer-wise Probing of wav2vec 2.0 and Whisper for Consonant Cluster Reduction in African American English

arXiv cs.CL · yesterday

This paper uses layer-wise probing to investigate how wav2vec 2.0 and Whisper encode consonant cluster reduction in African American English, finding that both models distinguish reduced and canonical forms and preserve cues to underlying stops.

@rohanpaul_ai: New Microsoft paper argues that transformers generalize better when they learn compact internal states, not just next t…

X AI KOLs Timeline · yesterday

Microsoft's NextLat paper proposes a self-supervised training method where transformers predict their next hidden state instead of just the next token, leading to more compact world models, better planning and reasoning, and up to 3.3x faster generation.

@AlphaSignalAI: https://x.com/AlphaSignalAI/status/2069424192274252094

X AI KOLs Timeline · yesterday

Microsoft's NextLat introduces a training objective that rewards belief-state representations instead of relying solely on next-token prediction, pushing models toward compact world models for better generalization.

UniverSat: Resolution- and Modality-Agnostic Transformers for Earth Observation

Hugging Face Daily Papers · 3d ago

UniverSat introduces a Universal Patch Encoder for Vision Transformers that enables robust, sensor-agnostic spatial feature extraction across diverse Earth Observation data types, achieving strong results on classification and segmentation benchmarks.

PragReST: Self-Reinforcing Counterfactual Reasoning for Pragmatic Language Understanding

arXiv cs.CL · 2026-06-18

PragReST is a self-supervised framework that improves LLM pragmatic reasoning by generating counterfactual reasoning traces and training models via supervised fine-tuning and reinforcement learning, achieving significant gains on pragmatic benchmarks without human-labeled data.

Toward Parking Spot Occupancy Recognition: A Self-Supervised Approach

Hugging Face Daily Papers · 2026-06-18

This paper presents a self-supervised transfer learning approach for parking spot occupancy recognition that achieves high accuracy (up to 97.8%) with minimal labeled data using a two-stage training strategy with SimCLR and ResNet-50.

When, Where, and How: Adaptive Binning for Tabular Self-Supervised Learning

Hugging Face Daily Papers · 2026-06-18

This paper proposes Adaptive Binning, a learning-coupled feature-wise coarse-to-fine curriculum for tabular self-supervised learning that adaptively discretizes features, improving representations on medical datasets and establishing a unified benchmark.

@ethanmclark1: Working in robotics right now is what I imagine working with language models felt like in 2023. Everyone throwing thing…

X AI KOLs Following · 2026-06-17

A robotics researcher compares current robotics approaches to the language model landscape of 2023, arguing that representation prediction (JEPA) is the most scalable method as it can leverage action-free video data like YouTube, unlike other methods that require action-labeled data.

Next-Latent Prediction Transformers [R]

Reddit r/MachineLearning · 2026-06-17

Microsoft Research introduces Next-Latent Prediction (NextLat), a self-supervised method that trains transformers to predict their own next latent state, enabling compact world models for reasoning and planning and achieving up to 3.3x faster inference via self-speculative decoding.

Perceptual compensation for tonal context in self-supervised speech models

arXiv cs.CL · 2026-06-17

This paper investigates whether the wav2vec 2.0 architecture exhibits perceptual compensation for tonal context in Mandarin Chinese. It finds limited evidence of this in the self-supervised model compared with human listeners and suggests that supervised fine-tuning may be necessary for such phonological abstraction.

@AlexiGlad: Progress in AI is driven by approaches that make weaker assumptions, which allows for better scaling But representation…

X AI KOLs Following · 2026-06-16

Introduces Temporal Difference in Vision (TDV), a new paradigm for representation learning that relies solely on causality, eliminating the need for augmentations, masking, or cropping, and matches state-of-the-art methods like DINO and iBOT on dense spatial tasks.

@ninaddaithankar: Can a vision model learn to see with no augmentations, no masking, no cropping, no reconstruction? It can! Introducing …

X AI KOLs Timeline · 2026-06-16

Introduces Temporal Difference in Vision (TDV), a novel visual representation learning paradigm that learns useful representations without augmentations, masking, cropping, or reconstruction, and matches state-of-the-art methods on dense spatial tasks.

RECTOR: Masked Region-Channel-Temporal Modeling for Affective and Cognitive Representation Learning

arXiv cs.LG · 2026-06-16

RECTOR is a self-supervised framework that learns joint region-channel-temporal representations from EEG/sEEG signals for affective and cognitive state classification, achieving state-of-the-art results on emotion recognition and task-engagement benchmarks.

@vintcessun: Time series anomaly detection has always had an annoying gap: algorithms throw a score at you but never tell you "why is it anomalous here?" Without explanation, users are left staring blankly, with no basis for trust or diagnosis. ProtoX-AD finally breaks through this barrier—it embeds an interpretable prototype vector layer within a self-supervised classification framework, where each prototype corresponds to a transformation pattern or anomaly characteristic. During training, normal samples cluster near their corresponding prototypes while anomalous samples stay far from all prototypes; during inference, classification error is used for detection, and prototype similarity directly tells you whether the anomaly is a "local mutation" or a "trend shift." Detection accuracy is not sacrificed, and explanations come with semantics. The limitation is that prototypes need to be predefined, but it finally fills the critical gap of interpretability.

X AI KOLs Timeline · 2026-06-15

ProtoX-AD is a prototype-based self-explainable framework for self-supervised time series anomaly detection that provides interpretable explanations for detected anomalies by learning transformation-aware prototypes, achieving performance comparable to black-box methods while offering semantic anomaly characterization.

You Don't Need Strong Assumptions: Visual Representation Learning via Temporal Differences

Hugging Face Daily Papers · 2026-06-14

The paper introduces Temporal Difference in Vision (TDV), a self-supervised learning method for video that relies only on a causal assumption that past causes future, avoiding strong inductive biases while matching state-of-the-art on dense spatial tasks.

ViT-Up: Faithful Feature Upsampling for Vision Transformers

Hugging Face Daily Papers · 2026-06-12

ViT-Up introduces a task-agnostic feature upsampler for Vision Transformers that predicts features at arbitrary continuous image coordinates, enabling dense feature maps at any resolution and improving results on dense prediction and semantic correspondence benchmarks. It outperforms prior state-of-the-art upsamplers, with gains of up to +2.07 mIoU on Cityscapes and +4.17 on SPair-71k.

UR-BERT: Scaling Text Encoders for Massively Multilingual TTS Through Universal Romanization and Speech Token Prediction

arXiv cs.CL · 2026-06-11

UR-BERT proposes a Romanized transcription-based text encoder for massively multilingual TTS, scaling to 495 languages by using universal Romanization and a speech token prediction objective to enhance phonetic alignment and generalization to unseen languages.

Speaker Group Encoding in Self-supervised Speech Recognition Models

arXiv cs.CL · 2026-06-10

Investigates how self-supervised speech recognition models encode speaker group information (gender, age, dialect, ethnicity, native speaker status) across layers, and how finetuning for tasks like ASR or speaker identification affects this encoding.

@AbdelStark: It’s time to JEPA pill the world! awesome-jepa: A curated list of papers, models, code, datasets, and learning resource…

X AI KOLs Timeline · 2026-06-09

A curated list of papers, models, code, datasets, and learning resources for Joint Embedding Predictive Architectures (JEPA), the self-supervised approach to world models proposed by Yann LeCun.
