MemTrain proposes a self-supervised training framework that uses masked reconstruction and intermediate memory recall proxy tasks on Wikipedia corpora to enhance LLM agents' context memory, achieving gains of up to 17.67 points on downstream memory-intensive QA benchmarks.
MindZero introduces a self-supervised reinforcement learning framework that trains multimodal large language models for efficient and robust online mental reasoning without requiring mental state annotations, outperforming model-based methods in accuracy and efficiency.
RayDer is a unified feed-forward transformer that consolidates camera estimation, scene reconstruction, and rendering for self-supervised novel view synthesis from real-world video, achieving clean power-law scaling and strong zero-shot performance.
The SAVE framework improves reward model training by using value functions to grade on-policy responses and updating models through contrastive objectives, achieving superior results across six benchmarks.
ChildVox presents a comprehensive benchmark for analyzing children's acoustic communication across developmental stages, integrating over 20 sub-tasks from 17 child-centered audio and speech datasets.
PilotWiMAE introduces a self-supervised framework that directly ingests noisy pilot observations for wireless channel representation learning, removing the unrealistic full-CSI assumption and enabling robust cross-frequency beam selection and channel estimation that beats supervised baselines.
This paper proposes a method to improve in-context learning by optimizing the continuous embeddings of a fixed few-shot prompt at test time, using a self-supervised confidence proxy derived from the model's log-probabilities without requiring fine-tuning or token generation.
Next Implicit Token Prediction (NITP) enhances language model pre-training by adding dense continuous supervision in representation space, improving generalization and performance across model sizes with minimal computational overhead.
This paper introduces the Temporal Contrastive Transformer (TCT), a self-supervised framework for learning temporal embeddings from financial transactions for fraud detection. The embeddings alone achieve an AUC of 0.8644 but do not improve over strong engineered features (AUC 0.9205 vs. 0.9245), indicating that the learned representations overlap with existing features.
Black Forest Labs presented the evolution of its Flux model series at the AI Engineer Conference and released the SelfFlow research paper, which proposes a self-supervised multimodal training method that does not require external encoders.