Tag
This thread presents a theoretical result showing that predicting abstract latent representations (as in JEPA and data2vec) instead of raw tokens can exponentially reduce the data gap between AI and human learning.
This paper introduces Learned Relay Representations (Relay), a method that allows masked diffusion models to propagate latent information across denoising steps, overcoming the hard reset problem and improving performance-latency trade-offs. The method is shown to outperform standard supervised finetuning on coding tasks while reducing inference latency by up to 32%.
Sub-JEPA improves LeWorldModel by applying Gaussian regularization in frozen random orthogonal subspaces, consistently outperforming the original on benchmarks with up to +10.7 percentage points improvement.