Tag
This paper empirically measures the symmetry–data exchange rate predicted by equivariance theory. It finds that wrong-group symmetry constraints are actively harmful, that augmentation combined with test-time orbit averaging matches equivariant architectures, and that the theoretical |G|-fold reduction in sample complexity is only weakly confirmed, with wide confidence intervals. The study is explicitly exploratory and not pre-registered.
This paper proves that learning by predicting latent representations (as in world models like JEPA and data2vec) requires exponentially less data than predicting tokens (as in LLMs) for hierarchical data with hidden structure.
This exploratory study empirically measures the symmetry–data exchange rate predicted by equivariance theory on controlled C_n-symmetric tasks. It finds that wrong-group constraints are actively harmful, that augmentation with test-time orbit averaging matches equivariant models exactly, and that the empirical exchange rate is broadly consistent with theory but statistically inconclusive. The authors emphasize the study's exploratory nature and call for registered replications.
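Test-time orbit averaging, mentioned in the summaries above, replaces a model's prediction with its mean over every transformed copy of the input, which makes the averaged predictor exactly invariant even if the underlying model is not. A minimal sketch, assuming C_n acts on length-n signals by cyclic shifts (`np.roll`) and that the task is shift-invariant; `orbit_average` and the random linear "model" are hypothetical stand-ins, not the paper's setup:

```python
import numpy as np

def orbit_average(f, x):
    """Average f over the C_n orbit of x, where C_n acts on a length-n signal
    by cyclic shifts. The averaged predictor is exactly shift-invariant."""
    n = x.shape[0]
    return np.mean([f(np.roll(x, k)) for k in range(n)], axis=0)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))          # deliberately non-invariant linear "model" (hypothetical)
model = lambda x: W @ x              # maps a length-8 signal to 3 logits

x = rng.normal(size=8)
shifted = np.roll(x, 3)              # the group element "shift by 3" applied to x
print(np.allclose(model(x), model(shifted)))                                # raw model: not invariant in general
print(np.allclose(orbit_average(model, x), orbit_average(model, shifted)))  # True
```

Training-time augmentation would instead feed randomly shifted copies of each input during training; orbit averaging at test time, as written here, costs |G| = n forward passes per prediction.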
This paper provides a refined theoretical analysis of actor-critic methods with entropy regularization. It shows that an exact critic acts as a strong variance reducer, yielding sample complexity comparable to that of deterministic policy gradient, and that these benefits are preserved when the learned critic is sufficiently accurate.
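The variance-reduction effect of an exact critic can be illustrated in a one-step (bandit) toy with a softmax policy and entropy regularization: replacing a noisy sampled reward with the exact soft action value removes the reward-noise part of the policy-gradient variance while keeping the estimator unbiased. This is a minimal sketch under assumed values (`theta`, `Q`, `tau`, `noise_std` are hypothetical), not the paper's analysis:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.array([0.5, 0.0, -0.5])   # softmax policy logits (hypothetical)
Q = np.array([1.0, 0.0, 2.0])        # exact expected reward per action (hypothetical)
tau = 0.1                            # entropy temperature (hypothetical)
noise_std = 2.0                      # std of reward noise in sampled rewards (hypothetical)

pi = np.exp(theta - theta.max())
pi /= pi.sum()
soft_q = Q - tau * np.log(pi)        # entropy-regularized ("soft") action values

# Row a of (I - pi) is grad_theta log pi(a) = one_hot(a) - pi for a softmax policy.
score = np.eye(3) - pi
# Exact gradient of J(theta) = sum_a pi(a) * (Q(a) - tau * log pi(a)).
exact = score.T @ (pi * soft_q)

n = 50_000
a = rng.choice(3, size=n, p=pi)
r = Q[a] + noise_std * rng.normal(size=n)                   # noisy sampled rewards
reinforce = (r - tau * np.log(pi[a]))[:, None] * score[a]   # Monte Carlo (REINFORCE) estimate
critic = soft_q[a][:, None] * score[a]                      # same estimator with the exact critic

print("total variance, REINFORCE:", reinforce.var(axis=0).sum())
print("total variance, exact critic:", critic.var(axis=0).sum())
```

Both estimators have mean `exact`; they differ only by the zero-mean noise term, so in expectation the REINFORCE total variance exceeds the exact-critic one by noise_std² · E[‖one_hot(a) − π‖²].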
This paper introduces Good Policy Identification (GPI) in reinforcement learning, where the goal is to find a policy that meets a given reward threshold rather than an optimal policy, and proposes the BEE-GPI algorithm with near-optimal sample complexity guarantees.
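The difference between identifying a good policy and an optimal one is easiest to see in the single-state (multi-armed bandit) special case: sampling can stop as soon as any arm is confidently above the threshold, without ranking the others. A minimal sketch using round-robin sampling and Hoeffding confidence bounds for rewards in [0, 1]; `find_good_arm` and its parameters are hypothetical, and this is not the BEE-GPI algorithm:

```python
import math
import numpy as np

def find_good_arm(pull, n_arms, threshold, delta=0.05, max_rounds=100_000):
    """Return an arm whose mean reward is >= threshold with probability >= 1 - delta,
    or None if every arm is confidently below the threshold (or the budget runs out).
    Rewards from pull(i) must lie in [0, 1]."""
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)
    active = list(range(n_arms))
    for _ in range(max_rounds):
        for i in list(active):                      # iterate over a copy; arms may be dropped
            sums[i] += pull(i)
            counts[i] += 1
            n = counts[i]
            # Hoeffding radius with a union bound over arms and sample counts.
            radius = math.sqrt(math.log(4 * n_arms * n**2 / delta) / (2 * n))
            mean = sums[i] / n
            if mean - radius >= threshold:          # confidently good: stop immediately
                return i
            if mean + radius < threshold:           # confidently bad: stop sampling it
                active.remove(i)
        if not active:
            return None
    return None

rng = np.random.default_rng(0)
means = [0.2, 0.5, 0.9]                             # hypothetical Bernoulli arm means
pull = lambda i: float(rng.random() < means[i])
print(find_good_arm(pull, len(means), threshold=0.7))
```

The `max_rounds` budget matters because an arm whose mean sits exactly at the threshold can never be classified with confidence.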
This paper studies risk-sensitive reinforcement learning in finite discounted MDPs with a generative model, focusing on the sample complexity of learning optimal value functions and policies under the optimized certainty equivalent (OCE) risk measure. It provides exact conditions for PAC-learnability, analyzes a model-based approach, and establishes tight lower bounds, including an improved dependence on the risk parameter for CVaR.
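In the static (single-period) case, the OCE of a random reward X under a concave utility u is OCE_u(X) = sup_λ { λ + E[u(X − λ)] }, and choosing u(t) = min(t, 0)/α recovers CVaR at level α, the mean of the worst α-fraction of outcomes. A minimal sketch on an empirical reward distribution, assuming this lower-tail CVaR convention; `oce` and `cvar_utility` are hypothetical helpers, and this does not implement the paper's model-based MDP approach:

```python
import numpy as np

def oce(samples, utility, candidates=None):
    """Empirical optimized certainty equivalent of a reward distribution:
    OCE_u(X) = sup_lam { lam + E[u(X - lam)] }, with the sup taken over
    `candidates`. Using the samples themselves as candidates is exact when u is
    piecewise linear with its only kink at 0 (as for CVaR); for smooth u,
    pass a fine grid and treat the result as an approximation."""
    samples = np.asarray(samples, dtype=float)
    if candidates is None:
        candidates = samples
    return max(lam + np.mean(utility(samples - lam)) for lam in candidates)

def cvar_utility(alpha):
    # u(t) = min(t, 0) / alpha: the OCE becomes CVaR_alpha of the lower reward tail.
    return lambda t: np.minimum(t, 0.0) / alpha

rewards = np.array([-4.0, -1.0, 0.0, 1.0, 2.0, 3.0, 5.0, 6.0])
print(oce(rewards, cvar_utility(0.25)))  # -2.5, the mean of the 2 worst of 8 rewards
print(oce(rewards, lambda t: t))         # 1.5, the plain mean (risk-neutral case)
```

Smaller α weights the bad tail more heavily; α = 1 gives back the mean.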
This paper provides the first non-asymptotic sample complexity bounds for learning exponential families of polynomials with score matching, showing that the bounds depend polynomially on the model dimension.
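Score matching suits exponential families because the Hyvärinen objective is quadratic in the natural parameters and therefore has a closed-form minimizer, with no normalizing constant to compute. A one-dimensional sketch, assuming p_θ(x) ∝ exp(Σ_k θ_k x^k) for k = 1..d (normalizability requires d even with θ_d < 0, which this fit does not enforce); `score_matching_fit` is a hypothetical name, and this is not the paper's estimator or analysis:

```python
import numpy as np

def score_matching_fit(x, degree):
    """Closed-form Hyvarinen score matching for the 1-D exponential family
    p_theta(x) proportional to exp(sum_k theta_k * x**k), k = 1..degree.

    With T_k(x) = x**k, the objective E[0.5 * (d/dx log p)**2 + d2/dx2 log p]
    equals 0.5 * theta^T A theta + theta^T b, where A = E[T'(x) T'(x)^T] and
    b = E[T''(x)], so the minimizer solves A theta = -b.
    """
    x = np.asarray(x, dtype=float)
    k = np.arange(1, degree + 1)
    dT = k * x[:, None] ** (k - 1)                           # T'_k(x), shape (N, degree)
    d2T = k * (k - 1) * x[:, None] ** np.maximum(k - 2, 0)   # T''_k(x); exponent clipped to avoid x**-1
    A = dT.T @ dT / len(x)
    b = d2T.mean(axis=0)
    return np.linalg.solve(A, -b)

# Sanity check on Gaussian data: log N(mu, sigma^2) = (mu/sigma^2) x - x^2/(2 sigma^2) + const.
rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.5
x = rng.normal(mu, sigma, size=200_000)
theta = score_matching_fit(x, degree=2)
print(theta)  # should be close to [mu/sigma**2, -1/(2*sigma**2)] = [4.0, -2.0]
```

For degree 2 the estimate reduces to (sample mean / sample variance, −1 / (2 · sample variance)), i.e. the Gaussian moment estimate in natural parameters; higher degrees use the same linear solve with a larger A.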