Tag
This paper empirically measures the symmetry–data exchange rate predicted by equivariance theory, finding that wrong-group symmetry constraints are actively harmful, that augmentation with test-time orbit averaging matches equivariant architectures, and that the theoretical |G|-fold reduction in sample complexity is only weakly confirmed, with wide confidence intervals. The study is explicitly exploratory and not pre-registered.
This paper develops a measure-theoretic framework analyzing when contrastive learning recovers meaningful latent geometry, introducing a 'diversity condition' on positive-pair sampling and a support-corrected InfoNCE variant, with experiments validating that sampling diversity and architectural inductive bias interact critically in contrastive representation learning.
This exploratory study empirically measures the symmetry–data exchange rate predicted by equivariance theory on controlled C_n-symmetric tasks, finding that wrong-group constraints are actively harmful, augmentation with test-time orbit averaging matches equivariant models exactly, and the empirical exchange rate is broadly consistent with theory but statistically inconclusive. The authors emphasize the study's exploratory nature and call for registered replications.
This paper introduces the concept of 'initialization memory' to study how much of the random initialization bias survives training in deep networks, showing that low-learning-rate SGD preserves initialization while Adam-family optimizers erase it, and linking this to forgetting dynamics.
This paper proposes Energy-Gated Attention (EGA) and Morlet Positional Encoding (MoPE) to address two inductive biases missing from transformer attention: token salience and scale-adaptive locality. Experiments on TinyShakespeare show superadditive gains when the two are combined, indicating that the mechanisms are complementary.
This paper investigates the role of inductive bias in time-series pretraining for clinical data, proposing PathoFM, an encoder-centric transformer pretrained on multivariate gait windows. The study compares different pretraining objectives and finds that dynamics-centric mixtures yield the most balanced transfer across classification and regression tasks.
This paper introduces Graph Alignment Topology as an inductive bias for grounding detection, using a graph neural network to model alignment structure between reference information and LLM outputs. The method achieves state-of-the-art results on multiple hallucination and question-answering datasets, outperforming GPT-4o.
This paper investigates how character-level transformer models generalize to irregular verb subtypes in Japanese past-tense inflection. Controlled experiments show that including irregular examples can improve generalization, challenging the assumption that regularity simplifies learning.