This paper presents a theoretical framework interpreting Transformer components (attention, residual connections, normalization) as arising from a spherical state estimation problem using Radial-Tangential SDEs.
This paper introduces the Context-Contaminated Restart Model (CCRM) to formally analyze how failed attempts in LLM agent pipelines contaminate context and increase error rates on retries. It provides theoretical proofs and validates the model against SWE-bench data, showing significant discrepancies with models that assume retries are independent.
A PyTorch library that compiles neural networks from Turing machine descriptions, enabling exact simulation without training.
This empirical study validates theoretical predictions of feature repulsion and spectral lock-in during the grokking phenomenon in two-layer neural networks, demonstrating how the choice of activation function shapes the transition from memorization to generalization.
This paper analyzes zero-shot conditional sampling with pretrained diffusion models for linear inverse problems, providing information-theoretic guarantees and proposing a projected-Langevin initialization method.