theory

#theory

Consciousness is all you need

Reddit r/ArtificialInteligence · 10h ago

This paper presents an information-processing theory of consciousness and argues that instantiating conscious subsystems in AI could enable superior adaptation without extensive training, potentially leading to AGI.

#theory

High-Probability PL-SGD with Markovian Noise: Optimal Mixing and Tail Dependence

arXiv cs.LG · 2d ago

This paper provides optimal high-probability bounds for stochastic gradient descent under Markovian noise for smooth objectives satisfying the Polyak-Łojasiewicz (PL) condition, closing gaps between expectation and high-probability guarantees and extending to heavy-tailed settings with matching lower bounds.

#theory

@k_solidified_: https://arxiv.org/abs/2106.10165 All of humanity should read this

X AI KOLs Timeline · 4d ago

This book develops an effective theory for deep neural networks, showing that their predictions are nearly-Gaussian and governed by the depth-to-width ratio, and introduces representation group flow to analyze signal propagation and learning dynamics.

#theory

Puzzling Success of Overparameterization: Lottery Tickets or Escape Dimensions?

Hacker News Top · 4d ago

This paper investigates why overparameterization succeeds in neural networks, comparing the lottery ticket hypothesis with an explanation based on escape dimensions.

#theory

Catastrophic Compositional Generation: Why Vanilla Diffusion Models Fail to Extrapolate

arXiv cs.LG · 4d ago

This paper argues that vanilla conditional diffusion models fundamentally fail at compositional generation when the target distribution is out-of-distribution, due to score estimation error, and that inference-time corrections cannot fully compensate.

#theory

A Knowledge Theory of Capital: The Value of Natural and Artificial Intelligence

arXiv cs.AI · 2026-06-18

This paper presents a knowledge-based theory of capital, examining the value of both natural and artificial intelligence from an economic perspective.

#theory

What Must Generalist Agents Remember?

arXiv cs.AI · 2026-06-18

This paper develops a formal account of what generalist agents must store in memory to act near-optimally across multiple environments and goals, presenting a separation theorem that memory is necessary for domain disambiguation and transition-model reconstruction.

#theory

@docmilanfar: I really enjoyed the explainer for our recent paper on "Geometry of Noise" arXiv:2602.18428

X AI KOLs Timeline · 2026-06-17

This paper provides a theoretical explanation for why diffusion models can generate clean samples without explicit noise-level conditioning, attributing it to high-dimensional geometry and analyzing why some model parameterizations succeed while others collapse.

#theory

@machinestein: ICML 2026: Latent Reasoning in TRMs is Secretly a Policy Improvement Operator Why does recursive reasoning, especially …

X AI KOLs Timeline · 2026-06-16

The paper shows that latent reasoning in Tiny Recursive Models (TRMs) functions as a policy improvement operator, and proposes an algorithm that improves learning and inference efficiency by up to 18x.

#theory

When to use what Schatten-$p$ norm in deep learning?

arXiv cs.LG · 2026-06-16

This paper provides guidance on the appropriate use of different Schatten-p norms in deep learning, analyzing their theoretical properties and practical implications for model regularization and optimization.

#theory

Uncertainty Estimation and Generalization Bounds for Modern Deep Learning

arXiv cs.LG · 2026-06-15

This paper presents theoretical bounds for uncertainty estimation and generalization in modern deep learning models.

#theory

WorldKernel: A World Model is the Coupling Kernel of Admissible Possible Worlds

arXiv cs.AI · 2026-06-10

The paper identifies a failure mode in which predictors collapse to a point on unidentified counterfactual couplings. It proposes a framework that uses a positive semidefinite coupling kernel to bound counterfactuals, showing that prediction cannot represent uncertainty over cross-world couplings and that enforcing kernel constraints yields tractable bounds.

#theory

Transformers Are Inherently Succinct

Hacker News Top · 2026-06-05

This paper argues that transformer architectures are inherently succinct, meaning they can represent certain functions more efficiently than other models. It presents theoretical analysis and proofs.

#theory

The Evaluation Blind Spot: A Stereological Theory of Benchmark Coverage for Large Language Models

arXiv cs.LG · 2026-06-05

This paper applies stereological theory to LLM benchmarks, revealing that current leaderboards measure only 3–5 independent dimensions, creating geometric blind spots that dominate statistical noise. It provides theoretical bounds on benchmark coverage and a submodular algorithm for efficient benchmark selection.

#theory

@snowboat84: This is the second part of the "When Physics Meets AI" series. The role of physics in AI can be divided into four layers: (1) The first layer is the bottommost, providing the computational skeleton—energy, entropy, and free energy are embedded into AI's training objectives. (2) The second layer is the middle layer, where physics shapes the network architecture—Hopfield's Ising energy function, CNN's translational symmetry, and renormalization group correspond to the hierarchical structure of deep networks.

X AI KOLs Timeline · 2026-06-05

This article explores the four layers of physics' role in AI, from the bottom computational skeleton to the methodological layer, arguing that physics' methodology is migrating from the natural world to the AI domain.

#theory

Spectral Asymptotics of Neural Network Loss Landscapes: An Exact Decomposition of the Curvature Exponent

arXiv cs.LG · 2026-06-03

This paper presents an exact decomposition of the curvature exponent α in neural network loss landscapes, explaining why it varies across layer types. It introduces the spectral alignment decomposition and derives a spectral transfer identity linking curvature, gradient rank decay, and Hessian exponents, validated across architectures and datasets.

#theory

Balancing Learning Rates Across Layers: Exact Two-Step Dynamics and Optimal Scaling in Linear Neural Networks

arXiv cs.LG · 2026-06-02

This paper derives exact closed-form expressions for gradients and test loss after one and two steps of gradient descent in two-layer and three-layer linear neural networks, characterizing optimal learning rate selection and revealing a distinct early-training regime where unequal layer-wise learning rates are initially optimal.

#theory

@ChrisGPotts: We take for granted that larger models are better than smaller ones, but why is this so? Our new paper, led by Jing Hua…

X AI KOLs Following · 2026-06-01

This paper investigates why larger models outperform smaller ones, attributing it to data-induced competition for neural resources through formal analysis and experiments.

#theory

@MatthieuWyart: LLMs learn by predicting tokens. World models (JEPA, data2vec) learn by predicting their own abstractions. Which needs …

X AI KOLs Timeline · 2026-06-01

This paper proves that learning by predicting latent representations (as in world models like JEPA and data2vec) requires exponentially less data than predicting tokens (as in LLMs) for hierarchical data with hidden structure.

#theory

The Hamilton-Jacobi Theory of Deep Learning

arXiv cs.LG · 2026-05-29

This paper establishes an exact correspondence between neural network training and Hamilton-Jacobi initial-value problems, unifying deep learning architectures through a deformation parameter.
