Tag
This paper proposes an information-theoretic framework for optimizing classifier-free guidance schedules in diffusion models, achieving improved trade-offs between condition consistency and sample diversity on ImageNet and COCO benchmarks.
This paper proposes a thermodynamic measure of intelligence, defining intelligence as the ability to make rare but valid futures more likely. It introduces a metric called 'rare-valid lift' that quantifies how much more often a system produces unlikely but acceptable outcomes compared to a passive baseline.
Researchers at Binghamton University used Shannon entropy to develop a mathematical method that solves Wordle puzzles with a 99% success rate, prioritizing informative guesses over likely answers.
A comprehensive book explaining data compression techniques including information theory, coding methods, modeling, and transforms, targeting programmers with math skills.
This article draws parallels between biological evolution and technological evolution, explaining how modularity and sexual reproduction allow populations to increase the rate of information acquisition. Simulations demonstrate that mixing genetic material accelerates the spread of beneficial mutations, analogous to how technologies build on existing components.
This paper presents an information-theoretic analysis of multimodal learning, revealing the need to capture sample-specific interactions, and proposes DMIL, a paradigm that explicitly models and learns from these interactions via variational decomposition and fine-tuning, achieving superior performance.
This paper develops a geometric framework to measure semantic content of texts using sentence embeddings, proposing a three-coordinate semantic profile (novelty, breadth, integration) and a scalar trade-off triangle, validated across synthetic categories and novels.
This book presents a mathematical theory of deep representation learning, aiming to demystify the internal mechanisms of large deep networks using optimization and information theory, making architecture design a matter of linear algebra and calculus.
InfoShield introduces a privacy-preserving method for speech representations in mental health screening using information-theoretic optimization, reducing sensitive attribute inference while maintaining diagnostic accuracy. A novel TimeAwareMINE estimator addresses temporal-static misalignment in sequential speech.
This paper develops a measure-theoretic framework analyzing when contrastive learning recovers meaningful latent geometry, introducing a 'diversity condition' on positive-pair sampling and a support-corrected InfoNCE variant, with experiments validating that sampling diversity and architectural inductive bias interact critically in contrastive representation learning.
This paper formalizes the concept of Bayes-sufficient representations in supervised learning, defining when a representation retains exactly the information needed for Bayes-optimal prediction under a given loss function. It introduces the Bayes quotient as a canonical loss-dependent object and connects the framework to property elicitation, illustrating distinctions between sufficiency, minimality, and excess retained information through experiments.
This paper presents diffusion models as part of a family of techniques that withhold information and train models to guess it, arguing that diffusion's approach of destroying information is flexible and advantageous, especially in data-scarce settings. It also discusses exploration problems and introduces a novel kind of probabilistic graphical model.
InfoQuant introduces a training-free method, Peak Suppression Orthogonal Transformation (PSOT), to reshape activation distributions for low-bit LLM quantization, preserving 97% of floating-point accuracy under W4A4KV4 and outperforming prior PTQ methods.
This paper proposes Human-Centered Learning Mechanics (HCLM), a dynamical and information-theoretic framework for studying open and controlled learning systems. It formalizes entropy regularization through effective information force, derives convergence and generalization results, and provides a conditional interpretation of scaling-law behavior.
The paper proposes a Shannon Scaling Law that models LLM training as information transmission over a noisy channel, explaining non-monotonic performance phenomena like catastrophic overtraining and quantization-induced degradation, and demonstrating superior predictive accuracy over traditional scaling laws.
This paper challenges the assumption that current Vision-Language Models faithfully synthesize multimodal data, proposing an information-theoretic Modality Translation Protocol with new metrics (Toll, Curse, Fallacy of Seeing) that evaluate trustworthiness rather than traditional multimodal gain.
A new paper applies Partial Information Decomposition and Time-Delayed Mutual Information to multi-agent LLM systems, demonstrating that relational information between agents is measurable and that genuine coordination requires both differentiation and shared purpose, echoing findings from organizational psychology.
This paper provides an information-theoretic account of when synthetic data improves or degrades LLM training, distinguishing between information-open and information-closed generation loops and explaining collapse via the data processing inequality.
This paper proposes a unified theoretical framework for phase transitions in deep learning (grokking, emergent capabilities) and non-equilibrium chemistry, describing both as driven informational systems governed by two gradient fields.
This paper derives tight theoretical bounds for human-AI teams, proving when confidence-based aggregation leads to complementarity and establishing impossibility results under specific error correlations.