theoretical-analysis

Support Before Frequency in Discrete Diffusion

arXiv cs.LG · 2026-05-15

This paper proposes the 'support-before-frequency' hypothesis for discrete diffusion models: a model learns the support (the set of admissible sequences) first and only then refines frequencies within that support. Theoretical analysis of small-noise reverse kernels and experiments on masked language diffusion models support this claim.

RoPE Distinguishes Neither Positions Nor Tokens in Long Contexts, Provably

Hugging Face Daily Papers · 2026-05-15

This paper proves that RoPE-based attention fails to distinguish both token positions and token identities in long contexts, which explains why LLMs fail within their advertised context lengths. Experiments show that models optimized for retrieval struggle on simple list tasks.

COSMOS: Model-Agnostic Personalized Federated Learning with Clustered Server Models and Pseudo-Label-Only Communication

arXiv cs.LG · 2026-05-13

This paper introduces COSMOS, a model-agnostic personalized federated learning framework that uses clustered server models and pseudo-label-only communication. It provides theoretical analysis showing exponential personalization risk contraction and demonstrates superior performance over existing baselines in heterogeneous environments.

Why DDIM Hallucinates More than DDPM: A Theoretical Analysis of Reverse Dynamics

arXiv cs.LG · 2026-05-11

This paper provides a theoretical analysis of why deterministic DDIM samplers hallucinate more than stochastic DDPM samplers in diffusion models, attributing the gap to DDIM trajectories getting stuck in mode-interpolation regions during the reverse dynamics.

On Training in Imagination

arXiv cs.LG · 2026-05-11

This paper analyzes the 'training in imagination' paradigm in model-based reinforcement learning, deriving optimal sample allocation strategies and characterizing how dynamics and reward model errors affect policy returns.

On the Role of Strain and Vorticity in Numerical Integration Error for Flow Matching

arXiv cs.LG · 2026-05-11

This paper analyzes numerical integration errors in Flow Matching by decomposing the velocity Jacobian into strain and vorticity, proving that strain drives exponential error growth while vorticity contributes linearly. The authors propose a weighted Jacobian regularizer emphasizing strain suppression, which reduces integration error and improves FID on CIFAR-10.
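As a rough illustration of the decomposition this summary mentions, the sketch below splits a square Jacobian into its symmetric part (commonly called the strain) and its antisymmetric part (the vorticity). It is not the paper's code: the function name `decompose_jacobian` and the 2x2 example matrix are assumptions made here for illustration, and the regularizer itself is not shown.

```python
def decompose_jacobian(J):
    """Split a square matrix J into symmetric and antisymmetric parts.

    strain    = (J + J^T) / 2   (symmetric part)
    vorticity = (J - J^T) / 2   (antisymmetric part)
    so that J = strain + vorticity.
    """
    n = len(J)
    strain = [[0.5 * (J[i][j] + J[j][i]) for j in range(n)] for i in range(n)]
    vorticity = [[0.5 * (J[i][j] - J[j][i]) for j in range(n)] for i in range(n)]
    return strain, vorticity


# Hypothetical 2x2 velocity Jacobian, chosen only for illustration.
J = [[1.0, 2.0],
     [0.0, -1.0]]

S, W = decompose_jacobian(J)
print("strain:", S)
print("vorticity:", W)
```

The two parts always sum back to the original matrix, so a penalty can weight them separately; how the paper weights them is not stated in this summary.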

A connection between generative adversarial networks, inverse reinforcement learning, and energy-based models

OpenAI Blog · 2016-11-11

This paper establishes mathematical equivalences between generative adversarial networks (GANs), inverse reinforcement learning (IRL), and energy-based models (EBMs), demonstrating that certain IRL methods are equivalent to GANs with evaluable generator density. The work bridges three research communities to enable knowledge transfer for developing more stable and scalable algorithms.
