Tag
This paper introduces the Markov decision contest, a new problem model for reinforcement learning with pairwise preferences. It proves optimality guarantees for stationary policies, establishes exact solvability in P, and presents a learning-efficient approximate algorithm.
This paper presents the first systematic study of uncertainty quantification (UQ) for Large Language Diffusion Models (LLDMs). It proposes lightweight zero-shot uncertainty signals derived from the iterative denoising process and shows that LLDMs can achieve both fast inference and reliable hallucination detection, with up to 100x lower computational overhead than sampling-based baselines.
This paper establishes the first population risk bounds for Kolmogorov-Arnold Networks trained with mini-batch SGD and DP-SGD using correlated noise, advancing theoretical understanding of KANs in privacy-sensitive domains.