Tag
This paper investigates conditions for algorithmic machine learning (e.g., kNN, random forest) to achieve design-unbiased prediction and classification for finite populations, using probability sampling designs rather than assumed data models. It extends design-based inference from survey sampling to ML algorithms.
This paper argues for a sequential inference framework to enhance LLM trustworthiness by modeling interactions as dependent stochastic processes, ensuring validity under repeated use, and enabling online monitoring for behavioral shifts.
This paper identifies vulnerabilities in the AIVAT variance reduction technique that arise when the heuristic value function is not fixed prior to evaluation. It also shows how to propagate heuristic uncertainty to further reduce variance, achieving a 43% reduction in the number of samples needed to reach statistical conclusions.
This paper analyzes KV cache quantization schemes inspired by TurboQuant, using statistical inference and a new 6D error framework to evaluate quality measures like KL divergence and geometric error.
This paper introduces a statistical framework for adaptively auditing AI systems using Safe Anytime-Valid Inference (SAVI), which allows rigorous conclusions to be drawn from limited data. It proposes a 'testing by betting' approach to validate model robustness while controlling the Type I error rate during adaptive sampling.
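The 'testing by betting' idea in the last summary can be illustrated with a minimal sketch. It is not the paper's procedure: it assumes binary audit outcomes (1 = failure, 0 = pass), a null hypothesis that the failure rate is at most `p0`, and a fixed bet fraction. The function name `betting_test` and all parameters are hypothetical. A bettor starts with wealth 1 and stakes against the null on each outcome; under the null the wealth is a nonnegative supermartingale, so by Ville's inequality it reaches 1/alpha with probability at most alpha, whenever the auditor chooses to stop.

```python
def betting_test(outcomes, p0, alpha=0.05, bet=0.5):
    """Sequentially test H0: failure rate <= p0 (illustrative sketch).

    outcomes: iterable of 0/1 audit results (1 = failure).
    bet: fixed bet fraction (an assumption; adaptive bets are also possible).
    Returns (rejected, samples_used, final_wealth).
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    if not 0.0 <= bet <= 1.0 / p0:
        raise ValueError("bet must lie in [0, 1/p0] to keep wealth nonnegative")

    wealth = 1.0
    threshold = 1.0 / alpha  # Ville's inequality: P(sup wealth >= 1/alpha) <= alpha under H0
    used = 0
    for x in outcomes:
        used += 1
        # Payoff factor has expectation 1 + bet * (p - p0) <= 1 when p <= p0.
        wealth *= 1.0 + bet * (x - p0)
        if wealth >= threshold:
            return True, used, wealth  # reject H0; stopping early is valid
    return False, used, wealth


if __name__ == "__main__":
    # A run of failures quickly accumulates evidence against H0: rate <= 0.1.
    print(betting_test([1] * 10, p0=0.1))
    # A run of passes never rejects; wealth only shrinks.
    print(betting_test([0] * 100, p0=0.1))
```

A fixed bet keeps the sketch short. In practice the bet size is often adapted to past outcomes, which preserves validity as long as each bet depends only on data seen before it.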