An analysis of rsync's release history examines whether Claude-assisted commits introduced more bugs, using a permutation test on the number of bugs per 10 commits. The findings suggest no statistically significant increase in bugs for Claude-assisted releases compared with the historical distribution.
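A minimal sketch of such a permutation test follows. It assumes that each release's rate is computed as 10 × bugs / commits and that the test asks whether the Claude-assisted releases have a higher mean rate than the other releases. The function names and the one-sided difference-in-means statistic are illustrative assumptions, not the analysis's actual code.

```python
import random
from statistics import mean


def bugs_per_10_commits(bugs, commits):
    # Hypothetical per-release rate: bugs normalised to a 10-commit window.
    return 10 * bugs / commits


def permutation_test(treated, baseline, n_resamples=10_000, seed=0):
    """One-sided permutation test on the difference in mean rates.

    treated:  rates for the releases of interest (e.g. Claude-assisted)
    baseline: rates for all other (historical) releases
    Returns (observed difference, p-value).
    """
    rng = random.Random(seed)
    observed = mean(treated) - mean(baseline)
    pooled = list(treated) + list(baseline)
    k = len(treated)
    hits = 0
    for _ in range(n_resamples):
        # Randomly reassign the "treated" label and recompute the statistic.
        rng.shuffle(pooled)
        diff = mean(pooled[:k]) - mean(pooled[k:])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    p_value = (hits + 1) / (n_resamples + 1)
    return observed, p_value
```

The permutation distribution plays the role of the "historical distribution": a large p-value means the treated releases' mean rate is unremarkable among random relabelings.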
This paper introduces a distributional generalization of matrix completion in which each entry is a probability distribution rather than a scalar. Each distribution is represented by its kernel mean embedding, and Tucker rank is used to capture low-rank structure. The authors propose a new estimator with non-asymptotic error bounds and demonstrate its effectiveness on synthetic and real-world data.
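To make "each entry is a distribution" concrete, the sketch below estimates kernel mean embeddings from samples and compares two entries through the squared distance between their embeddings (the biased MMD² estimate). It assumes a Gaussian kernel with a hand-picked bandwidth. This is a generic building block, not the authors' estimator, and the function names are hypothetical.

```python
import numpy as np


def gaussian_gram(x, y, bandwidth=1.0):
    # Pairwise squared distances between rows of x (n, d) and y (m, d).
    sq = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2 * x @ y.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * bandwidth**2))


def embedding_inner(x, y, bandwidth=1.0):
    # Empirical <mu_P, mu_Q>: average kernel value over all sample pairs.
    return gaussian_gram(x, y, bandwidth).mean()


def mmd2(x, y, bandwidth=1.0):
    # ||mu_P - mu_Q||^2 = <mu_P, mu_P> - 2 <mu_P, mu_Q> + <mu_Q, mu_Q>
    # (biased V-statistic estimate from samples x ~ P, y ~ Q).
    return (
        embedding_inner(x, x, bandwidth)
        - 2 * embedding_inner(x, y, bandwidth)
        + embedding_inner(y, y, bandwidth)
    )
```

Because each entry becomes a point in a Hilbert space, low-rank structure can be imposed on these embeddings much as it would be on scalar entries.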
This paper formalizes pairwise reference alignment as a model-level ordinal observable. It defines a statistic that measures agreement between a model's scoring and a reference preference distribution, gives finite-sample estimators, and reports an empirical study on Qwen2.5 models using RewardBench.
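The summary does not give the statistic's exact form. As a hypothetical illustration only, one simple ordinal agreement score credits the model with the reference probability of whichever response it ranks higher, then averages over pairs and reports a standard error. `pair_agreement`, `alignment_estimate`, and the 0.5 tie convention are assumptions, not the paper's definitions.

```python
from math import sqrt
from statistics import mean, stdev


def pair_agreement(score_a, score_b, pref_a):
    """Agreement on one pair (hypothetical definition).

    pref_a: reference probability that response a is preferred to b.
    The model earns pref_a if it scores a above b, 1 - pref_a if it
    scores b above a, and 0.5 on a tie.
    """
    if score_a > score_b:
        return pref_a
    if score_a < score_b:
        return 1.0 - pref_a
    return 0.5


def alignment_estimate(pairs):
    """pairs: iterable of (score_a, score_b, pref_a).

    Returns the sample-mean agreement and its standard error.
    """
    vals = [pair_agreement(*p) for p in pairs]
    se = stdev(vals) / sqrt(len(vals)) if len(vals) > 1 else float("nan")
    return mean(vals), se
```

Only the ordering of the model's scores matters here, which is what makes the statistic ordinal: any monotone rescaling of the scores leaves it unchanged.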
OpenAI has hired statistician Weijie Su, a top graduate from Peking University and winner of the 'Nobel Prize of Statistics', who will train AI models while on leave from Wharton.
Explains the Student's t-distribution correction for small-sample confidence intervals, providing a memorizable table for 90% intervals and a rule of thumb for estimating the standard deviation from two samples.
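Below is a minimal sketch of a 90% t-interval for small samples. It uses the standard two-sided critical values t(0.95, df) for df = 1 to 10 in place of the article's own table, which is not reproduced here. `ci90` and `sd_from_two` are hypothetical names. `sd_from_two` is the exact Bessel-corrected sample standard deviation for n = 2, which may differ from the article's rule of thumb.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 90% critical values t(0.95, df) from a standard t-table.
T90 = {1: 6.314, 2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015,
       6: 1.943, 7: 1.895, 8: 1.860, 9: 1.833, 10: 1.812}


def ci90(samples):
    """90% confidence interval for the mean: mean +/- t * s / sqrt(n)."""
    n = len(samples)
    df = n - 1
    if df not in T90:
        raise ValueError("this table covers n = 2..11 only")
    m = mean(samples)
    half_width = T90[df] * stdev(samples) / sqrt(n)
    return m - half_width, m + half_width


def sd_from_two(x1, x2):
    # For n = 2 the Bessel-corrected sample SD reduces to |x1 - x2| / sqrt(2).
    return abs(x1 - x2) / sqrt(2)
```

At n = 2 the critical value 6.314 is almost four times the normal value of 1.645, which shows how large the small-sample correction is.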