statistical-analysis

Open-source LLM benchmark runs 147 coding tasks every 4 hours, 5-trial median with 95% CI, and uses CUSUM for change-point detection. Curious what people think of the methodology

Reddit r/AI_Agents · 2026-06-18

An open-source LLM benchmark runs 147 coding tasks every 4 hours, reports a 5-trial median with 95% confidence intervals, and uses CUSUM for change-point detection. The post asks for feedback on the methodology.
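
As a rough illustration of the approach the post describes, the sketch below takes the median of each batch of trials and feeds those medians into a two-sided tabular CUSUM. It is not the benchmark's actual code: the function names, the `target`, `slack`, and `threshold` parameters, and the pass-rate numbers are all hypothetical, and the 95% confidence interval step is left out.

```python
from statistics import median


def run_medians(runs):
    """Collapse each scheduled run (a list of trial scores) to its median."""
    return [median(trials) for trials in runs]


def cusum_alarms(values, target, slack, threshold):
    """Two-sided tabular CUSUM.

    Accumulates deviations from `target` beyond `slack` in each direction
    and records the index whenever either sum exceeds `threshold`.
    Both sums are reset to zero after an alarm.
    """
    upper = 0.0  # accumulated evidence of an upward shift
    lower = 0.0  # accumulated evidence of a downward shift
    alarms = []
    for i, x in enumerate(values):
        upper = max(0.0, upper + (x - target - slack))
        lower = max(0.0, lower + (target - x - slack))
        if upper > threshold or lower > threshold:
            alarms.append(i)
            upper = lower = 0.0
    return alarms


# Hypothetical pass rates: six stable runs, then six runs after a regression.
stable = [0.80, 0.82, 0.81, 0.79, 0.80]
regressed = [0.70, 0.71, 0.69, 0.70, 0.72]
runs = [stable] * 6 + [regressed] * 6

medians = run_medians(runs)
alarms = cusum_alarms(medians, target=0.80, slack=0.02, threshold=0.15)
print(alarms)
```

Using the median rather than the mean keeps a single outlier trial from moving the per-run score, and the `slack` term keeps small run-to-run noise from accumulating into a false alarm.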

The FID Lottery: Quantifying Hidden Randomness in Generative-Model Evaluation

Hugging Face Daily Papers · 2026-06-18

This paper analyzes the variance of FID scores across different training and sampling seeds, revealing significant reproducibility issues in image generation evaluation. It proposes a new evaluation protocol with error bars and per-cell optimal guidance tuning.

The Scaling Law of Evaluation Failure: Why Simple Averaging Collapses Under Data Sparsity and Item Difficulty Gaps, and How Item Response Theory Recovers Ground Truth Across Domains

arXiv cs.LG · 2026-05-13

This paper argues that simple averaging in AI benchmarks fails under data sparsity and heterogeneous item difficulty, and proposes Item Response Theory (IRT) as a robust alternative for recovering ground-truth rankings.

First per-image PCA decomposition of Kodak suite reveals deliberate curation

Hacker News Top · 2026-04-20

First per-image PCA decomposition of the 24-image Kodak PCD0992 suite reveals deliberate curation spanning two orders of magnitude in inter-channel redundancy.

Universal statistical signatures of evolution in artificial intelligence architectures

Hugging Face Daily Papers · 2026-04-12

This paper analyzes 935 ablation experiments from 161 publications to show that AI architectural evolution follows the same statistical laws as biological evolution, including heavy-tailed fitness effect distributions and punctuated equilibria dynamics. The findings suggest that evolutionary statistical structure is substrate-independent, determined by fitness landscape topology rather than the mechanism of selection.
