model-robustness

ERRORQUAKE: Heavy-Tailed Error Severity Distributions in Open-Weight Large Language Models

arXiv cs.LG · 2026-06-05

The paper introduces Errorquake-10k, a benchmark for evaluating error severity in open-weight LLMs. It shows that models with matched accuracy can have vastly different error severity distributions and argues that severity should be reported alongside accuracy.


@charles_irl: my gut says that to solve float numerics problems from nondeterminism x nonassociativity, we need to think bigger than …

X AI KOLs Following · 2026-05-22

This tweet discusses training models with 'implementation noise' to make them more robust to floating-point numerics problems caused by nondeterminism and nonassociativity.


Alignment

Anthropic Research · 2026-05-08

This article outlines the mission and research focus of Anthropic's Alignment team, which develops safeguards to ensure future AI systems remain helpful, honest, and harmless through evaluation, oversight, and stress-testing.
