Tag
The paper introduces Errorquake-10k, a benchmark for evaluating error severity in open-weight LLMs. It shows that models with matched accuracy can have vastly different error-severity distributions and argues that severity should be reported alongside accuracy.
This tweet discusses the idea of training models with 'implementation noise' to make them robust to floating-point numerics problems caused by nondeterminism and non-associativity.
This article outlines the mission and research focus of Anthropic's Alignment team, which uses evaluation, oversight, and stress-testing to develop safeguards that help future AI systems remain helpful, honest, and harmless.