Tag: #error-severity

ERRORQUAKE: Heavy-Tailed Error Severity Distributions in Open-Weight Large Language Models

arXiv cs.LG · 2026-06-05

The paper introduces Errorquake-10k, a benchmark for evaluating the severity of errors made by open-weight LLMs. It shows that models with matched accuracy can have very different error-severity distributions, and argues that severity should be reported alongside accuracy.
