Tag
This paper presents a corpus-centric diagnostic framework for analyzing biomedical named entity recognition (NER) and entity linking (EL) benchmarks. It reveals substantial differences across nine corpora and argues that standard statistics are insufficient for characterizing evaluation demands.