Tag
BAGEL is a new benchmark for evaluating animal-related knowledge in large language models, constructed from diverse scientific sources and covering taxonomy, morphology, habitat, behavior, and species interactions through closed-book question-answer pairs. The benchmark enables fine-grained analysis across taxonomic groups and knowledge categories, providing insights into model strengths and failure modes for biodiversity applications.