benchmark-construction

Teaching AI Through Benchmark Construction: QuestBench as a Course-Based Practice for Accountable Knowledge Work

arXiv cs.AI · 2026-05-22

The paper presents QuestBench, a benchmark constructed by students as a course-based exercise to evaluate deep research systems in the humanities and social sciences. Even advanced systems such as GPT-5.5 pass only 57.58% of the questions, exposing persistent failures of trustworthiness.