Mind the Gap: Can Frontier LLMs Pass a Standardized Office Proficiency Exam?

arXiv cs.AI · 2026-06-10

This paper introduces OfficeEval, a benchmark built on China's National Computer Rank Examination (NCRE) that evaluates LLM agents on complex Office automation tasks. The best frontier models score at most 36.6% in the single-turn setting and 68.8% when run within agentic systems, far below human-level performance.
