document-editing

LLMs Corrupt Your Documents When You Delegate

arXiv cs.CL · 2026-04-20

DELEGATE-52 is a new benchmark showing that current LLMs, including frontier models such as GPT-5.4 and Claude 4.6 Opus, corrupt an average of 25% of document content over long delegated workflows spanning 52 professional domains. The errors are sparse but severe, and they compound across interactions. This raises concerns about how reliably LLMs can handle delegated work.
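The Python sketch below makes "corrupting a share of document content" concrete by scoring how much of an original document survives successive rounds of delegated edits. It is a minimal illustration under assumptions. Corruption is taken to be the fraction of original lines that are not preserved verbatim and in order. The function name `corrupted_fraction` and the sample snapshots are hypothetical. DELEGATE-52's actual scoring method may differ.

```python
from difflib import SequenceMatcher


def corrupted_fraction(original: str, edited: str) -> float:
    """Fraction of the original's lines not preserved, in order, in `edited`.

    Illustrative metric only (an assumption); not DELEGATE-52's scoring.
    """
    orig_lines = original.splitlines()
    if not orig_lines:
        return 0.0
    # Compare line by line; autojunk=False keeps frequent lines from being ignored.
    matcher = SequenceMatcher(a=orig_lines, b=edited.splitlines(), autojunk=False)
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return 1.0 - kept / len(orig_lines)


if __name__ == "__main__":
    # Hypothetical snapshots of one document after successive delegated edits.
    rounds = [
        "a\nb\nc\nd",  # original
        "a\nb\nc\nX",  # round 1: one line silently changed
        "a\nY\nc\nX",  # round 2: another change stacks on the first
    ]
    for i, doc in enumerate(rounds[1:], start=1):
        print(i, corrupted_fraction(rounds[0], doc))  # prints "1 0.25", then "2 0.5"
```

Each round is scored against the original document rather than the previous snapshot. This keeps small per-round errors visible as they accumulate, which mirrors the compounding effect the summary describes.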
