document-editing

LLMs Corrupt Your Documents in Delegated Tasks

arXiv cs.CL · cached 2026-04-20

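DELEGATE-52 is a new benchmark showing that current LLMs, including frontier models such as GPT-5.4 and Claude 4.6 Opus, corrupt an average of 25% of document content in long-horizon delegated workflows spanning 52 professional domains. The study finds that LLMs introduce sparse but severe errors that compound over successive interactions, raising concerns about their reliability in the delegated-work paradigm.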
