Tag
Introduces ToolBench-X, a benchmark for evaluating large language model agents under various tool-environment reliability hazards, revealing a substantial gap in performance compared to clean environments.