Tag
DR³-Eval is a benchmark for evaluating deep research agents on multimodal, multi-file report generation. It pairs a realistic simulated web environment with an evaluation framework that measures information recall, factual accuracy, citation coverage, instruction following, and depth quality.