hallucination-control


DR³-Eval: Towards Realistic and Reproducible Deep Research Evaluation

Hugging Face Daily Papers · 2026-04-16

DR³-Eval is a benchmark for evaluating deep research agents on multimodal, multi-file report generation. It combines a realistic simulation of the web environment with a comprehensive evaluation framework that measures five aspects of a report: information recall, factual accuracy, citation coverage, instruction following, and depth quality.
