diagnostic-dialogue

Tag


DiagFlowBench: Evaluating How Language Models Handle Off-Procedure Inputs in Grounded Diagnostic Dialogue

arXiv cs.AI · 2026-06-17

This paper introduces DiagFlowBench, a benchmark dataset of 1,676 multi-turn diagnostic conversations derived from industrial flowcharts, designed to evaluate how well language models handle off-procedure inputs and abstain from giving inappropriate advice.
