daic

A Multi-Probe Audit of Clinical-Interview Depression Detection Benchmarks

arXiv cs.CL · 2026-05-26

This paper audits how depression-detection models are evaluated on clinical-interview benchmarks. It applies four complementary probes across five datasets and finds that standard evaluation protocols may overestimate model performance and that leaderboard rankings are unstable.
