evidence-weighting

When Symptoms Are Not Enough: Evidence-Weighting Patterns in Large Language Model Psychiatric Screening

arXiv cs.CL · 2026-05-25

This paper introduces a benchmark of 555 interviews anchored to the Structured Clinical Interview for DSM (SCID) and uses it to evaluate five large language models (LLMs) for psychiatric screening. The models show potential, but they tend to discount symptom evidence in the presence of preserved functioning or protective context, so their screening judgments require careful validation.
