Aligning Human-AI-Interaction Trust for Mental Health Support: Survey and Position for Multi-Stakeholders


Summary

A multi-institution survey proposes a three-layer trust framework to align technical, clinical, and human-centered requirements for trustworthy AI in mental-health support.

arXiv:2604.20166v1

# Aligning Human-AI-Interaction Trust for Mental Health Support: Survey and Position for Multi-Stakeholders
Source: [https://arxiv.org/html/2604.20166](https://arxiv.org/html/2604.20166)
Xin Sun1,3, Yue Su2, Yifan Mo2, Qingyu Meng2, Yuxuan Li2, Saku Sugawara1, Mengyuan Zhang2, Charlotte Gerritsen2, Sander L\. Koole2, Koen Hindriks2, Jiahuan Pei2

1 National Institute of Informatics \(NII\), Japan; 2 Vrije Universiteit Amsterdam, the Netherlands; 3 University of Amsterdam, the Netherlands

###### Abstract

Building trustworthy AI systems for mental health support is a shared priority across stakeholders from multiple disciplines\. However, “trustworthy” remains loosely defined and inconsistently operationalized\. AI research often focuses on technical criteria \(e\.g\., robustness, explainability, and safety\), while therapeutic practitioners emphasize therapeutic fidelity \(e\.g\., appropriateness, empathy, and long\-term user outcomes\)\. To bridge the fragmented landscape, we propose a three\-layer trust framework, covering human\-oriented, AI\-oriented, and interaction\-oriented trust, integrating the viewpoints of key stakeholders \(e\.g\., practitioners, researchers, regulators\)\. Using this framework, we systematically review existing AI\-driven research in mental health domain and examine evaluation practices for “trustworthy” ranging from automatic metrics to clinically validated approaches\. We highlight critical gaps between what NLP currently measures and what real\-world mental health contexts require, and outline a research agenda for building socio\-technically aligned and genuinely trustworthy AI for mental health support\.

## 1 Introduction

![Refer to caption](https://arxiv.org/html/2604.20166v1/Figures/Broad_Discipline12.png)

Figure 1: The discipline network of 1,706 surveyed papers \(2021\-2025\) related to trust spans multiple disciplines but remains fragmented\. Node size indicates literature volume; edge thickness reflects connection strength\. This motivates our stakeholder\-driven trust framework in [Figure 2](https://arxiv.org/html/2604.20166#S2.F2)\. Visualization source code is released\.

Building trustworthy artificial intelligence \(AI\) systems for mental health support draws on research from multiple disciplines, as reflected in the fragmented discipline network revealed by our literature review \([footnote 2](https://arxiv.org/html/2604.20166#footnote2)\)\. Within AI research, recent advances in large language models \(LLMs\) have expanded mental health support applications Na et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib48)\), alongside emerging work on trustworthiness in multi\-LLM agent systems Ozgun et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib52)\)\. In contrast, psychotherapy research conceptualizes human trust as a clinically sensitive, context\-dependent, and relational construct Chi et al\. \([2021](https://arxiv.org/html/2604.20166#bib.bib16)\); Kauttonen et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib30)\); Gille et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib23)\); Rai et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib58)\)\. This divergence in how trust and trustworthiness are framed underscores the need for cross\-disciplinary frameworks that align technical, clinical, and human\-centered requirements for trustworthy mental health AI\.

However, the meaning of “trustworthy AI” remains deeply fragmented\. Computer scientists and engineers typically operationalize trustworthiness through technical criteria Huang et al\. \([2024](https://arxiv.org/html/2604.20166#bib.bib27)\), such as robustness, privacy, and toxicity control, and evaluate it through generic metrics \(e\.g\., perplexity, BLEU\) or safety classifier scores Yu et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib80)\)\. In contrast, mental health practitioners and behavioral scientists focus on clinical and relational criteria Liu and Tao \([2022](https://arxiv.org/html/2604.20166#bib.bib39)\): therapeutic adherence, empathy, crisis safety, and effects on help\-seeking and well\-being\. Platform providers, regulators, and ethicists AIHLEG \([2019](https://arxiv.org/html/2604.20166#bib.bib1)\) introduce further expectations around accountability and ethical concerns\. As a result, the concept is applied with fundamentally different scopes, methods, and evaluation practices across communities\.

The most relevant surveys examine trustworthy AI in \(mental\) health domains\. Zhu et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib86)\) focus on detecting, evaluating, and mitigating medical hallucinations in their study of trust in AI doctors\. The survey of LLMs in psychotherapy by Na et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib48)\) introduces a conceptual taxonomy of interconnected stages: assessment, diagnosis, and treatment\. However, the treatment stage is task\- and technique\-oriented, overlooking trustworthiness concerns and offering little guidance on cross\-stakeholder collaboration \(e\.g\., with HCI and safety researchers\)\.

In this survey, we fill this gap by proposing a “Three\-Layer Trust Framework” for AI\-based mental health support \(MHS\) systems\. The framework distinguishes between human\-oriented trust \(users’ perceptions and effects of these systems\), AI\-oriented trustworthiness \(criteria of the models and systems\), and interaction\-oriented trustworthiness \(criteria emerging during interaction\)\. We map each layer to the perspectives of stakeholders, including mental health practitioners, regulators, and AI, HCI, and safety researchers\. We use this multi\-layer, multi\-stakeholder lens to organize and analyze the literature within the scope of trustworthy AI for mental health\. We conclude by outlining open challenges and a research agenda towards socio\-technical alignment in trustworthy AI\-based mental health support\. Details of the literature review methodology are provided in [Appendix B](https://arxiv.org/html/2604.20166#A2)\.

The contributions of this survey are threefold\.

- We articulate a three\-layer framework unifying fragmented notions of trustworthy AI in mental health into a stakeholder\-aware perspective\.
- We systematically review AI\-based mental health literature through this framework, highlighting which layers and stakeholders are emphasized and where critical gaps arise\.
- We analyze methods and evaluation across layers and stakeholders, positioning the gap between what AI research defines as trustworthiness and how it supports trust in therapeutic practices\.

## 2 Conceptual Framework

Trustworthy AI for mental health is not a monolithic concept; concerns about it are raised across multiple disciplines\. We distill core trustworthiness criteria \(e\.g\., transparency, explainability, privacy\) into a three\-layer trust framework, and review the landscape of involved stakeholders \(i\.e\., practitioners, AI/HCI/safety researchers, regulators\)\.

![Refer to caption](https://arxiv.org/html/2604.20166v1/x1.png)

Figure 2: Based on a cross\-disciplinary literature review, we identify five key stakeholder groups whose perspectives shape what it means for AI to be trustworthy in mental health contexts\. Synthesizing these perspectives, we propose a three\-layer trust framework: human\-oriented trust, interaction\-oriented trustworthiness, and AI\-oriented trustworthiness, to organize existing methods, evaluation practices, and open challenges\.

### 2\.1 Stakeholder Landscape

#### 2\.1\.1 Psychotherapy Practitioners

In psychotherapy, therapists and researchers study clients’ or users’ trust as a clinically sensitive variable, which can guide AI system design\. Trust is context\-dependent and relational, and it emerges from user\-system interactions Chi et al\. \([2021](https://arxiv.org/html/2604.20166#bib.bib16)\); Kauttonen et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib30)\); Gille et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib23)\); Rai et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib58)\)\.

User\-side characteristics shown to influence trust include attitudes toward technology, prior usage patterns, personality traits, familiarity, self\-responsibility attributions, and perceived social support Kauttonen et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib30)\); Zhao et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib84)\); Huo et al\. \([2022](https://arxiv.org/html/2604.20166#bib.bib28)\)\. Literacy level also shapes reliance: users with limited AI knowledge often place greater trust in a system’s recommendations Woodcock et al\. \([2021](https://arxiv.org/html/2604.20166#bib.bib75)\)\. Moreover, trust tends to increase when human oversight is present, reflecting a preference for accountability in sensitive mental health settings Mayer et al\. \([2024](https://arxiv.org/html/2604.20166#bib.bib46)\); Aoki \([2021](https://arxiv.org/html/2604.20166#bib.bib5)\)\.

On the system side, two user\-perceivable attributes are consistently highlighted\. Anthropomorphism can improve interaction quality and perceived usefulness, but its effects are nonlinear and highly context dependent, indicating that human\-like cues alone cannot ensure appropriate trust Chi et al\. \([2021](https://arxiv.org/html/2604.20166#bib.bib16)\); Wu et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib76)\); Brunswicker et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib11)\); Liu and Tao \([2022](https://arxiv.org/html/2604.20166#bib.bib39)\)\. While explainability is treated as essential for trust, its benefits are constrained by privacy concerns Leichtmann et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib34)\); Aktan et al\. \([2022](https://arxiv.org/html/2604.20166#bib.bib2)\), and overly detailed explanations may reduce usability, calling for selective, user\-centered transparency Goisauf et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib24)\)\.

These works emphasize that trust is not static, but evolves as users interact over time Rodriguez Rodriguez et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib60)\); Lee and Moray \([1992](https://arxiv.org/html/2604.20166#bib.bib33)\); Manzey et al\. \([2012](https://arxiv.org/html/2604.20166#bib.bib45)\)\. From a psychotherapy perspective, this underscores the need to study trust as a process rather than a single, static judgment\.

#### 2\.1\.2 HCI Researchers

HCI researchers study trust in mental health systems from interactional perspectives: trust is communicated, interpreted, and negotiated via human\-AI interaction\.

Accordingly, human\-computer interaction \(HCI\) work emphasizes several interrelated criteria\. A core focus is perceived competence and reliability: whether AI behaves accurately and consistently across turns \(Lee et al\., [2025](https://arxiv.org/html/2604.20166#bib.bib32); Cao et al\., [2025](https://arxiv.org/html/2604.20166#bib.bib13); You et al\., [2025](https://arxiv.org/html/2604.20166#bib.bib78); Zheng et al\., [2025](https://arxiv.org/html/2604.20166#bib.bib85)\)\. Closely related is conversational safety, which examines how interaction can reduce harmful, misleading, or ethically problematic responses in mental health contexts \(Namvarpour and Razi, [2024](https://arxiv.org/html/2604.20166#bib.bib49); Wang et al\., [2025b](https://arxiv.org/html/2604.20166#bib.bib71); Ma et al\., [2024](https://arxiv.org/html/2604.20166#bib.bib44)\)\. Interaction\-level transparency and explainability also matter: researchers design user\-facing features \(e\.g\., capability disclosures, explanations\) to help users understand what systems can and cannot do and to set appropriate expectations \(Cao et al\., [2025](https://arxiv.org/html/2604.20166#bib.bib13)\)\.

Empathy and engagement also influence users’ trust and willingness to continue using MHS systems \(Wang et al\., [2025b](https://arxiv.org/html/2604.20166#bib.bib71); Choi et al\., [2025](https://arxiv.org/html/2604.20166#bib.bib18)\)\. Controllability is likewise highlighted: enabling users to guide conversations or override system behavior helps preserve user autonomy and mitigate over\-reliance \(Sun et al\., [2025](https://arxiv.org/html/2604.20166#bib.bib65); Wester et al\., [2024a](https://arxiv.org/html/2604.20166#bib.bib73); Swinger et al\., [2025](https://arxiv.org/html/2604.20166#bib.bib66)\)\.

Overall, the HCI perspective treats interaction as a mediating layer: trustworthiness depends not only on what AI can do, but on how its capabilities, limits, and safeguards are surfaced through use in mental health settings \(Thieme et al\., [2023](https://arxiv.org/html/2604.20166#bib.bib67); Namvarpour and Razi, [2024](https://arxiv.org/html/2604.20166#bib.bib49); Wang et al\., [2025b](https://arxiv.org/html/2604.20166#bib.bib71)\)\.

#### 2\.1\.3 AI Researchers

AI researchers are key stakeholders in trustworthy AI for mental health\. In the AI literature, trustworthiness is primarily grounded in model\- and evaluation\-level criteria that justify trust independently of any specific user interface or interaction\.

A key finding is that model behavior can be less stable than offline scores suggest, which is especially concerning in high\-stakes mental health settings\. Even under controlled conditions, LLMs exhibit substantial output variability, motivating multi\-run evaluation \(Lupart et al\., [2025](https://arxiv.org/html/2604.20166#bib.bib43)\)\. Retrieval components add further fragility: embedding models show writing\-style preferences that can affect rankings and fairness, and embeddings used as evaluation signals can bias retrieval\-augmented generation \(RAG\)\. These problems are amplified when mental health language is indirect or stylistically diverse \(Cao and et al\., [2025](https://arxiv.org/html/2604.20166#bib.bib14); Liu et al\., [2024](https://arxiv.org/html/2604.20166#bib.bib40)\)\.
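To make the idea of multi\-run evaluation concrete, the following is a minimal sketch, not the protocol of any cited work\. It scores the same prompt over several seeded runs and reports the spread alongside the mean\. The names `multi_run_scores`, `summarize`, `toy_generate`, and `toy_score` are hypothetical, and the toy generator and scorer are stand\-ins for a real model call and a real quality metric\.

```python
import random
import statistics
from typing import Callable, Dict, List


def multi_run_scores(generate: Callable[[str, int], str],
                     score: Callable[[str], float],
                     prompt: str,
                     n_runs: int = 5) -> List[float]:
    """Score the same prompt over several independently seeded runs."""
    return [score(generate(prompt, seed)) for seed in range(n_runs)]


def summarize(scores: List[float]) -> Dict[str, float]:
    """Report central tendency and spread instead of a single number."""
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "min": min(scores),
        "max": max(scores),
    }


# Hypothetical stand-ins (assumptions, not a real model or metric):
# a toy "model" that picks one of three canned replies per seed, and a
# toy scorer that uses reply length as a placeholder quality signal.
def toy_generate(prompt: str, seed: int) -> str:
    replies = [
        "I hear you.",
        "That sounds really hard.",
        "Can you tell me more about that?",
    ]
    return random.Random(seed).choice(replies)


def toy_score(reply: str) -> float:
    return float(len(reply))


if __name__ == "__main__":
    scores = multi_run_scores(toy_generate, toy_score,
                              "I feel overwhelmed.", n_runs=10)
    print(summarize(scores))
```

Reporting the minimum and standard deviation next to the mean is one simple way to surface instability that a single\-run score would hide\.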

Moreover, LLM generation may appear correct while failing to be causally faithful to the evidence actually used \(Wallat et al\., [2025](https://arxiv.org/html/2604.20166#bib.bib68)\)\. Likewise, LLM\-based evaluators \(“LLM\-as\-a\-Judge” Li et al\. \([2024b](https://arxiv.org/html/2604.20166#bib.bib36)\)\) can introduce bias and show limited sensitivity to subtle yet important differences \(Balog et al\., [2025](https://arxiv.org/html/2604.20166#bib.bib9)\)\. From the AI researcher’s perspective, trustworthy mental health AI therefore requires robustness, fairness, faithfulness, and evaluation reliability, providing a sound foundation for interaction design and calibrated human trust\.

Three\-Layer Trust Framework

- Human\-Oriented Trust \(§[3](https://arxiv.org/html/2604.20166#S3)\)
  - Stakeholder: Psychotherapy practitioners & Users \(§[2\.1\.1](https://arxiv.org/html/2604.20166#S2.SS1.SSS1)\)
  - H1: Perceived Trust & Measures Chi et al\. \([2021](https://arxiv.org/html/2604.20166#bib.bib16)\); Liu and Tao \([2022](https://arxiv.org/html/2604.20166#bib.bib39)\)
  - H2: User Characteristics \(Kauttonen et al\., [2025](https://arxiv.org/html/2604.20166#bib.bib30); Huo et al\., [2022](https://arxiv.org/html/2604.20166#bib.bib28)\)
  - H3: System Characteristics \(Wu et al\., [2023](https://arxiv.org/html/2604.20166#bib.bib76); Leichtmann et al\., [2023](https://arxiv.org/html/2604.20166#bib.bib34)\)
  - Evaluation & Measures: [Table 3](https://arxiv.org/html/2604.20166#A2.T3)
- Interaction\-Oriented Trustworthiness \(§[4](https://arxiv.org/html/2604.20166#S4)\)
  - Stakeholder: HCI \(§[2\.1\.2](https://arxiv.org/html/2604.20166#S2.SS1.SSS2)\); Regulators \(§[2\.1\.5](https://arxiv.org/html/2604.20166#S2.SS1.SSS5)\)
  - I1: Competence \(Lee et al\., [2025](https://arxiv.org/html/2604.20166#bib.bib32); You et al\., [2025](https://arxiv.org/html/2604.20166#bib.bib78); Zheng et al\., [2025](https://arxiv.org/html/2604.20166#bib.bib85)\)
  - I2: Communication Style Namvarpour and Razi \([2024](https://arxiv.org/html/2604.20166#bib.bib49)\); Ma et al\. \([2024](https://arxiv.org/html/2604.20166#bib.bib44)\)
  - I3: Transparency Cao et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib13)\)
  - I4: Empathy & Engagement \(Wang et al\., [2025b](https://arxiv.org/html/2604.20166#bib.bib71); Choi et al\., [2025](https://arxiv.org/html/2604.20166#bib.bib18)\)
  - I5: Controllability Swinger et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib66)\)
  - Method & Evaluation: [Table 4](https://arxiv.org/html/2604.20166#A4.T4)
- AI\-Oriented Trustworthiness \(§[5](https://arxiv.org/html/2604.20166#S5)\)
  - Stakeholder: AI/Safety/Security \(§[2\.1\.3](https://arxiv.org/html/2604.20166#S2.SS1.SSS3); [2\.1\.4](https://arxiv.org/html/2604.20166#S2.SS1.SSS4)\)
  - A1: Reliability & Robustness Kang et al\. \([2024](https://arxiv.org/html/2604.20166#bib.bib29)\); Dhuliawala et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib19)\)
  - A2: Safety & Harm Prevention \(Hua et al\., [2024](https://arxiv.org/html/2604.20166#bib.bib26); Baidal et al\., [2025](https://arxiv.org/html/2604.20166#bib.bib8)\)
  - A3: Privacy Shin et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib61)\)
  - A4: Explainability Yang et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib77)\)
  - A5: Fairness Gabriel et al\. \([2024](https://arxiv.org/html/2604.20166#bib.bib22)\)
  - Method & Evaluation: [Table 5](https://arxiv.org/html/2604.20166#A4.T5)

Figure 3: The proposed Three\-Layer Trust Framework\. Each layer reflects priorities across stakeholders and summarizes key trust criteria with representative literature\. This framework unifies fragmented stakeholder perspectives and clarifies how user trust, interaction\- and system\-level trustworthiness jointly shape trustworthy mental health AI\.

#### 2\.1\.4 Safety and Security Researchers

Safety and security researchers examine trustworthy AI through adversarial failure and misuse, focusing on whether MHS systems remain safe under worst\-case conditions \(Liu et al\., [2023](https://arxiv.org/html/2604.20166#bib.bib41)\)\.

Prior work identifies multiple vulnerabilities in mental health support AI, including prompt injection and jailbreak attacks \(Yu et al\., [2025](https://arxiv.org/html/2604.20166#bib.bib80)\), backdoor triggers \(Hua et al\., [2024](https://arxiv.org/html/2604.20166#bib.bib26)\), and membership inference that can expose sensitive training data \(Wang et al\., [2023](https://arxiv.org/html/2604.20166#bib.bib69)\)\. These risks are especially severe in mental health settings, where users frequently disclose trauma, suicidal ideation, and other highly sensitive information \(Baidal et al\., [2025](https://arxiv.org/html/2604.20166#bib.bib8); Cho et al\., [2023](https://arxiv.org/html/2604.20166#bib.bib17)\)\.

Accordingly, this stakeholder group prioritizes three trustworthiness goals: preventing privacy leakage through direct extraction and memorization \(Kwesi et al\., [2025](https://arxiv.org/html/2604.20166#bib.bib31); Shin et al\., [2023](https://arxiv.org/html/2604.20166#bib.bib61)\); improving adversarial robustness against inputs designed to bypass systems \(Baidal et al\., [2025](https://arxiv.org/html/2604.20166#bib.bib8); Wang et al\., [2023](https://arxiv.org/html/2604.20166#bib.bib69)\); and deploying system\-level safeguards such as input filtering and escalation protocols \(Hua et al\., [2024](https://arxiv.org/html/2604.20166#bib.bib26); Ozgun et al\., [2025](https://arxiv.org/html/2604.20166#bib.bib52); Na et al\., [2025](https://arxiv.org/html/2604.20166#bib.bib48)\)\. Evaluation in this line of work typically relies on red\-teaming, jailbreak audits, and penetration testing, rather than standard offline benchmarks \(Zhu et al\., [2025](https://arxiv.org/html/2604.20166#bib.bib86); Alghamdi et al\., [2025](https://arxiv.org/html/2604.20166#bib.bib3)\)\.
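As an illustration of what an input filter with an escalation path might look like, here is a deliberately simple sketch\. It is not drawn from any of the cited systems: the rule lists, the `Action` and `Decision` types, and `screen_input` are all hypothetical, and plain substring matching is only a placeholder for the trained classifiers and clinically reviewed protocols a real deployment would need\.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RESPOND = "respond"    # pass the message on to the model
    BLOCK = "block"        # refuse to forward a suspected attack
    ESCALATE = "escalate"  # hand over to a human or crisis protocol


@dataclass(frozen=True)
class Decision:
    action: Action
    reason: str


# Hypothetical placeholder rules (assumptions for illustration only).
CRISIS_TERMS = ("suicide", "kill myself", "end my life")
INJECTION_MARKERS = ("ignore previous instructions", "disregard your rules")


def screen_input(message: str) -> Decision:
    """Decide how to route a user message before it reaches the model."""
    text = message.lower()
    # Crisis checks run first so a disclosure is never silently blocked.
    if any(term in text for term in CRISIS_TERMS):
        return Decision(Action.ESCALATE, "possible crisis disclosure")
    if any(marker in text for marker in INJECTION_MARKERS):
        return Decision(Action.BLOCK, "possible prompt injection")
    return Decision(Action.RESPOND, "no rule matched")


if __name__ == "__main__":
    print(screen_input("I had a stressful week at work."))
```

The ordering is the main design choice in this sketch: escalation takes precedence over blocking, so a message that looks both adversarial and like a crisis disclosure is routed to a human rather than dropped\.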

#### 2\.1\.5 Regulators and Standards Associations

Regulators and standards associations set the minimum acceptable conditions for deploying mental health AI by translating ethical principles into enforceable requirements\. We review six foundational frameworks from U\.S\. agencies \(FDA, [2021](https://arxiv.org/html/2604.20166#bib.bib21); NIST, [2023](https://arxiv.org/html/2604.20166#bib.bib51); AMA, [2024](https://arxiv.org/html/2604.20166#bib.bib4)\), EU bodies \(EPEUCO, [2024](https://arxiv.org/html/2604.20166#bib.bib20); AIHLEG, [2019](https://arxiv.org/html/2604.20166#bib.bib1)\), professional ethics \(APA, [2025](https://arxiv.org/html/2604.20166#bib.bib6)\), and academic guidance \(Pillay, [2025](https://arxiv.org/html/2604.20166#bib.bib54)\) \(Appendix §[C](https://arxiv.org/html/2604.20166#A3), [Table 2](https://arxiv.org/html/2604.20166#A2.T2)\)\. Despite differences in scope, these sources converge on human\-centricity and risk mitigation in high\-stakes settings\.

Across documents, autonomy and informed consent require transparent disclosure of AI involvement, system limitations, risks, and users’ rights to opt out\. Beneficence and non\-maleficence restrict AI to a complementary role, emphasizing reliability, clinical validity, and alignment with therapeutic goals rather than autonomous decision\-making\. Privacy, confidentiality, and security impose strict constraints on data handling, model development, and third\-party risk management\. Lastly, principles of justice, fairness, and inclusiveness require bias auditing, while fidelity, professional integrity, and accountability demand clinician competence and sustained human oversight, as formalized and enforced by regulatory and ethics bodies\.

### 2\.2 Three\-Layer Trust Framework

Trust in mental health AI does not come from a single source\. It is built through a chain of expectations: how the system works internally, how it behaves during interaction, and how users ultimately experience its support\. To organize these views, we synthesize the following three\-layer trust framework \(as shown in [Figure 2](https://arxiv.org/html/2604.20166#S2.F2)\)\.

##### Human\-oriented trust\.

Even if a system is technically robust and interactively competent, trust ultimately depends on whether users perceive the system as trustworthy\. Thus, human\-oriented trust captures users’ subjective trust responses toward mental health AI systems\. It concerns how trust is perceived, calibrated, and updated by users, shaped by individual characteristics, expectations, and prior experiences\. This layer reflects trust as a psychological and relational state, rather than a system property\.

##### Interaction\-oriented trustworthiness\.

Once users begin interacting with systems, trust shifts into a different mode: interaction\-oriented\. It concerns how trust is mediated through interaction \(e\.g\., conversational behaviors\), feedback, transparency cues, controllability, and safety mechanisms\. This layer bridges user perception and system/UI design, emphasizing trust as something enacted and negotiated through interaction\.

##### AI\-oriented trustworthiness\.

AI\-oriented trustworthiness refers to system\-level criteria for mental health AI, defined at the model and infrastructure level, including reliability, robustness, fairness, privacy protection, evaluation validity, and avoidance of harmful or misleading outputs\. These criteria are primarily specified and assessed by AI, NLP, and security research before and during deployment, and they constrain the range of behaviors that interaction\-level trust can safely support\. Failures at this layer, such as unsafe or unstable generation, may undermine trust regardless of interaction quality\.

Together, these three layers show that trust in AI\-powered MHS systems is an alignment problem across stakeholders\. Human trust must be grounded in interaction\- and AI\-oriented trustworthiness, while system\-level trust alone is insufficient if interaction cues mislead users\. This layered taxonomy synthesizes stakeholder\-specific perspectives and structures our review of scope, methods, and evaluation practices in the following sections\.
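For readers who want to use the taxonomy programmatically, for example to tag papers or evaluation metrics by layer, the layer\-to\-stakeholder and layer\-to\-criteria mapping summarized in Figure 3 can be written as a small lookup table\. This is only an illustrative sketch: the dictionary keys and the helper names `layer_of` and `layers_for_stakeholder` are our own hypothetical choices, not part of the framework itself\.

```python
from typing import Dict, List, Optional

# Layers, stakeholders, and criteria transcribed from Figure 3.
# Key spellings are illustrative assumptions.
FRAMEWORK: Dict[str, Dict[str, List[str]]] = {
    "human-oriented trust": {
        "stakeholders": ["psychotherapy practitioners", "users"],
        "criteria": ["perceived trust and measures",
                     "user characteristics",
                     "system characteristics"],
    },
    "interaction-oriented trustworthiness": {
        "stakeholders": ["hci researchers", "regulators"],
        "criteria": ["competence", "communication style", "transparency",
                     "empathy and engagement", "controllability"],
    },
    "AI-oriented trustworthiness": {
        "stakeholders": ["ai researchers", "safety and security researchers"],
        "criteria": ["reliability and robustness",
                     "safety and harm prevention",
                     "privacy", "explainability", "fairness"],
    },
}


def layer_of(criterion: str) -> Optional[str]:
    """Return the layer that lists a criterion, or None if it is unknown."""
    wanted = criterion.strip().lower()
    for layer, entry in FRAMEWORK.items():
        if wanted in entry["criteria"]:
            return layer
    return None


def layers_for_stakeholder(stakeholder: str) -> List[str]:
    """Return every layer that a stakeholder group is mapped to."""
    wanted = stakeholder.strip().lower()
    return [layer for layer, entry in FRAMEWORK.items()
            if wanted in entry["stakeholders"]]


if __name__ == "__main__":
    print(layer_of("Privacy"))
    print(layers_for_stakeholder("Regulators"))
```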

## 3 Human\-Oriented Trust

Human\-oriented trust is understood as a subjective psychological state that shapes users’ willingness to adopt, rely on, and continue using these systems\. We organize human\-oriented trust around three questions: what trust is, who trusts, and how trust is perceived\.

### 3\.1 What Is Trust

##### Scope\.

This dimension concerns how users judge trust in AI\-powered mental health systems\. At a basic level, trust reflects a user’s confidence in whether an AI system is dependable enough to be relied upon in a sensitive context\. This aligns with trust theories in psychology, such as the ABI model Mayer et al\. \([1995](https://arxiv.org/html/2604.20166#bib.bib47)\) and the MATCH framework Liao and Sundar \([2022](https://arxiv.org/html/2604.20166#bib.bib37)\), which characterize trust as a belief about an agent’s ability and dependability\. Prior work measures trust in two main ways\. Some studies treat trust as a *unidimensional* construct, measuring overall trust as a single judgment Wu et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib76)\); Woodcock et al\. \([2021](https://arxiv.org/html/2604.20166#bib.bib75)\); Aoki \([2021](https://arxiv.org/html/2604.20166#bib.bib5)\); Kauttonen et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib30)\); Aktan et al\. \([2022](https://arxiv.org/html/2604.20166#bib.bib2)\); Liu and Tao \([2022](https://arxiv.org/html/2604.20166#bib.bib39)\); Mayer et al\. \([2024](https://arxiv.org/html/2604.20166#bib.bib46)\); Huo et al\. \([2022](https://arxiv.org/html/2604.20166#bib.bib28)\); Youn and Jin \([2021](https://arxiv.org/html/2604.20166#bib.bib79)\); Zhao et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib84)\)\. Others adopt a *multidimensional* view, decomposing trust into components \(e\.g\., perceived reliability, agency, and competence\) Chi et al\. \([2021](https://arxiv.org/html/2604.20166#bib.bib16)\); Luetke Lanfer et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib42)\); Leichtmann et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib34)\); Gille et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib23)\); Brunswicker et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib11)\); Rai et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib58)\)\.

##### Evaluation\.

Trust is primarily measured through subjective assessments\. Studies commonly employ validated scales capturing perceived agency Luetke Lanfer et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib42)\), system reliability Chi et al\. \([2021](https://arxiv.org/html/2604.20166#bib.bib16)\), trust propensity Chi et al\. \([2021](https://arxiv.org/html/2604.20166#bib.bib16)\), or overall trust Wu et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib76)\); Woodcock et al\. \([2021](https://arxiv.org/html/2604.20166#bib.bib75)\); Liu and Tao \([2022](https://arxiv.org/html/2604.20166#bib.bib39)\); Kauttonen et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib30)\); Aoki \([2021](https://arxiv.org/html/2604.20166#bib.bib5)\); Youn and Jin \([2021](https://arxiv.org/html/2604.20166#bib.bib79)\)\. Some work adopts single\-item measures for direct trust assessment Woodcock et al\. \([2021](https://arxiv.org/html/2604.20166#bib.bib75)\); Aoki \([2021](https://arxiv.org/html/2604.20166#bib.bib5)\); Mayer et al\. \([2024](https://arxiv.org/html/2604.20166#bib.bib46)\)\. Qualitative approaches, including interviews and content analysis, are used to examine how users interpret and justify their trust judgments Luetke Lanfer et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib42)\)\. Beyond self\-report, behavioral indicators such as interaction patterns and decision choices are also used to infer trust Leichtmann et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib34)\); Brunswicker et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib11)\); Mayer et al\. \([2024](https://arxiv.org/html/2604.20166#bib.bib46)\); Aktan et al\. \([2022](https://arxiv.org/html/2604.20166#bib.bib2)\)\.
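As a rough illustration of how multi\-item scale responses are commonly turned into a trust score, the sketch below averages Likert items per respondent, with optional reverse\-keying, and computes Cronbach's alpha as a standard internal\-consistency check\. It does not reproduce any specific instrument from the studies above; the function names, the 1 to 5 response range, and the example data are assumptions made for illustration\.

```python
import statistics
from typing import Collection, List, Sequence


def scale_scores(responses: Sequence[Sequence[int]],
                 reverse_items: Collection[int] = (),
                 scale_min: int = 1,
                 scale_max: int = 5) -> List[float]:
    """Mean item score per respondent, flipping reverse-keyed items."""
    scores = []
    for row in responses:
        adjusted = [
            (scale_min + scale_max - value) if index in reverse_items else value
            for index, value in enumerate(row)
        ]
        scores.append(statistics.mean(adjusted))
    return scores


def cronbach_alpha(responses: Sequence[Sequence[int]]) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum(item variances) / total variance).

    Expects already reverse-scored data with at least two items and
    at least two respondents.
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    item_variances = [
        statistics.variance([row[i] for row in responses])
        for i in range(n_items)
    ]
    total_variance = statistics.variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total score has zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical answers from four respondents to a three-item scale.
    data = [[4, 5, 4], [2, 2, 3], [5, 4, 5], [3, 3, 2]]
    print(scale_scores(data))
    print(round(cronbach_alpha(data), 3))
```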

### 3\.2 Who Trusts

##### Scope\.

This dimension captures user\-side characteristics\. Prior work identifies attitudes toward AI, personality traits, familiarity, and perceived social support as important determinants of trust in mental health AI Kauttonen et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib30)\); Zhao et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib84)\); Huo et al\. \([2022](https://arxiv.org/html/2604.20166#bib.bib28)\)\. Literacy level also matters: users with limited AI knowledge often exhibit greater reliance on AI Woodcock et al\. \([2021](https://arxiv.org/html/2604.20166#bib.bib75)\)\. Studies consistently show that human\-in\-the\-loop oversight increases trust in AI Mayer et al\. \([2024](https://arxiv.org/html/2604.20166#bib.bib46)\); Aoki \([2021](https://arxiv.org/html/2604.20166#bib.bib5)\); Liu and Tao \([2022](https://arxiv.org/html/2604.20166#bib.bib39)\), reflecting users’ preference for shared control in high\-stakes contexts\.

##### Evaluation\.

User characteristics are frequently measured using self\-developed instruments tailored to demographic and socio\-cultural factors Aktan et al\. \([2022](https://arxiv.org/html/2604.20166#bib.bib2)\); Leichtmann et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib34)\); Zhao et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib84)\); Huo et al\. \([2022](https://arxiv.org/html/2604.20166#bib.bib28)\); Woodcock et al\. \([2021](https://arxiv.org/html/2604.20166#bib.bib75)\); Liu and Tao \([2022](https://arxiv.org/html/2604.20166#bib.bib39)\); Kauttonen et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib30)\), while validated scales are more commonly used for stable traits such as personality or trust propensity\.

### 3\.3 How Trust Is Perceived

##### Scope\.

This dimension focuses on the cues users rely on to form trust\. It concerns perceived agent characteristics that are visible to users\. While many system features may influence trust, psychotherapy\-oriented studies place particular emphasis on anthropomorphism and explainability\. Anthropomorphism captures the extent to which users perceive human\-likeness in AI, including emotionality, self\-awareness, and social presence Wang et al\. \([2025b](https://arxiv.org/html/2604.20166#bib.bib71)\); Cao et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib13)\); Liu and Tao \([2022](https://arxiv.org/html/2604.20166#bib.bib39)\)\. Explainability refers to the system’s ability to communicate the rationale behind its outputs and decisions in user\-understandable ways Leichtmann et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib34)\)\.

##### Evaluation\.

Anthropomorphism is commonly measured by validated scales Liu and Tao \([2022](https://arxiv.org/html/2604.20166#bib.bib39)\)\. Prior work distinguishes between superficial forms \(e\.g\., appearance or communication style\) and deeper anthropomorphism involving moral agency and emotional experience Wu et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib76)\)\. Studies often manipulate empathizing and systemizing behaviors before measuring perceived anthropomorphism Brunswicker et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib11)\)\. Explainability is typically studied by varying the clarity and form of system explanations and assessing user responses Leichtmann et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib34)\); Woodcock et al\. \([2021](https://arxiv.org/html/2604.20166#bib.bib75)\)\. Qualitative methods, such as interviews, are frequently used to examine how users interpret these cues Wang et al\. \([2025b](https://arxiv.org/html/2604.20166#bib.bib71)\); Cao et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib13)\); Namvarpour and Razi \([2024](https://arxiv.org/html/2604.20166#bib.bib49)\); Choi et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib18)\)\.

Human\-oriented trust reflects how users perceive trust\. However, these perceptions may diverge from interaction\- and system\-level trustworthiness\.

## 4Interaction\-Oriented Trustworthiness

Interaction\-oriented trustworthiness concerns whether systems support appropriate trust during use\. Unlike AI\-oriented trustworthiness, it emerges in real time and is evaluated via observed interactions\.

### 4\.1Competence and Reliability

##### Scope\.

Competence and reliability refer to whether MHS systems consistently provide accurate, contextually appropriate, and therapeutically meaningful responses during interaction\. Prior work operationalizes this across application types\. Story\-based interventions involve professional review of user\-generated content for accuracy Sien and McGrenere \([2025](https://arxiv.org/html/2604.20166#bib.bib62)\)\. Chatbots aim to recognize users’ emotions and deliver evidence\-based interventions Wester et al\. \([2024a](https://arxiv.org/html/2604.20166#bib.bib73)\), and LLM agents generate responses aligned with psychological principles while avoiding ineffective or harmful advice Song et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib63)\); Sun et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib65)\)\. Therapist\-training systems assess competence via structured evaluations of treatments Swinger et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib66)\)\.

##### Methods and Evaluation\.

The literature studies competence through multiple approaches\. Qualitative interviews and case analyses examine whether AI chatbots provide useful and therapeutically relevant responses, while also documenting failures such as overly generic or inaccurate advice Song et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib63)\)\. Expertise\-aligned generation constrains LLMs with expert\-authored therapeutic scripts, improving adherence to therapeutic principles Sun et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib65)\)\. In contrast, systems for therapist training rely on structured protocol\-based evaluations, such as time\-stamped behavioral analysis and checklist\-style scoring aligned with treatment manuals Swinger et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib66)\)\.

### 4\.2Conversational Safety and Controllability

##### Scope\.

Conversational safety and controllability address how systems balance harm prevention with user agency during interaction\. Safety mechanisms detect crisis signals, enforce boundaries, and defer to human judgment in high\-risk situations Song et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib63)\); Sun et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib65)\); Swinger et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib66)\), while controllability ensures users can influence system behavior, such as selecting modules or controlling conversation depth \(Sun et al\.,[2025](https://arxiv.org/html/2604.20166#bib.bib65); Wester et al\.,[2024a](https://arxiv.org/html/2604.20166#bib.bib73); Swinger et al\.,[2025](https://arxiv.org/html/2604.20166#bib.bib66)\)\.

##### Methods and Evaluation\.

To manage risk, studies commonly adopt modular or layered system designs\. High\-risk signals are routed to dedicated crisis modules or flagged for human intervention rather than handled through open\-ended dialogue Sun et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib65)\)\. Autonomy\-in\-the\-Middle architectures further restrict AI behavior to classification and alerting while preserving human authority over decisions Swinger et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib66)\)\. Evaluation typically relies on user interviews and expert assessments that examine how systems handle emotionally nuanced and high\-risk scenarios, often contrasting rigid filtering with more open\-ended generative responses Song et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib63)\)\.
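The layered routing idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the term list, the `Routing` type, and the `route_message` function are hypothetical and are not taken from the cited systems, which would rely on validated risk classifiers rather than keyword matching.

```python
from dataclasses import dataclass

# Hypothetical term list for illustration only; a deployed system would use
# a clinically reviewed risk classifier instead of substring matching.
CRISIS_TERMS = ("suicide", "kill myself", "end my life", "self-harm")


@dataclass
class Routing:
    module: str         # "crisis" or "dialogue"
    notify_human: bool  # whether a human reviewer is alerted


def route_message(text: str) -> Routing:
    """Decide where a user message goes before any open-ended generation."""
    lowered = text.lower()
    if any(term in lowered for term in CRISIS_TERMS):
        # High-risk signal: bypass open-ended dialogue and alert a human.
        return Routing(module="crisis", notify_human=True)
    return Routing(module="dialogue", notify_human=False)
```

The point of the design is ordering: the risk check runs before the generative component is invoked, so the decision to escalate does not depend on the generated text.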

### 4\.3Empathy and Engagement

##### Scope\.

Empathy and engagement sustain user interaction and emotional disclosure\. The literature identifies key tasks such as generating empathetic utterances, maintaining a non\-judgmental tone, and balancing therapeutic alliance with therapeutic structure \(Song et al\.,[2025](https://arxiv.org/html/2604.20166#bib.bib63); Li et al\.,[2024a](https://arxiv.org/html/2604.20166#bib.bib35); Sun et al\.,[2025](https://arxiv.org/html/2604.20166#bib.bib65)\)\.

##### Methods and Evaluation\.

Systems operationalize these qualities through script\- or protocol\-aligned response generation Sun et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib65)\), structured interaction designs \(e\.g\., multi\-choice interfaces or curated narratives\) \(Sien and McGrenere,[2025](https://arxiv.org/html/2604.20166#bib.bib62); Wester et al\.,[2024a](https://arxiv.org/html/2604.20166#bib.bib73)\), and expert ratings aligned with clinical competencies Swinger et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib66)\)\. Evaluation commonly uses user self\-report measures of engagement or emotional experience \(Sien and McGrenere,[2025](https://arxiv.org/html/2604.20166#bib.bib62); Wester et al\.,[2024a](https://arxiv.org/html/2604.20166#bib.bib73)\), comparisons against rule\-based or expert\-aligned baselines Sun et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib65)\), and expert coding of behaviors such as empathy and rapport in training contexts Swinger et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib66)\)\.

### 4\.4Transparency

##### Scope\.

Transparency concerns whether users can understand what the system is doing in interaction, including its limitations, decision boundaries, and the basis of its feedback\. The literature highlights tasks such as clarifying system limitations, exposing decision logic at appropriate levels, and preventing misconceptions about human\-like understanding or authority Song et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib63)\); Sun et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib65)\)\.

##### Methods and Evaluation\.

Transparency is often implemented through explanations, annotations, or labels that surface the rationale behind outputs Sun et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib65)\); Swinger et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib66)\)\. User\-facing artifacts such as journey maps or labeled strategies can improve interpretability in narrative\-based and scripted settings Sien and McGrenere \([2025](https://arxiv.org/html/2604.20166#bib.bib62)\); Sun et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib65)\)\. Evaluations consider whether transparency improves user comprehension, how it affects trust, and whether it introduces unintended interaction incentives \(e\.g\., users adapting responses to satisfy the system rather than therapeutic goals\) Wester et al\. \([2024a](https://arxiv.org/html/2604.20166#bib.bib73)\); Swinger et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib66)\)\.

## 5AI\-Oriented Trustworthiness

AI\-oriented trustworthiness concerns whether a mental health AI system operates reliably and safely at the model\- or system\-level\.

### 5\.1Reliability and Robustness

##### Scope\.

Reliability requires consistent model behavior across inputs, which can be read as a sign of adherence to clinical competence Nguyen et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib50)\)\. Robustness extends this to adversarial conditions where inputs are crafted to bypass systems \(Wang et al\.,[2023](https://arxiv.org/html/2604.20166#bib.bib69); Alghamdi et al\.,[2025](https://arxiv.org/html/2604.20166#bib.bib3)\)\. Core tasks for ensuring reliability and robustness include risk, distress, and crisis identification \(Yang et al\.,[2023](https://arxiv.org/html/2604.20166#bib.bib77); Kang et al\.,[2024](https://arxiv.org/html/2604.20166#bib.bib29); Cho et al\.,[2023](https://arxiv.org/html/2604.20166#bib.bib17)\)\.

##### Methods and Evaluation\.

Methods span expertise alignment, dynamic calibration with uncertainty, and adversarial training \(Kang et al\.,[2024](https://arxiv.org/html/2604.20166#bib.bib29); Reuben et al\.,[2025](https://arxiv.org/html/2604.20166#bib.bib59); Dhuliawala et al\.,[2023](https://arxiv.org/html/2604.20166#bib.bib19); Srivastava et al\.,[2024](https://arxiv.org/html/2604.20166#bib.bib64); Qiu et al\.,[2025](https://arxiv.org/html/2604.20166#bib.bib57)\)\. Evaluations include robustness audits, out\-of\-distribution stress tests, and calibration assessments \(Zhu et al\.,[2025](https://arxiv.org/html/2604.20166#bib.bib86)\)\. Stakeholders diverge: AI researchers prioritize consistency; practitioners prioritize avoiding risks; safety researchers prioritize resilience under manipulation \(Cho et al\.,[2023](https://arxiv.org/html/2604.20166#bib.bib17); Liu et al\.,[2023](https://arxiv.org/html/2604.20166#bib.bib41)\)\.
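One common way to run a calibration assessment is expected calibration error \(ECE\), which bins predictions by confidence and averages the gap between confidence and accuracy in each bin\. The sketch below is a generic illustration and is not claimed to be the procedure used in the cited work; the function name and the equal\-width binning are choices made here.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean gap between average confidence and accuracy per bin.

    confidences: predicted probabilities in [0, 1] for the chosen label.
    correct: booleans, True where the prediction was right.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / n) * abs(avg_conf - accuracy)
    return ece
```

A value near zero means stated confidence matches observed accuracy; for a risk detector, a large value indicates that its confidence scores should not be surfaced to clinicians or users without recalibration.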

### 5\.2Safety and Harm Prevention

##### Scope\.

Safety concerns the system’s capacity to avoid generating harmful or clinically inappropriate content, as a single unsafe response can cause lasting damage \(Cho et al\.,[2023](https://arxiv.org/html/2604.20166#bib.bib17); Badawi et al\.,[2025](https://arxiv.org/html/2604.20166#bib.bib7); Na et al\.,[2025](https://arxiv.org/html/2604.20166#bib.bib48)\)\. Operationalization involves empathy, psychology\-based attacks, risk detection, content filters, and escalation triggers \(Chen et al\.,[2023](https://arxiv.org/html/2604.20166#bib.bib15); Hua et al\.,[2024](https://arxiv.org/html/2604.20166#bib.bib26); Ozgun et al\.,[2025](https://arxiv.org/html/2604.20166#bib.bib52); Zhang et al\.,[2024](https://arxiv.org/html/2604.20166#bib.bib83)\)\.

##### Methods and Evaluation\.

Model\-level safety integrates alignment, guardrails, policy\-controlled refusals, and red\-teaming \(Chen et al\.,[2023](https://arxiv.org/html/2604.20166#bib.bib15); Yu et al\.,[2025](https://arxiv.org/html/2604.20166#bib.bib80); Hua et al\.,[2024](https://arxiv.org/html/2604.20166#bib.bib26); Liu et al\.,[2023](https://arxiv.org/html/2604.20166#bib.bib41); Qiu et al\.,[2025](https://arxiv.org/html/2604.20166#bib.bib57)\)\. Evaluation focuses on jailbreak resistance and on the handling of crisis and ambiguous\-intent inputs \(Alghamdi et al\.,[2025](https://arxiv.org/html/2604.20166#bib.bib3); Baidal et al\.,[2025](https://arxiv.org/html/2604.20166#bib.bib8)\)\. Stakeholder priorities diverge: safety researchers emphasize resistance, clinicians prioritize crisis recognition, and HCI researchers caution that overly restrictive refusals can undermine rapport \(Wang et al\.,[2025c](https://arxiv.org/html/2604.20166#bib.bib72); Wester et al\.,[2024b](https://arxiv.org/html/2604.20166#bib.bib74); Cho et al\.,[2023](https://arxiv.org/html/2604.20166#bib.bib17)\)\.

### 5\.3Explainability and Fairness

##### Scope\.

Explainability enables informed use of MHS systems by helping users understand how and why a system produces particular outputs, while fairness addresses whether the system behaves equitably across demographics and populations Gabriel et al\. \([2024](https://arxiv.org/html/2604.20166#bib.bib22)\); Lissak et al\. \([2024](https://arxiv.org/html/2604.20166#bib.bib38)\)\.

##### Methods and Evaluation\.

Explainability can be achieved through expert alignment, RAG, and tailored memory Bi et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib10)\); Zhang et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib82)\); Wang et al\. \([2025a](https://arxiv.org/html/2604.20166#bib.bib70)\); Gollapalli et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib25)\), and is commonly evaluated through system\-level analyses such as rationale inspection or alignment between explanations and responses Zhai et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib81)\)\. Fairness is assessed by comparing system behavior and outcomes across groups Gabriel et al\. \([2024](https://arxiv.org/html/2604.20166#bib.bib22)\); Qi et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib55)\), stress\-testing for distributional shift, and auditing biases in data, retrieval, or generation pipelines\.
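A group comparison of this kind can be sketched as a per\-group outcome rate and the largest gap between groups\. This is an illustrative simplification under stated assumptions: the record format, the binary outcome \(for example, whether a response was rated as adequately empathetic\), and the function names are hypothetical and not drawn from the cited studies\.

```python
from collections import defaultdict


def group_rates(records):
    """Rate of positive outcomes per group.

    records: iterable of (group, outcome) pairs with outcome in {0, 1}.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, outcome in records:
        totals[group] += 1
        positives[group] += outcome
    return {g: positives[g] / totals[g] for g in totals}


def max_rate_gap(records):
    """Largest difference in positive-outcome rate between any two groups."""
    rates = group_rates(records)
    return max(rates.values()) - min(rates.values())
```

A gap of zero means all groups receive the positive outcome at the same rate; what counts as an acceptable gap is a normative decision that the metric itself does not settle.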

### 5\.4Privacy and Data Protection

##### Scope\.

Privacy determines whether a system protects sensitive disclosures and prevents memorized information from surfacing\. Trust collapses if users suspect their narratives could be exposed \(Baidal et al\.,[2025](https://arxiv.org/html/2604.20166#bib.bib8); Kwesi et al\.,[2025](https://arxiv.org/html/2604.20166#bib.bib31)\)\. Many corpora derive from semi\-public forums or clinical notes with incomplete de\-identification \(Cho et al\.,[2023](https://arxiv.org/html/2604.20166#bib.bib17); Qiu and Lan,[2025](https://arxiv.org/html/2604.20166#bib.bib56); Cabrera Lozoya et al\.,[2025](https://arxiv.org/html/2604.20166#bib.bib12)\)\.

##### Methods and Evaluation\.

Privacy\-preserving techniques include differential privacy, restricted fine\-tuning, memorization auditing, and federated learning \(Kwesi et al\.,[2025](https://arxiv.org/html/2604.20166#bib.bib31); Shin et al\.,[2023](https://arxiv.org/html/2604.20166#bib.bib61)\)\. Evaluation relies on extraction attacks and membership inference tests \(Yu et al\.,[2025](https://arxiv.org/html/2604.20166#bib.bib80)\)\. Stakeholders diverge: regulators focus on compliance; security researchers map leakage vectors; clinicians expect clinical\-standard confidentiality; users equate privacy with safety \(Wang et al\.,[2025c](https://arxiv.org/html/2604.20166#bib.bib72)\)\.
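As an illustration of a membership inference test, the sketch below uses a simple loss\-threshold attack: an example is guessed to be a training member when the model's loss on it falls below a threshold\. This is one well\-known baseline and is not claimed to be the test used in the cited work; the function name and the inputs \(precomputed per\-example losses\) are assumptions made here\.

```python
def membership_advantage(member_losses, nonmember_losses, threshold):
    """Attack advantage (TPR - FPR) of a loss-threshold membership test.

    member_losses: per-example losses on known training examples.
    nonmember_losses: per-example losses on held-out examples.
    An example is predicted "member" when its loss is below the threshold.
    """
    tpr = sum(loss < threshold for loss in member_losses) / len(member_losses)
    fpr = sum(loss < threshold for loss in nonmember_losses) / len(nonmember_losses)
    return tpr - fpr
```

An advantage near zero suggests the attack cannot separate training data from held\-out data at that threshold; a large positive value signals memorization that could expose user disclosures\.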

## 6Discussion

### 6\.1Bridging Perceived Trust and Trustworthiness across Stakeholders

Our review shows that “trust” in mental health AI is used in two distinct senses: *perceived trust* \(users’ confidence and willingness to rely on a system\) versus *trustworthiness* \(whether the system is actually safe and reliable under technical, clinical, and regulatory criteria\)\. These often diverge because stakeholders study different parts of the problem\.

Psychotherapy and HCI work frames trust as a relational, context\-sensitive user response shaped by interaction and user factors Chi et al\. \([2021](https://arxiv.org/html/2604.20166#bib.bib16)\); Kauttonen et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib30)\); Gille et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib23)\); Rai et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib58)\); Thieme et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib67)\); Namvarpour and Razi \([2024](https://arxiv.org/html/2604.20166#bib.bib49)\); Wang et al\. \([2025b](https://arxiv.org/html/2604.20166#bib.bib71)\)\. AI/NLP work operationalizes trustworthiness via model\- and system\-level criteria \(e\.g\., robustness and evaluation validity\), and documents output variability and biased or unfaithful evaluation signals Lupart et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib43)\); Cao and et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib14)\); Liu et al\. \([2024](https://arxiv.org/html/2604.20166#bib.bib40)\); Wallat et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib68)\); Balog et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib9)\)\. Security and safety research highlights adversarial failures and privacy leakage, which are amplified by highly sensitive disclosures in mental health settings Yu et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib80)\); Hua et al\. \([2024](https://arxiv.org/html/2604.20166#bib.bib26)\); Baidal et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib8)\)\. Regulators and professional bodies emphasize risk mitigation, transparency, privacy, and human oversight FDA \([2021](https://arxiv.org/html/2604.20166#bib.bib21)\); NIST \([2023](https://arxiv.org/html/2604.20166#bib.bib51)\); AIHLEG \([2019](https://arxiv.org/html/2604.20166#bib.bib1)\); APA \([2025](https://arxiv.org/html/2604.20166#bib.bib6)\)\.

The result is a systematic gap: interaction cues can inflate perceived trust even when system performance is weak, while strong safeguards may not yield appropriate trust if interaction design is opaque or restrictive\. Trustworthy mental health AI therefore requires alignment across stakeholders and layers, and evaluations that explicitly link user trust to interaction\- and AI\-oriented trustworthiness rather than treating them as separate objectives\.

### 6\.2From Trust Maximization to Calibration

##### Position\.

Much prior work aims to increase user trust by improving interaction quality \(e\.g\., empathetic language, fluency, or human\-like behavior\) Song et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib63)\); Sun et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib65)\)\. Our review suggests this is risky in mental health settings because perceived trust can be driven by interaction cues that are weakly tied to actual safety and reliability\. For instance, anthropomorphic cues can increase trust Wu et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib76)\); Brunswicker et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib11)\) even when model behavior remains unstable Lupart et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib43)\)\. Evaluation can further inflate apparent trustworthiness: RAG attributions may look correct without being faithful to the evidence used Wallat et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib68)\), and LLM\-as\-a\-judge can introduce systematic bias Balog et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib9)\)\.

We therefore argue for shifting from *trust maximization* to *trust calibration*: systems should be developed and evaluated so that user trust tracks demonstrated interaction\-level performance and underlying AI\-level capability, rather than persuasive conversational cues Song et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib63)\)\.

##### Implication\.

A trust calibration perspective calls for coordinated development and evaluation across layers of trustworthy AI: trust should track demonstrated system capability, not isolated trust signals\.

At the *human\-oriented* layer, trust measurement should go beyond a single score to detect over\-reliance, miscalibration, and mismatched expectations, especially for vulnerable users sensitive to interaction cues\. At the *interaction\-oriented* layer, anthropomorphism and transparency should act as regulatory signals that communicate uncertainty, limitations, and boundaries rather than merely boosting comfort or engagement Song et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib63)\); Brunswicker et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib11)\)\. At the *AI\-oriented* layer, system\-level criteria \(e\.g\., safety, privacy, and evaluation validity\) must surface in interaction behavior and user\-facing disclosures; in high\-stakes settings, reliance on LLM\-based evaluators warrants caution Lupart et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib43)\); Balog et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib9)\)\.
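One possible way to move beyond a single trust score is to separate over\-reliance from under\-reliance at the level of individual interactions\. The sketch below is a hypothetical operationalization, not a measure proposed in this survey: it assumes each system output has been labeled as acceptable or not \(for example, by expert review\) and that it is known whether the user relied on it\.

```python
def reliance_rates(relied, correct):
    """Return (over_reliance, under_reliance) for a set of interactions.

    relied[i]: True if the user followed or accepted output i.
    correct[i]: True if output i was judged acceptable (assumed label).
    over_reliance: share of unacceptable outputs the user relied on.
    under_reliance: share of acceptable outputs the user did not rely on.
    """
    on_wrong = [r for r, c in zip(relied, correct) if not c]
    on_right = [r for r, c in zip(relied, correct) if c]
    over = sum(on_wrong) / len(on_wrong) if on_wrong else 0.0
    under = sum(not r for r in on_right) / len(on_right) if on_right else 0.0
    return over, under
```

Under a calibration view, both rates should be low: high over\-reliance with high self\-reported trust is the pattern that trust maximization can produce and that a single trust score would not reveal\.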

## 7Conclusion

This survey synthesizes literature on trustworthy AI for mental health through a three\-layer, stakeholder\-specific framework\. We show that trust failures often stem from misalignment across layers and stakeholder priorities rather than isolated system flaws\. We argue that advancing trustworthy mental health AI requires shifting from maximizing perceived trust to calibrating trust to demonstrated interaction\- and system\-level capabilities\. This framework offers insights for future research, evaluation, and regulation of AI for mental health support\.

## Limitations

This survey work has several limitations\.

First, our analysis is grounded in academic research literature\. Real\-world deployment practices, proprietary systems, and the lived experiences of clinicians may reveal additional trust concerns that are not fully captured in published studies\.

Second, despite broad coverage, the trustworthy AI literature is rapidly evolving\. Relevant work may have been missed or published outside our review window, particularly in fast\-moving areas\.

Third, our synthesis reflects how trust and trustworthiness are operationalized in existing research, which may vary substantially across disciplines\. As a result, some dimensions may appear more prominent due to research trends rather than their intrinsic importance in practice\.

Lastly, our three\-layer framework is conceptual rather than predictive: it organizes stakeholder perspectives but does not offer quantitative guarantees about system safety or user outcomes\.

## Ethical Consideration

This work does not introduce new human\-subject experiments or systems for mental health support\. However, by synthesizing existing work, it aims to support more ethically grounded development, design, evaluation, and governance of trustworthy AI systems for mental health support\.

Several ethical concerns and potential risks in this area merit consideration\. First, AI systems in mental health raise heightened ethical risks due to user vulnerability and the sensitivity of personal disclosures\. A primary concern is miscalibrated trust: systems may appear clinically competent or emotionally supportive beyond their actual capabilities, leading to over\-reliance, delayed help\-seeking, or inappropriate substitution for professional care\. Anthropomorphic and empathetic interaction cues can further amplify this risk if not carefully constrained\.

Second, privacy and data protection are also critical\. Mental health AI often processes highly sensitive personal information, creating risks of data leakage, unintended memorization, or misuse\. Safety assumptions validated in offline settings may not hold under real\-world deployment\.

Third, bias and uneven performance across populations present additional ethical concerns\. Models trained on limited or unrepresentative data may underperform for certain groups, potentially reinforcing existing mental health disparities\.

Lastly, evaluation and deployment practices also pose risks\. Over\-reliance on automatic metrics or LLM\-based evaluators can obscure failure modes and create a false sense of safety\. Transparent reporting, human oversight, and clear communication of system limitations are therefore essential\.

## References

- AIHLEG \(2019\)AIHLEG\. 2019\.[Ethics guidelines for trustworthy AI](https://digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trustworthy-ai)\.Technical report\.
- Aktan et al\. \(2022\)Mehmet Emin Aktan, Zeynep Turhan, and İlknur Dolu\. 2022\.[Attitudes and perspectives towards the preferences for artificial intelligence in psychotherapy](https://doi.org/10.1016/j.chb.2022.107273)\.*Computers in Human Behavior*, 133:107273\.
- Alghamdi et al\. \(2025\)Emad A\. Alghamdi, Reem Masoud, Deema Alnuhait, Afnan Y\. Alomairi, Ahmed Ashraf, and Mohamed Zaytoon\. 2025\.[AraTrust: An evaluation of trustworthiness for LLMs in Arabic](https://aclanthology.org/2025.coling-main.579/)\.In*Proceedings of the 31st International Conference on Computational Linguistics*, pages 8664–8679, Abu Dhabi, UAE\. Association for Computational Linguistics\.
- AMA \(2024\)AMA\. 2024\.[Augmented intelligence development, deployment, and use in health care](https://www.ama-assn.org/system/files/ama-ai-principles.pdf)\.Technical report\.
- Aoki \(2021\)Naomi Aoki\. 2021\.[The importance of the assurance that “humans are still in the decision loop” for public trust in artificial intelligence: Evidence from an online experiment](https://doi.org/10.1016/j.chb.2020.106572)\.*Computers in Human Behavior*, 114:106572\.
- APA \(2025\)APA\. 2025\.Ethical guidance for AI in the professional practice of health service psychology\.
- Badawi et al\. \(2025\)Abeer Badawi, Md Tahmid Rahman Laskar, Jimmy Huang, Shaina Raza, and Elham Dolatabadi\. 2025\.[Position: Beyond assistance – reimagining LLMs as ethical and adaptive co\-creators in mental health care](https://openreview.net/forum?id=j3totqf8xW)\.In*Forty\-second International Conference on Machine Learning Position Paper Track*\.
- Baidal et al\. \(2025\)Miguel Baidal, Erik Derner, and Nuria Oliver\. 2025\.[Guardians of trust: Risks and opportunities for LLMs in mental health](https://doi.org/10.18653/v1/2025.nlp4pi-1.2)\.In*Proceedings of the Fourth Workshop on NLP for Positive Impact \(NLP4PI\)*, pages 11–22, Vienna, Austria\. Association for Computational Linguistics\.
- Balog et al\. \(2025\)Krisztian Balog, Donald Metzler, and Tao Qin\. 2025\.Rankers, judges, and assistants: On the interaction between retrieval models and large language model\-based evaluation\.In*Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval*\. ACM\.
- Bi et al\. \(2025\)Guanqun Bi, Zhuang Chen, Zhoufu Liu, Hongkai Wang, Xiyao Xiao, Yuqiang Xie, Wen Zhang, Yongkang Huang, Yuxuan Chen, Libiao Peng, and Minlie Huang\. 2025\.[MAGI: Multi\-agent guided interview for psychiatric assessment](https://doi.org/10.18653/v1/2025.findings-acl.1278)\.In*Findings of the Association for Computational Linguistics: ACL 2025*, pages 24898–24921, Vienna, Austria\. Association for Computational Linguistics\.
- Brunswicker et al\. \(2025\)Sabine Brunswicker, Yifan Zhang, Christopher Rashidian, and Daniel W\. Linna\. 2025\.[Trust through words: The systemize\-empathize\-effect of language in task\-oriented conversational agents](https://doi.org/10.1016/j.chb.2024.108516)\.*Computers in Human Behavior*, 165:108516\.
- Cabrera Lozoya et al\. \(2025\)Daniel Cabrera Lozoya, Eloy Hernandez Lua, Juan Alberto Barajas Perches, Mike Conway, and Simon D’Alfonso\. 2025\.[Synthetic empathy: Generating and evaluating artificial psychotherapy dialogues to detect empathy in counseling sessions](https://doi.org/10.18653/v1/2025.clpsych-1.13)\.In*Proceedings of the 10th Workshop on Computational Linguistics and Clinical Psychology \(CLPsych 2025\)*, pages 157–171, Albuquerque, New Mexico\. Association for Computational Linguistics\.
- Cao et al\. \(2025\)Jiashuo Cao, Yun Suen Pai, Chen Li, Simon Hoermann, and Mark Billinghurst\. 2025\."Can I have my friend attending with me?": Design implications for using virtual supporters in remote psychotherapy\.In*Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems*, pages 1–7\.
- Cao and et al\. \(2025\)Yang Cao and et al\. 2025\.Writing style matters: An examination of bias and fairness in text embedding models\.In*Proceedings of the ACM International Conference on Web Search and Data Mining \(WSDM\)*\. ACM\.
- Chen et al\. \(2023\)Yirong Chen, Xiaofen Xing, Jingkai Lin, Huimin Zheng, Zhenyu Wang, Qi Liu, and Xiangmin Xu\. 2023\.[SoulChat: Improving LLMs’ empathy, listening, and comfort abilities through fine\-tuning with multi\-turn empathy conversations](https://doi.org/10.18653/v1/2023.findings-emnlp.83)\.In*Findings of the Association for Computational Linguistics: EMNLP 2023*, pages 1170–1183, Singapore\. Association for Computational Linguistics\.
- Chi et al\. \(2021\)Oscar Hengxuan Chi, Shizhen Jia, Yafang Li, and Dogan Gursoy\. 2021\.[Developing a formative scale to measure consumers’ trust toward interaction with artificially intelligent \(AI\) social robots in service delivery](https://doi.org/10.1016/j.chb.2021.106700)\.*Computers in Human Behavior*, 118:106700\.
- Cho et al\. \(2023\)Young Min Cho, Sunny Rai, Lyle Ungar, João Sedoc, and Sharath Guntuku\. 2023\.[An integrative survey on mental health conversational agents to bridge computer science and medical perspectives](https://doi.org/10.18653/v1/2023.emnlp-main.698)\.In*Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing*, pages 11346–11369, Singapore\. Association for Computational Linguistics\.
- Choi et al\. \(2025\)Ryuhaerang Choi, Taehan Kim, Subin Park, Jennifer G Kim, and Sung\-Ju Lee\. 2025\.Private yet social: How LLM chatbots support and challenge eating disorder recovery\.In*Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems*, pages 1–19\.
- Dhuliawala et al\. \(2023\)Shehzaad Dhuliawala, Vilém Zouhar, Mennatallah El\-Assady, and Mrinmaya Sachan\. 2023\.[A diachronic perspective on user trust in AI under uncertainty](https://doi.org/10.18653/v1/2023.emnlp-main.339)\.In*Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing*, pages 5567–5580, Singapore\. Association for Computational Linguistics\.
- EPEUCO \(2024\)EPEUCO\. 2024\.[Artificial intelligence act](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32024R1689)\.
- FDA \(2021\)FDA\. 2021\.[Artificial intelligence/machine learning \(AI/ML\)\-based software as a medical device \(SaMD\) action plan](https://www.fda.gov)\.
- Gabriel et al\. \(2024\)Saadia Gabriel, Isha Puri, Xuhai Xu, Matteo Malgaroli, and Marzyeh Ghassemi\. 2024\.[Can AI relate: Testing large language model response for mental health support](https://doi.org/10.18653/v1/2024.findings-emnlp.120)\.In*Findings of the Association for Computational Linguistics: EMNLP 2024*, pages 2206–2221, Miami, Florida, USA\. Association for Computational Linguistics\.
- Gille et al\. \(2025\)Felix Gille, Laura Maaß, Benjamin Ho, and Divya Srivastava\. 2025\.[From theory to practice: Viewpoint on economic indicators for trust in digital health](https://doi.org/10.2196/59111)\.*Journal of Medical Internet Research*, 27\.
- Goisauf et al\. \(2025\)Melanie Goisauf, Mónica Cano Abadía, Kaya Akyüz, Maciej Bobowicz, Alena Buyx, Ilaria Colussi, Marie\-Christine Fritzsche, Karim Lekadir, Pekka Marttinen, Michaela Th Mayrhofer, and Janos Meszaros\. 2025\.[Trust, trustworthiness, and the future of medical AI: Outcomes of an interdisciplinary expert workshop](https://doi.org/10.2196/71236)\.
- Gollapalli et al\. \(2023\)Sujatha Gollapalli, Beng Ang, and See\-Kiong Ng\. 2023\.[Identifying Early Maladaptive Schemas from mental health question texts](https://doi.org/10.18653/v1/2023.findings-emnlp.792)\.In*Findings of the Association for Computational Linguistics: EMNLP 2023*, pages 11832–11843, Singapore\. Association for Computational Linguistics\.
- Hua et al\. \(2024\)Wenyue Hua, Xianjun Yang, Mingyu Jin, Zelong Li, Wei Cheng, Ruixiang Tang, and Yongfeng Zhang\. 2024\.[TrustAgent: Towards safe and trustworthy LLM\-based agents](https://doi.org/10.18653/v1/2024.findings-emnlp.585)\.In*Findings of the Association for Computational Linguistics: EMNLP 2024*, pages 10000–10016, Miami, Florida, USA\. Association for Computational Linguistics\.
- Huang et al\. \(2024\)Yue Huang, Lichao Sun, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Hanchi Sun, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bertie Vidgen, Bhavya Kailkhura, Caiming Xiong, and 52 others\. 2024\.Position: TrustLLM: Trustworthiness in large language models\.In*Proceedings of the 41st International Conference on Machine Learning*, ICML’24\. JMLR\.org\.
- Huo et al\. \(2022\)Weiwei Huo, Guanghui Zheng, Jiaqi Yan, Le Sun, and Liuyi Han\. 2022\.[Interacting with medical artificial intelligence: Integrating self\-responsibility attribution, human–computer trust, and personality](https://doi.org/10.1016/j.chb.2022.107253)\.*Computers in Human Behavior*, 132:107253\.
- Kang et al\. \(2024\)Migyeong Kang, Goun Choi, Hyolim Jeon, Ji Hyun An, Daejin Choi, and Jinyoung Han\. 2024\.[CURE: Context\- and uncertainty\-aware mental disorder detection](https://doi.org/10.18653/v1/2024.emnlp-main.994)\.In*Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing*, pages 17924–17940, Miami, Florida, USA\. Association for Computational Linguistics\.
- Kauttonen et al\. \(2025\)Janne Kauttonen, Rebekah Rousi, and Ari Alamäki\. 2025\.[Trust and acceptance challenges in the adoption of AI applications in health care: Quantitative survey analysis](https://doi.org/10.2196/65567)\.*Journal of Medical Internet Research*, 27\.
- Kwesi et al\. \(2025\)Jabari Kwesi, Jiaxun Cao, Riya Manchanda, and Pardis Emami\-Naeini\. 2025\.Exploring user security and privacy attitudes and concerns toward the use of general\-purpose LLM chatbots for mental health\.In*Proceedings of the 34th USENIX Conference on Security Symposium*, SEC ’25, USA\. USENIX Association\.
- Lee et al\. \(2025\)Jamie Lee, Kyuha Jung, Erin Gregg Newman, Emilie Chow, and Yunan Chen\. 2025\.Understanding adolescents’ perceptions of benefits and risks in health AI technologies through design fiction\.In*Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems*, pages 1–20\.
- Lee and Moray \(1992\)John D\. Lee and Neville Moray\. 1992\.[Trust, control strategies and allocation of function in human–machine systems](https://doi.org/10.1080/00140139208967392)\.*Ergonomics*, 35\(10\):1243–1270\.
- Leichtmann et al\. \(2023\)Benedikt Leichtmann, Christina Humer, Andreas Hinterreiter, Marc Streit, and Martina Mara\. 2023\.[Effects of explainable artificial intelligence on trust and human behavior in a high\-risk decision task](https://doi.org/10.1016/j.chb.2022.107539)\.*Computers in Human Behavior*, 139:107539\.
- Li et al\. \(2024a\)Anqi Li, Yu Lu, Nirui Song, Shuai Zhang, Lizhi Ma, and Zhenzhong Lan\. 2024a\.[Understanding the therapeutic relationship between counselors and clients in online text\-based counseling using LLMs](https://doi.org/10.18653/v1/2024.findings-emnlp.69)\.In*Findings of the Association for Computational Linguistics: EMNLP 2024*, pages 1280–1303, Miami, Florida, USA\. Association for Computational Linguistics\.
- Li et al\. \(2024b\)Haitao Li, Qian Dong, Junjie Chen, Huixue Su, Yujia Zhou, Qingyao Ai, Ziyi Ye, and Yiqun Liu\. 2024b\.LLMs\-as\-judges: A comprehensive survey on LLM\-based evaluation methods\.*arXiv preprint arXiv:2412\.05579*\.
- Liao and Sundar \(2022\)Q\.Vera Liao and S\. Shyam Sundar\. 2022\.[Designing for responsible trust in ai systems: A communication perspective](https://doi.org/10.1145/3531146.3533182)\.In*Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency*, FAccT ’22, page 1257–1268, New York, NY, USA\. Association for Computing Machinery\.
- Lissak et al\. \(2024\)Shir Lissak, Nitay Calderon, Geva Shenkman, Yaakov Ophir, Eyal Fruchter, Anat Brunstein Klomek, and Roi Reichart\. 2024\.[The colorful future of LLMs: Evaluating and improving LLMs as emotional supporters for queer youth](https://doi.org/10.18653/v1/2024.naacl-long.113)\.In*Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies \(Volume 1: Long Papers\)*, pages 2040–2079, Mexico City, Mexico\. Association for Computational Linguistics\.
- Liu and Tao \(2022\)Kaifeng Liu and Da Tao\. 2022\.[The roles of trust, personalization, loss of privacy, and anthropomorphism in public acceptance of smart healthcare services](https://doi.org/10.1016/j.chb.2021.107026)\.*Computers in Human Behavior*, 127:107026\.
- Liu et al\. \(2024\)Yang Liu, W\. Bruce Croft, and Hamed Zamani\. 2024\.Robust information retrieval\.In*Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval \(Tutorial\)*\. ACM\.
- Liu et al\. \(2023\)Yang Liu, Yuanshun Yao, Jean\-Francois Ton, Xiaoying Zhang, Ruocheng Guo, Hao Cheng, Yegor Klochkov, Muhammad Faaiz Taufiq, and Hang Li\. 2023\.[Trustworthy LLMs: a survey and guideline for evaluating large language models’ alignment](https://openreview.net/forum?id=oss9uaPFfB)\.In*Socially Responsible Language Modelling Research*\.
- Luetke Lanfer et al\. \(2023\)Hanna Luetke Lanfer, Doreen Reifegerste, Annika Berg, Paula Memenga, Eva Baumann, Winja Weber, Julia Geulen, Anne Müller, Andrea Hahne, and Susanne Weg\-Remers\. 2023\.[Understanding trust determinants in a live chat service on familial cancer: Qualitative triangulation study with focus groups and interviews in Germany](https://doi.org/10.2196/44707)\.*Journal of Medical Internet Research*, 25\.
- Lupart et al\. \(2025\)Auguste Lupart, Alex Rigouts Terryn, Helena Gómez\-Adorno, and Xuejie Chen\. 2025\.Investigating variability in large language model\-based personalized conversational information retrieval\.In*Proceedings of the ACM SIGIR Asia\-Pacific Conference on Information Retrieval*\. ACM\.
- Ma et al\. \(2024\)Zilin Ma, Yiyang Mei, Yinru Long, Zhaoyuan Su, and Krzysztof Z Gajos\. 2024\.Evaluating the experience of LGBTQ\+ people using large language model based chatbots for mental health support\.In*Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems*, pages 1–15\.
- Manzey et al\. \(2012\)Dietrich Manzey, Julian Reichenbach, and Linda Onnasch\. 2012\.Human performance consequences of automated decision aids: The impact of degree of automation and system experience\.*Journal of Cognitive Engineering and Decision Making*, 6\(1\):57–87\.
- Mayer et al\. \(2024\)Carlotta J\. Mayer, Julia Mahal, Daniela Geisel, Eva J\. Geiger, Elias Staatz, Maximilian Zappel, Seraina P\. Lerch, Johannes C\. Ehrenthal, Steffen Walter, and Beate Ditzen\. 2024\.[User preferences and trust in hypothetical analog, digitalized and AI\-based medical consultation scenarios: An online discrete choice survey](https://doi.org/10.1016/j.chb.2024.108419)\.*Computers in Human Behavior*, 161:108419\.
- Mayer et al\. \(1995\)Roger C\. Mayer, James H\. Davis, and F\. David Schoorman\. 1995\.[An integrative model of organizational trust](http://www.jstor.org/stable/258792)\.*The Academy of Management Review*, 20\(3\):709–734\.
- Na et al\. \(2025\)Hongbin Na, Yining Hua, Zimu Wang, Tao Shen, Beibei Yu, Lilin Wang, Wei Wang, John Torous, and Ling Chen\. 2025\.[A survey of large language models in psychotherapy: Current landscape and future directions](https://doi.org/10.18653/v1/2025.findings-acl.385)\.In*Findings of the Association for Computational Linguistics: ACL 2025*, pages 7362–7376, Vienna, Austria\. Association for Computational Linguistics\.
- Namvarpour and Razi \(2024\)Mohammad Namvarpour and Afsaneh Razi\. 2024\.Uncovering contradictions in human\-AI interactions: Lessons learned from user reviews of Replika\.In*Companion Publication of the 2024 Conference on Computer\-Supported Cooperative Work and Social Computing*, pages 579–586\.
- Nguyen et al\. \(2025\)Viet Cuong Nguyen, Mohammad Taher, Dongwan Hong, Vinicius Konkolics Possobom, Vibha Thirunellayi Gopalakrishnan, Ekta Raj, Zihang Li, Heather J\. Soled, Michael L\. Birnbaum, Srijan Kumar, and Munmun De Choudhury\. 2025\.[Do large language models align with core mental health counseling competencies?](https://doi.org/10.18653/v1/2025.findings-naacl.418)In*Findings of the Association for Computational Linguistics: NAACL 2025*, pages 7488–7511, Albuquerque, New Mexico\. Association for Computational Linguistics\.
- NIST \(2023\)NIST\. 2023\.[NIST AI risk management framework \(AI RMF 1\.0\)](https://nvlpubs.nist.gov/nistpubs/ai/nist.ai.100-1.pdf)\.
- Ozgun et al\. \(2025\)Mithat Can Ozgun, Jiahuan Pei, Koen Hindriks, Lucia Donatelli, Qingzhi Liu, and Junxiao Wang\. 2025\.[Trustworthy AI psychotherapy: Multi\-agent LLM workflow for counseling and explainable mental disorder diagnosis](https://doi.org/10.1145/3746252.3761164)\.In*Proceedings of the 34th ACM International Conference on Information and Knowledge Management*, CIKM ’25, page 2263–2272, New York, NY, USA\. Association for Computing Machinery\.
- Page et al\. \(2021\)Matthew J Page, Joanne E McKenzie, Patrick M Bossuyt, Isabelle Boutron, Tammy C Hoffmann, Cynthia D Mulrow, Larissa Shamseer, Jennifer M Tetzlaff, Elie A Akl, Sue E Brennan, Roger Chou, Julie Glanville, Jeremy M Grimshaw, Asbjørn Hróbjartsson, Manoj M Lalu, Tianjing Li, Elizabeth W Loder, Evan Mayo\-Wilson, Steve McDonald, and 7 others\. 2021\.[The PRISMA 2020 statement: an updated guideline for reporting systematic reviews](https://doi.org/10.1136/bmj.n71)\.*BMJ*, 372\.
- Pillay \(2025\)Yegan Pillay\. 2025\.[Ethical decision\-making guidelines for mental health clinicians in the artificial intelligence \(AI\) era](https://doi.org/10.3390/HEALTHCARE13233057)\.*Healthcare*, 13:3057\.
- Qi et al\. \(2025\)Zhiyang Qi, Takumasa Kaneko, Keiko Takamizo, Mariko Ukiyo, and Michimasa Inaba\. 2025\.[KokoroChat: A Japanese psychological counseling dialogue dataset collected via role\-playing by trained counselors](https://doi.org/10.18653/v1/2025.acl-long.608)\.In*Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\)*, pages 12424–12443, Vienna, Austria\. Association for Computational Linguistics\.
- Qiu and Lan \(2025\)Huachuan Qiu and Zhenzhong Lan\. 2025\.[PsyDial: A large\-scale long\-term conversational dataset for mental health support](https://doi.org/10.18653/v1/2025.acl-long.1049)\.In*Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\)*, pages 21624–21655, Vienna, Austria\. Association for Computational Linguistics\.
- Qiu et al\. \(2025\)Jiahao Qiu, Yinghui He, Xinzhe Juan, Yimin Wang, Yuhan Liu, Zixin Yao, Yue Wu, Xun Jiang, Ling Yang, and Mengdi Wang\. 2025\.[EmoAgent: Assessing and safeguarding human\-AI interaction for mental health safety](https://doi.org/10.18653/v1/2025.emnlp-main.594)\.In*Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing*, pages 11741–11756, Suzhou, China\. Association for Computational Linguistics\.
- Rai et al\. \(2025\)Ansh Rai, Meghan E\. Hurley, John Herrington, Eric Alan Storch, Casey J\. Zampella, Julia Parish\-Morris, Anika Sonig, Gabriel Lázaro\-Muñoz, and Kristin Kostick\-Quenet\. 2025\.[Stakeholder criteria for trust in artificial intelligence–based computer perception tools in health care: Qualitative interview study](https://doi.org/10.2196/78757)\.*Journal of Medical Internet Research*, 27\.
- Reuben et al\. \(2025\)Maor Reuben, Ortal Slobodin, Idan\-Chaim Cohen, Aviad Elyashar, Orna Braun\-Lewensohn, Odeya Cohen, and Rami Puzis\. 2025\.[Assessment and manipulation of latent constructs in pre\-trained language models using psychometric scales](https://doi.org/10.18653/v1/2025.acl-long.121)\.In*Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\)*, pages 2433–2444, Vienna, Austria\. Association for Computational Linguistics\.
- Rodriguez Rodriguez et al\. \(2023\)Lucero Rodriguez Rodriguez, Carlos E\. Bustamante Orellana, Erin K\. Chiou, Lixiao Huang, Nancy Cooke, and Yun Kang\. 2023\.[A review of mathematical models of human trust in automation](https://doi.org/10.3389/fnrgo.2023.1171403)\.*Frontiers in Neuroergonomics*, 4\.
- Shin et al\. \(2023\)Jaemin Shin, Hyungjun Yoon, Seungjoo Lee, Sungjoon Park, Yunxin Liu, Jinho Choi, and Sung\-Ju Lee\. 2023\.[FedTherapist: Mental health monitoring with user\-generated linguistic expressions on smartphones via federated learning](https://doi.org/10.18653/v1/2023.emnlp-main.734)\.In*Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing*, pages 11971–11988, Singapore\. Association for Computational Linguistics\.
- Sien and McGrenere \(2025\)Sang\-Wha Sien and Joanna McGrenere\. 2025\.[A gentle introduction to mental health through storytelling: Design and evaluation of digital human library](https://doi.org/10.1145/3710903)\.*Proc\. ACM Hum\.\-Comput\. Interact\.*, 9\(2\)\.
- Song et al\. \(2025\)Inhwa Song, Sachin R Pendse, Neha Kumar, and Munmun De Choudhury\. 2025\.[The typing cure: Experiences with large language model chatbots for mental health support](https://doi.org/10.1145/3757430)\.*Proc\. ACM Hum\.\-Comput\. Interact\.*, 9\(7\)\.
- Srivastava et al\. \(2024\)Aseem Srivastava, Smriti Joshi, Tanmoy Chakraborty, and Md Shad Akhtar\. 2024\.[Knowledge planning in large language models for domain\-aligned counseling summarization](https://doi.org/10.18653/v1/2024.emnlp-main.984)\.In*Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing*, pages 17775–17789, Miami, Florida, USA\. Association for Computational Linguistics\.
- Sun et al\. \(2025\)Xin Sun, Jan de Wit, Zhuying Li, Jiahuan Pei, Abdallah El Ali, and Jos A\. Bosch\. 2025\.[Script\-strategy aligned generation: Aligning LLMs with expert\-crafted dialogue scripts and therapeutic strategies for psychotherapy](https://doi.org/10.1145/3757655)\.*Proc\. ACM Hum\.\-Comput\. Interact\.*, 9\(7\)\.
- Swinger et al\. \(2025\)Nathaniel Swinger, Cynthia M\. Baseman, Myeonghan Ryu, Saeed Abdullah, Christopher W\. Wiese, Andrew M\. Sherrill, and Rosa I\. Arriaga\. 2025\.[There’s no "I" in TEAMMAIT: Impacts of domain and expertise on trust in AI teammates for mental health work](https://doi.org/10.1145/3710917)\.*Proc\. ACM Hum\.\-Comput\. Interact\.*, 9\(2\)\.
- Thieme et al\. \(2023\)Anja Thieme, Maryann Hanratty, Maria Lyons, Jorge Palacios, Rita Faia Marques, Cecily Morrison, and Gavin Doherty\. 2023\.Designing human\-centered AI for mental health: Developing clinically relevant applications for online CBT treatment\.*ACM Transactions on Computer\-Human Interaction*, 30\(2\):1–50\.
- Wallat et al\. \(2025\)Jonas Wallat, Maria Heuss, Maarten de Rijke, and Avishek Anand\. 2025\.Correctness is not faithfulness in retrieval\-augmented generation attributions\.In*Proceedings of the ACM SIGIR International Conference on the Theory of Information Retrieval \(ICTIR\)*\. ACM\.
- Wang et al\. \(2023\)Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, Sang T\. Truong, Simran Arora, Mantas Mazeika, Dan Hendrycks, Zinan Lin, Yu Cheng, Sanmi Koyejo, Dawn Song, and Bo Li\. 2023\.DecodingTrust: A comprehensive assessment of trustworthiness in GPT models\.In*Proceedings of the 37th International Conference on Neural Information Processing Systems*, NIPS ’23, Red Hook, NY, USA\. Curran Associates Inc\.
- Wang et al\. \(2025a\)Ming Wang, Peidong Wang, Lin Wu, Xiaocui Yang, Daling Wang, Shi Feng, Yuxin Chen, Bixuan Wang, and Yifei Zhang\. 2025a\.[AnnaAgent: Dynamic evolution agent system with multi\-session memory for realistic seeker simulation](https://doi.org/10.18653/v1/2025.findings-acl.1192)\.In*Findings of the Association for Computational Linguistics: ACL 2025*, pages 23221–23235, Vienna, Austria\. Association for Computational Linguistics\.
- Wang et al\. \(2025b\)Yimeng Wang, Yinzhou Wang, Kelly Crace, and Yixuan Zhang\. 2025b\.Understanding attitudes and trust of generative AI chatbots for social anxiety support\.In*Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems*, pages 1–21\.
- Wang et al\. \(2025c\)Yimeng Wang, Yinzhou Wang, Kelly Crace, and Yixuan Zhang\. 2025c\.[Understanding attitudes and trust of generative AI chatbots for social anxiety support](https://doi.org/10.1145/3706598.3714286)\.In*Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems*, CHI ’25, New York, NY, USA\. Association for Computing Machinery\.
- Wester et al\. \(2024a\)Joel Wester, Henning Pohl, Simo Hosio, and Niels van Berkel\. 2024a\.["this chatbot would never…": Perceived moral agency of mental health chatbots](https://doi.org/10.1145/3637410)\.*Proc\. ACM Hum\.\-Comput\. Interact\.*, 8\(CSCW1\)\.
- Wester et al\. \(2024b\)Joel Wester, Henning Pohl, Simo Hosio, and Niels van Berkel\. 2024b\.["this chatbot would never…": Perceived moral agency of mental health chatbots](https://doi.org/10.1145/3637410)\.*Proc\. ACM Hum\.\-Comput\. Interact\.*, 8\(CSCW1\)\.
- Woodcock et al\. \(2021\)Claire Woodcock, Brent Mittelstadt, Dan Busbridge, and Grant Blank\. 2021\.[The impact of explanations on layperson trust in artificial intelligence–driven symptom checker apps: Experimental study](https://doi.org/10.2196/29386)\.*Journal of Medical Internet Research*, 23\(11\)\.
- Wu et al\. \(2023\)Min Wu, Nanxi Wang, and Kum Fai Yuen\. 2023\.[Deep versus superficial anthropomorphism: Exploring their effects on human trust in shared autonomous vehicles](https://doi.org/10.1016/j.chb.2022.107614)\.*Computers in Human Behavior*, 141:107614\.
- Yang et al\. \(2023\)Kailai Yang, Shaoxiong Ji, Tianlin Zhang, Qianqian Xie, Ziyan Kuang, and Sophia Ananiadou\. 2023\.[Towards interpretable mental health analysis with large language models](https://doi.org/10.18653/v1/2023.emnlp-main.370)\.In*Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing*, pages 6056–6077, Singapore\. Association for Computational Linguistics\.
- You et al\. \(2025\)Dana You, Yuwon Kim, and Kyoungwon Seo\. 2025\.Tailored virtual agent guidance for stress management: Comparing directive and non\-directive approaches by alexithymia status\.In*Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems*, pages 1–8\.
- Youn and Jin \(2021\)Seounmi Youn and S\. Venus Jin\. 2021\.[“In A\.I\. we trust?” The effects of parasocial interaction and technopian versus luddite ideological views on chatbot\-based customer relationship management in the emerging “feeling economy”](https://doi.org/10.1016/j.chb.2021.106721)\.*Computers in Human Behavior*, 119\.
- Yu et al\. \(2025\)Miao Yu, Fanci Meng, Xinyun Zhou, Shilong Wang, Junyuan Mao, Linsey Pan, Tianlong Chen, Kun Wang, Xinfeng Li, Yongfeng Zhang, Bo An, and Qingsong Wen\. 2025\.[A survey on trustworthy LLM agents: Threats and countermeasures](https://doi.org/10.1145/3711896.3736561)\.In*Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V\.2*, KDD ’25, page 6216–6226, New York, NY, USA\. Association for Computing Machinery\.
- Zhai et al\. \(2025\)Wei Zhai, Nan Bai, Qing Zhao, Jianqiang Li, Fan Wang, Hongzhi Qi, Meng Jiang, Xiaoqin Wang, Bing Xiang Yang, and Guanghui Fu\. 2025\.[MentalGLM series: Explainable large language models for mental health analysis on Chinese social media](https://doi.org/10.18653/v1/2025.emnlp-main.686)\.In*Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing*, pages 13588–13603, Suzhou, China\. Association for Computational Linguistics\.
- Zhang et al\. \(2025\)Linhai Zhang, Ziyang Gao, Deyu Zhou, and Yulan He\. 2025\.[Explainable depression detection in clinical interviews with personalized retrieval\-augmented generation](https://doi.org/10.18653/v1/2025.findings-acl.517)\.In*Findings of the Association for Computational Linguistics: ACL 2025*, pages 9927–9944, Vienna, Austria\. Association for Computational Linguistics\.
- Zhang et al\. \(2024\)Zaibin Zhang, Yongting Zhang, Lijun Li, Hongzhi Gao, Lijun Wang, Huchuan Lu, Feng Zhao, Yu Qiao, and Jing Shao\. 2024\.[PsySafe: A comprehensive framework for psychological\-based attack, defense, and evaluation of multi\-agent system safety](https://doi.org/10.18653/v1/2024.acl-long.812)\.In*Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\)*, pages 15202–15231, Bangkok, Thailand\. Association for Computational Linguistics\.
- Zhao et al\. \(2025\)Jiukai Zhao, Yuqi Yang, Juanxia Miao, Xue Wang, Dianjun Qi, and Shuang Zang\. 2025\.[Factors associated with the level of trust in health information robots among the general population from a socioecological model perspective: Network analysis](https://doi.org/10.2196/68299)\.*Journal of Medical Internet Research*, 27\.
- Zheng et al\. \(2025\)Xi Zheng, Zhuoyang Li, Xinning Gui, and Yuhan Luo\. 2025\.Customizing emotional support: How do individuals construct and interact with LLM\-powered chatbots\.In*Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems*, pages 1–20\.
- Zhu et al\. \(2025\)Zhihong Zhu, Yunyan Zhang, Xianwei Zhuang, Fan Zhang, Zhongwei Wan, Yuyan Chen, Qingqing Long, Yefeng Zheng, and Xian Wu\. 2025\.[Can we trust AI doctors? a survey of medical hallucination in large language and large vision\-language models](https://doi.org/10.18653/v1/2025.findings-acl.350)\.In*Findings of the Association for Computational Linguistics: ACL 2025*, pages 6748–6769, Vienna, Austria\. Association for Computational Linguistics\.

## Appendix

## Appendix A AI Usage Disclosure

We used AI tools in a supportive and limited role\. Specifically, GPT\-5\.2 was used for language editing \(e\.g\., improving clarity and conciseness\)\. All findings and figures are based on our own data and results\. The survey scoping, literature review, data analysis, and final content were conducted and verified by the authors\.

## Appendix B Methodology of Literature Review

| Three\-Layer Trust | Records \(Initially Identified\) | Records Excluded \(Exclusion Criteria\) | Records Excluded \(Inclusion Criteria\) | Records Included \(Final for Analysis\) |
| --- | --- | --- | --- | --- |
| Human\-oriented Trust | Total N = 252: CHB = 80, JMIR = 172 | N = 174 | N = 62 | N = 16 |
| AI\-oriented Trustworthiness | Total N = 106: ACL = 45, EMNLP = 40, NAACL = 8, COLING = 10, IJCAI = 2, USENIX = 1 | N = 40 | N = 25 | N = 41 |
| Interaction\-oriented Trustworthiness | Total N = 377: ACM CHI = 234, ACM CSCW = 97, ACM TOCHI = 9, IJHCS = 37 | N = 303 | N = 58 | N = 16 |

Table 1: Summary of the literature review process following the PRISMA protocol Page et al\. \([2021](https://arxiv.org/html/2604.20166#bib.bib53)\)\.

| Document | Agency | Year | Key Focus of Trustworthiness |
| --- | --- | --- | --- |
| AI/ML\-Based Software as a Medical Device \(SaMD\) Action Plan FDA \([2021](https://arxiv.org/html/2604.20166#bib.bib21)\) | US Food and Drug Administration | 2021 | Acknowledges the need for safe and effective AI/ML SaMD\. Focuses on ensuring system performance through Good Machine Learning Practice \(GMLP\), promoting Patient\-Centered Approaches that incorporate Transparency to users, and developing regulatory science methods to address Algorithmic Bias and ensure Robustness\. |
| NIST AI Risk Management Framework \(AI RMF 1\.0\) NIST \([2023](https://arxiv.org/html/2604.20166#bib.bib51)\) | National Institute of Standards and Technology | 2023 | Focuses on seven characteristics of trustworthy AI to manage risks and promote responsible use: Valid and Reliable, Safe, Secure and Resilient, Accountable and Transparent, Explainable and Interpretable, Privacy\-Enhanced, and Fair with Harmful Bias Managed\. Accountability and Transparency are emphasized as relating to all other characteristics and internal processes\. |
| Artificial Intelligence Act \(AI Act\) EPEUCO \([2024](https://arxiv.org/html/2604.20166#bib.bib20)\) | European Parliament and Council | 2024 | Promotes human\-centric and trustworthy AI while ensuring a high level of protection of health, safety, and fundamental rights\. Mandatory requirements for high\-risk systems include Technical Robustness and Safety, Accuracy and Cybersecurity, high standards for Data Governance, and strong measures for Transparency and Accountability\. It is informed by the seven AI HLEG ethical requirements\. |
| Augmented Intelligence Development, Deployment, and Use in Health Care AMA \([2024](https://arxiv.org/html/2604.20166#bib.bib4)\) | American Medical Association | 2024 | Mandates that health care AI be designed, developed, and deployed in a manner that is ethical, equitable, responsible, accurate, transparent, and evidence\-based\. Key concerns are Transparency/Disclosure of AI use at the point of care, Bias Mitigation/Equity, alignment of Liability and Accountability with the entity best positioned to mitigate harm, and robust Data Privacy and Cybersecurity\. |
| Ethical Guidance for AI in the Professional Practice of Health Service Psychology APA \([2025](https://arxiv.org/html/2604.20166#bib.bib6)\) | American Psychological Association Ethics Committee | 2025 | Aligns AI use with fundamental psychological ethical principles \(Beneficence and Nonmaleficence, Integrity, Justice, and Respect for People’s Rights and Dignity\)\. Key areas include Transparency and Informed Consent \(including the right to opt out\), Mitigating Bias and Promoting Equity, Data Privacy and Security \(e\.g\., HIPAA compliance\), Accuracy and Misinformation Risks, and ensuring Human Oversight and Professional Judgment\. |
| Ethical Decision\-Making Guidelines for Mental Health Clinicians in the Artificial Intelligence \(AI\) Era Pillay \([2025](https://arxiv.org/html/2604.20166#bib.bib54)\) | Journal of Healthcare | 2025 | Proposes a framework based on five pillars derived from professional ethical codes and AI guidelines: Autonomy and Informed Consent; Beneficence and Non\-Malfeasance; Confidentiality, Privacy, and Transparency; Justice, Fairness, and Inclusiveness; and Fidelity, Professional Integrity, and Accountability\. Emphasizes that AI must augment, not replace, human clinical care\. |

Table 2: A summary of representative regulations and ethical standards in 2019–2025\.

To ensure transparent and systematic coverage of research on trustworthy AI for mental health support \(MHS\) systems, we conducted a structured literature review guided by the PRISMA principles Page et al\. \([2021](https://arxiv.org/html/2604.20166#bib.bib53)\)\. Our aim was to provide methodological rigor and reproducibility while adapting the process to the norms of interdisciplinary research, rather than adhering to the full formalism of clinical systematic reviews\.

##### Search Strategy\.

We surveyed publications from leading venues across the disciplines identified in our discipline analysis \([footnote 2](https://arxiv.org/html/2604.20166#footnote2)\), including NLP \(ACL, EMNLP, NAACL, COLING\), HCI \(i\.e\., [ACM CHI](https://dl.acm.org/doi/proceedings/10.1145/3613904), [ACM CSCW](https://dl.acm.org/doi/proceedings/10.1145/3584931), [TOCHI](https://dl.acm.org/journal/tochi), [IJHCS](https://www.sciencedirect.com/journal/international-journal-of-human-computer-studies)\), and digital health \(i\.e\., [JMIR](https://www.jmir.org/), [CHB](https://www.sciencedirect.com/journal/computers-in-human-behavior)\)\. We searched for papers published between January 2021 and November 2025 using keyword combinations related to “AI, large language models, dialogue systems, mental health, and safety”\. Searches were conducted over titles, abstracts, and main text, excluding reference sections\.
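The search window and keyword matching described above can be sketched as a simple record filter\. This is a minimal illustration under stated assumptions, not the tooling used for the review: the `Record` type, the `matches_search` helper, the split of the listed topics into two term groups, and the rule that a record must hit both groups are all hypothetical, since the text names the topics but not the exact query string\.

```python
import re
from dataclasses import dataclass
from datetime import date

# Assumed term groups built from the topics named in the text.
# Word boundaries keep "ai" from matching inside words such as "maintain".
AI_PATTERN = re.compile(
    r"\b(?:ai|large language models?|dialogue systems?)\b", re.IGNORECASE
)
DOMAIN_PATTERN = re.compile(r"\b(?:mental health|safety)\b", re.IGNORECASE)

# Search window stated in the text: January 2021 to November 2025.
WINDOW_START = date(2021, 1, 1)
WINDOW_END = date(2025, 11, 30)


@dataclass
class Record:
    """Hypothetical bibliographic record; main_text excludes the reference section."""
    title: str
    abstract: str
    main_text: str
    published: date


def matches_search(record: Record) -> bool:
    """Return True if the record is inside the window and hits both term groups."""
    if not (WINDOW_START <= record.published <= WINDOW_END):
        return False
    searchable = " ".join((record.title, record.abstract, record.main_text))
    return bool(AI_PATTERN.search(searchable)) and bool(
        DOMAIN_PATTERN.search(searchable)
    )
```

A record that passes this filter would still go through the inclusion and exclusion criteria; the filter only mirrors the initial identification step\.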

##### Inclusion and Exclusion Criteria\.

We included peer\-reviewed papers that study AI systems for mental health and explicitly address trust\-related criteria, such as trustworthiness, safety, interaction quality, or related risks \(e\.g\., empathy, reliability, robustness, privacy, crisis handling\)\. We excluded works that \(i\) focus solely on non\-health domains, \(ii\) address mental health without involving AI systems, \(iii\) present purely theoretical discussions without system design, empirical evaluation, or methodological contribution, or \(iv\) are position pieces without substantive technical or empirical grounding\. Non–peer\-reviewed articles, tutorials, and extended abstracts were also excluded, as were articles classified as surveys, scoping reviews, or literature reviews\.
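The criteria above can be read as an ordered set of screening rules\. The sketch below encodes them as a single function that returns the first applicable exclusion reason\. It is illustrative only: the `Paper` fields, the type labels, and the order in which the rules are checked are assumptions, and in the review itself these judgments were made by the authors reading each paper\.

```python
from dataclasses import dataclass
from typing import Optional

# Publication types excluded outright (labels are hypothetical).
NON_ELIGIBLE_TYPES = {
    "tutorial",
    "extended_abstract",
    "survey",
    "scoping_review",
    "literature_review",
}


@dataclass
class Paper:
    """Hypothetical screening record filled in by a human reviewer."""
    peer_reviewed: bool
    paper_type: str          # e.g. "full", "position", "tutorial"
    health_domain: bool      # studies a (mental) health setting
    involves_ai: bool        # studies an AI system
    addresses_trust: bool    # explicitly covers a trust-related criterion
    has_contribution: bool   # system design, empirical evaluation, or method


def exclusion_reason(paper: Paper) -> Optional[str]:
    """Return the first matching exclusion reason, or None if the paper is included."""
    if not paper.peer_reviewed:
        return "not peer-reviewed"
    if paper.paper_type in NON_ELIGIBLE_TYPES:
        return "excluded publication type"
    if not paper.health_domain:
        return "(i) non-health domain only"
    if not paper.involves_ai:
        return "(ii) no AI system involved"
    if not paper.has_contribution:
        # Covers both (iii) purely theoretical work and (iv) ungrounded position pieces.
        return "(iii)/(iv) no system, empirical, or methodological grounding"
    if not paper.addresses_trust:
        return "no explicit trust-related criterion"
    return None
```

Returning a reason rather than a boolean makes it possible to report how many records each rule removed\.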

##### Screening and Selection\.

The screening process followed a multi\-stage procedure as shown in Table [1](https://arxiv.org/html/2604.20166#A2.T1)\. First, titles and abstracts were reviewed to remove clearly irrelevant papers\. Second, full\-text screening was conducted to assess relevance against the inclusion criteria, with particular attention to whether the work engaged with trust\-related concepts at the human, interaction, or AI level\. Ambiguous cases were discussed among the authors to reach consensus\. The final set of papers \(N=61\) was then categorized according to stakeholder perspective and mapped onto the three\-layer trust framework used throughout this survey\.
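The per\-layer record counts reported in Table 1 can be checked with simple arithmetic: the venue counts should sum to the records identified, and the records identified minus the two exclusion columns should equal the records included\. The sketch below transcribes the Table 1 numbers; the dictionary layout and the `check_layer` helper are illustrative names, not part of the paper\.

```python
# Counts transcribed from Table 1, one entry per trust layer.
FLOW = {
    "Human-oriented": {
        "venues": {"CHB": 80, "JMIR": 172},
        "identified": 252,
        "excluded_exclusion_criteria": 174,
        "excluded_inclusion_criteria": 62,
        "included": 16,
    },
    "AI-oriented": {
        "venues": {"ACL": 45, "EMNLP": 40, "NAACL": 8, "COLING": 10,
                   "IJCAI": 2, "USENIX": 1},
        "identified": 106,
        "excluded_exclusion_criteria": 40,
        "excluded_inclusion_criteria": 25,
        "included": 41,
    },
    "Interaction-oriented": {
        "venues": {"ACM CHI": 234, "ACM CSCW": 97, "ACM TOCHI": 9, "IJHCS": 37},
        "identified": 377,
        "excluded_exclusion_criteria": 303,
        "excluded_inclusion_criteria": 58,
        "included": 16,
    },
}


def check_layer(row: dict) -> bool:
    """Check that venue counts and the screening flow are internally consistent."""
    venues_ok = sum(row["venues"].values()) == row["identified"]
    remaining = (
        row["identified"]
        - row["excluded_exclusion_criteria"]
        - row["excluded_inclusion_criteria"]
    )
    return venues_ok and remaining == row["included"]


# Maps each layer name to True when both checks pass.
consistency = {layer: check_layer(row) for layer, row in FLOW.items()}
```

For example, the human\-oriented layer gives 252 − 174 − 62 = 16, matching the final column of Table 1\.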

| Dimension | Criteria | Evaluations | Literature |
| --- | --- | --- | --- |
| What | Ability | Likert scale | Chi et al\. \([2021](https://arxiv.org/html/2604.20166#bib.bib16)\) |
| | | Qualitative \(interview, focus group\) | Luetke Lanfer et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib42)\) |
| | | Behavioral monitoring | Brunswicker et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib11)\) |
| | | – | Gille et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib23)\) |
| | Benevolence | Qualitative \(interview, focus group\) | Luetke Lanfer et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib42)\); Rai et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib58)\) |
| | Integrity | Qualitative \(interview, focus group\) | Luetke Lanfer et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib42)\) |
| | | – | Gille et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib23)\) |
| | | Interview | Rai et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib58)\) |
| | | Behavioral monitoring; Likert scale | Leichtmann et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib34)\) |
| | Individual Perception | Qualitative \(interview, focus group\) | Luetke Lanfer et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib42)\) |
| | | Likert scale | Chi et al\. \([2021](https://arxiv.org/html/2604.20166#bib.bib16)\) |
| | | Behavioral monitoring | Brunswicker et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib11)\) |
| Who | Internal Characteristics | Likert scale | Kauttonen et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib30)\); Zhao et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib84)\); Huo et al\. \([2022](https://arxiv.org/html/2604.20166#bib.bib28)\) |
| | Literacy level | Likert scale | Woodcock et al\. \([2021](https://arxiv.org/html/2604.20166#bib.bib75)\) |
| | Operational Control | Likert scale | Liu and Tao \([2022](https://arxiv.org/html/2604.20166#bib.bib39)\); Aoki \([2021](https://arxiv.org/html/2604.20166#bib.bib5)\) |
| | | Behavioral monitoring | Mayer et al\. \([2024](https://arxiv.org/html/2604.20166#bib.bib46)\) |
| How | Anthropomorphism | Likert scale | Chi et al\. \([2021](https://arxiv.org/html/2604.20166#bib.bib16)\); Wu et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib76)\); Liu and Tao \([2022](https://arxiv.org/html/2604.20166#bib.bib39)\) |
| | | Behavioral monitoring | Brunswicker et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib11)\) |
| | Explainability | Likert scale | Leichtmann et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib34)\) |

Table 3: Summary of criteria in the “Human\-oriented trust” layer with evaluations & trust measures from our literature review\.

## Appendix C Summary of regulations and ethical standards

[Table 2](https://arxiv.org/html/2604.20166#A2.T2) summarizes key regulatory frameworks and ethical guidelines governing trustworthy AI in \(mental\) health, highlighting how agencies and professional bodies operationalize trustworthiness through requirements on safety, transparency, accountability, privacy, and human oversight\.

## Appendix D Literature for “Human\-oriented” Trust

Human\-oriented trust captures how users perceive, form, and adjust trust toward AI systems when receiving health services\. Prior work operationalizes this layer along three complementary dimensions: what trust is \(how it is conceptualized and measured\), who trusts \(user characteristics that shape trust judgments\), and how trust is formed through perceived system cues\. The literature primarily relies on subjective measures such as Likert\-scale questionnaires, supplemented by interviews, stated\-preference tasks, and behavioral monitoring\. [Table 3](https://arxiv.org/html/2604.20166#A2.T3) summarizes the frequent evaluation approaches and representative studies in this layer\.

| Criteria | Methods | Evaluations | Literature |
| --- | --- | --- | --- |
| **Interaction\-oriented Trustworthiness** | | | |
| Competence | User interviews; case analysis | Response usefulness | Song et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib63)\) |
| | Expert\-scripts | Script adherence | Sun et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib65)\) |
| | Protocol\-based scoring | Checklist assessment | Swinger et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib66)\) |
| Communication Style | Modular design | Expert judgment | Swinger et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib66)\); Song et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib63)\) |
| | Autonomy\-in\-the\-Middle | | |
| Transparency | Structured explanations | User comprehension | Sun et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib65)\) |
| | Visual / strategy cues | Interpretability | Sien and McGrenere \([2025](https://arxiv.org/html/2604.20166#bib.bib62)\) |
| | Capability / limitation disclosure | Trust calibration | Wester et al\. \([2024a](https://arxiv.org/html/2604.20166#bib.bib73)\) |
| Empathy & Engagement | Script / protocol\-aligned responses | Baseline comparison | Sun et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib65)\) |
| | Structured interaction / narratives | Engagement self\-report | Sien and McGrenere \([2025](https://arxiv.org/html/2604.20166#bib.bib62)\) |
| | Expert behavioral coding | Empathy ratings | Swinger et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib66)\) |
| Controllability | Modular system design | User agency | Song et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib63)\) |
| | User\-driven modules / guidance | Usage analytics | Wester et al\. \([2024a](https://arxiv.org/html/2604.20166#bib.bib73)\) |

Table 4: Summary of criteria in the “Interaction\-oriented trustworthiness” layer with methods and evaluations from our literature review\.

| Criteria | Methods | Evaluations | Literature |
| --- | --- | --- | --- |
| **AI\-oriented Trustworthiness** | | | |
| Reliability & Robustness | Fine\-tuning; Prompting; Calibration; Uncertainty quantification | Accuracy; BLEU; BERTScore; Out\-of\-distribution tests; Calibration assessment | Yang et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib77)\); Kang et al\. \([2024](https://arxiv.org/html/2604.20166#bib.bib29)\); Dhuliawala et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib19)\); Alghamdi et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib3)\) |
| Safety & Harm Prevention | RLHF; Alignment training; Rule\-based guardrails; Red\-teaming | Jailbreak resistance; Toxicity scoring; Crisis scenario testing; Refusal rate | Liu et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib41)\); Hua et al\. \([2024](https://arxiv.org/html/2604.20166#bib.bib26)\); Yu et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib80)\); Alghamdi et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib3)\) |
| Privacy & Data Protection | Differential privacy; Federated learning; Memorization auditing; Restricted fine\-tuning | Extraction attacks; Membership inference tests; Memorization probes | Shin et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib61)\); Kwesi et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib31)\); Yu et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib80)\) |
| Explainability | RAG; Chain\-of\-thought prompting; Multi\-agent workflow; Multi\-session memory; Step\-by\-step reasoning | Human evaluation; Rationale quality; Criterion anchoring; Evidence tagging | Yang et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib77)\); Ozgun et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib52)\); Na et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib48)\); Gollapalli et al\. \([2023](https://arxiv.org/html/2604.20166#bib.bib25)\); Zhai et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib81)\); Zhang et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib82)\); Bi et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib10)\) |
| Fairness | Bias auditing; Demographic\-aware prompting; Data balancing; Counterfactual testing | Demographic parity; Empathy variance across subgroups; Group fairness metrics | Gabriel et al\. \([2024](https://arxiv.org/html/2604.20166#bib.bib22)\); Lissak et al\. \([2024](https://arxiv.org/html/2604.20166#bib.bib38)\); Baidal et al\. \([2025](https://arxiv.org/html/2604.20166#bib.bib8)\); Huang et al\. \([2024](https://arxiv.org/html/2604.20166#bib.bib27)\) |

Table 5: Summary of criteria in the “AI\-oriented trustworthiness” layer with methods and evaluations from our literature review\.

## Appendix E Literature for “Interaction\-oriented” Trustworthiness

Interaction\-oriented trustworthiness captures how trust is shaped through observable system behavior during interaction\. Across the literature, this layer is operationalized through criteria such as competence, communication style, transparency, empathy and engagement, and controllability\. Studies evaluate these criteria using a combination of expert review, user\-centered methods, and structured protocol assessments\. [Table 4](https://arxiv.org/html/2604.20166#A4.T4) summarizes the key criteria in this layer, along with the dominant methods and evaluation practices identified in our review\.

## Appendix F Literature for “AI\-oriented” Trustworthiness

AI\-oriented trustworthiness focuses on whether mental health AI systems meet model\- or system\-level requirements for safety, reliability, and responsible deployment, independent of any interaction\-specific criteria\. Prior work operationalizes this layer through criteria such as reliability and robustness, safety and harm prevention, privacy and data protection, explainability, and fairness\. [Table 5](https://arxiv.org/html/2604.20166#A4.T5) summarizes the dominant technical methods and evaluation practices used to assess these criteria in the literature\.
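To make one of the fairness evaluations listed in Table 5 concrete, the sketch below computes a demographic parity gap: the largest difference in positive\-prediction rate between any two groups. The function name, the binary "flagged as high risk" labels, and the two groups are hypothetical; the reviewed studies may define and compute their fairness metrics differently.

```python
from collections import defaultdict


def demographic_parity_gap(predictions, groups):
    """Return (gap, rates): the max-min spread of positive-prediction rates
    across groups, and the per-group rates themselves."""
    if len(predictions) != len(groups):
        raise ValueError("predictions and groups must have equal length")
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values()), rates


# Hypothetical data: 1 means the system flagged the message as high risk.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap, rates = demographic_parity_gap(preds, groups)
print(gap, rates)
```

A gap of zero means every group is flagged at the same rate. Demographic parity ignores ground\-truth labels, so it is usually read alongside other group fairness metrics rather than on its own.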
