Ensemble Feature Selection and Harris Hawks Optimization for Explainable Mental Health Risk Prediction in Female Sex Workers

arXiv cs.AI 06/24/26, 04:00 AM Papers
Summary
This paper proposes a hybrid predictive model combining ensemble feature selection (ANOVA and mutual information) with Harris Hawks optimization-tuned logistic regression for explainable mental health risk prediction in female sex workers, achieving 95.78% accuracy.
arXiv:2606.24047v1 Announce Type: new
Cached at: 06/24/26, 07:44 AM
# Ensemble Feature Selection and Harris Hawks Optimization for Explainable Mental Health Risk Prediction in Female Sex Workers
Source: [https://arxiv.org/html/2606.24047](https://arxiv.org/html/2606.24047)
Md. Parvej Hoque Palash, Shahriar Siddique Ayon, Ramkrishna Saha, Abdullah Al Mamun

###### Abstract

Mental disorders, especially depression, are a significant health issue among female sex workers (FSWs). Exposure to violence, stigma, and economic hardship further increases their psychological risk. Current machine learning (ML) models are typically ineffective at capturing the high-dimensional and complex risk patterns that exist in this marginalized group. This paper proposes a hybrid predictive model that combines an ensemble feature selection strategy based on ANOVA and mutual information with Harris Hawks optimization-tuned logistic regression, representing a new application of swarm intelligence to mental health prediction in vulnerable groups. Explainable AI (XAI) methods are used to understand the trauma-related factors associated with model predictions. Applied to a cohort of 3,005 FSWs, the proposed model outperforms traditional classifiers, with an accuracy of 95.78%, an F1 score of 95.77%, and an AUC of 0.96, and identifies post-traumatic stress, client-related violence, and occupational factors as major contributors to depression. This work bridges the gap between conventional and ML approaches to develop an XAI tool that enables vulnerable groups to receive early assistance, evidence-based targeted psychosocial care, and health planning.

## I Introduction

Mental health issues are still a big problem for public health around the world in the 21st century\. Prediction of mental health status among female sex workers \(FSWs\) is still a crucial public health issue given the confluence of structural violence, stigma, economic vulnerability, and occupational risks\[[7](https://arxiv.org/html/2606.24047#bib.bib7)\]\. According to the World Health Organization \(WHO\), nearly one in seven people worldwide, about 1\.1 billion individuals, were living with a mental disorder in 2021, with depression and anxiety being the most prevalent conditions\[[23](https://arxiv.org/html/2606.24047#bib.bib3)\]\. Depression affects about 4% of the global population, including 5\.7% of adults, with higher prevalence among women than men, and together with anxiety disorders leads to an estimated annual loss of nearly US$1 trillion in productivity\[[22](https://arxiv.org/html/2606.24047#bib.bib5)\]\. These disorders are a leading cause of long\-term disability worldwide and are expected to rise significantly by 2040\[[25](https://arxiv.org/html/2606.24047#bib.bib4)\]\.

Vulnerable groups exposed to overlapping structural, social, and occupational stressors face higher rates of depression, post\-traumatic stress disorder \(PTSD\), anxiety, and suicidality\. Among these groups, FSWs are highly marginalized and face severe risks, including violence from partners and clients, stigma, social exclusion, HIV exposure, and economic and housing insecurity\[[21](https://arxiv.org/html/2606.24047#bib.bib26)\]\. These factors intensify psychological distress and limit access to care, with studies reporting depression rates up to 52\.7% and PTSD rates of 53\.6% among FSWs, far higher than in the general population\[[15](https://arxiv.org/html/2606.24047#bib.bib6)\]\.

Traditional methods have identified these associations, whereas machine learning (ML) has demonstrated significant potential in modeling intricate behavioral and health outcomes, including treatment-seeking behavior, risky sexual behavior, and HIV risk stratification \[[8](https://arxiv.org/html/2606.24047#bib.bib1)\]. Existing techniques frequently depend on traditional statistics or simplistic classifiers that inadequately address high-dimensional, nonlinear, and imbalanced datasets, as well as intricate interactions between exposure to violence, socioeconomic variables, and health indicators \[[1](https://arxiv.org/html/2606.24047#bib.bib23),[3](https://arxiv.org/html/2606.24047#bib.bib2)\]. The lack of hybrid feature selection and sophisticated optimization in these techniques further reduces accuracy, stability, and interpretability \[[9](https://arxiv.org/html/2606.24047#bib.bib17),[2](https://arxiv.org/html/2606.24047#bib.bib18)\]. These constraints highlight the need for more capable and reliable predictive models that capture the complex mental health risk profile of FSWs and enable early intervention, psychosocial support, and planning of health services.

To address these gaps, this study proposes a hybrid model that integrates ensemble feature selection with Harris Hawks Optimization (HHO)-tuned Logistic Regression (LR). The model is evaluated on a cleaned, community-based dataset. The explainable AI (XAI) method LIME provides instance-level explanations of model predictions, making the model's decisions easier to understand and trust. The main contributions of this work are:

- An ensemble feature selection method combining ANOVA and mutual information, which identifies feature relevance more reliably than the other methods tested.
- A first application of HHO to mental health prediction in marginalized populations, in which HHO-optimized LR outperforms the baseline models.
- XAI techniques that highlight the key trauma-related and occupational variables for predicting depression, enhancing transparency and clinical applicability.

The rest of the paper is structured as follows: Section [II](https://arxiv.org/html/2606.24047#S2) reviews related work, Section [III](https://arxiv.org/html/2606.24047#S3) describes the proposed methodology, Section [IV](https://arxiv.org/html/2606.24047#S4) presents and discusses the experimental results, and Section [V](https://arxiv.org/html/2606.24047#S5) concludes the study and outlines future research directions.

## II Literature Review

Interdependent risks, including violence, HIV exposure, and occupational risks, should be modeled to make accurate predictions of FSWs’ mental health\. Existing stressors such as housing and food insecurity further exacerbate the risk of depression and PTSD\[[20](https://arxiv.org/html/2606.24047#bib.bib10)\]\. Depression and anxiety are also strongly linked to intimate partner violence \(IPV\) and violence committed by clients, with emotional IPV having the strongest effect\[[10](https://arxiv.org/html/2606.24047#bib.bib9)\]\. Muhlen et al\. assert that stigma and social exclusion contribute to enhancing mental health risks further, which explains why psychosocial interventions are highly necessary\[[14](https://arxiv.org/html/2606.24047#bib.bib11)\]\. Lowering violence and improving psychosocial support are important steps toward helping FSWs with their mental health issues\.

Structural vulnerabilities such as violence and economic instability profoundly affect the mental health and HIV risk of FSWs. Jewkes et al. \[[5](https://arxiv.org/html/2606.24047#bib.bib24)\], through a multi-stage, community-based cross-sectional survey of 3,005 FSWs, reported alarmingly poor mental health outcomes, with 52.7% experiencing depression and 53.6% meeting criteria for PTSD. Similarly, Machisa et al. \[[11](https://arxiv.org/html/2606.24047#bib.bib25)\], using structural equation modeling on a sample of 1,292 participants, found high prevalence rates of binge drinking (50%), depressive symptoms (43%), PTSD symptoms (9%), and suicidal ideation (21%).

Studies consistently show that FSWs face high rates of depression, PTSD, and suicidality, which can be effectively modeled using advanced ML techniques. Zhang et al. \[[24](https://arxiv.org/html/2606.24047#bib.bib12)\] evaluated seven ML algorithms, namely logistic regression (LR), support vector machines (SVM), random forests (RFC), XGBoost, k-nearest neighbors (KNN), naïve Bayes (NB), and neural networks (NN), and reported acceptable performance (ROC > 0.51) in predicting risky sexual behaviors, with an overall accuracy of 78% but limited sensitivity (11%). Nethi et al. \[[17](https://arxiv.org/html/2606.24047#bib.bib13)\] found that the light gradient boosting machine achieved the strongest predictive power for identifying candidates suitable for HIV pre-exposure prophylaxis (PrEP), with an area under the curve (AUC) of 0.88. In a similar study, decision trees (DT), SVM, and RFC were shown to handle SMOTE-processed data better, with accuracy and precision of 0.871 and 0.960, respectively \[[4](https://arxiv.org/html/2606.24047#bib.bib14)\]. Qualitative studies emphasise stigma, violence, and inadequate psychosocial support as significant factors contributing to poor mental health among FSWs \[[13](https://arxiv.org/html/2606.24047#bib.bib15)\]. These structural and psychosocial factors can enhance ML models for more accurate mental health prediction in this population.

Mental health prediction among FSWs relies on methods such as chi-square, ANOVA, mutual information (MI), Boruta, and tree-based models to capture intricate risk patterns. Chi-squared tests were applied to examine relationships between categorical variables, such as work conditions and mental health outcomes \[[9](https://arxiv.org/html/2606.24047#bib.bib17)\]. Among the four ML models used by Fauste et al. \[[16](https://arxiv.org/html/2606.24047#bib.bib20)\], RFC was the most effective, with 75% accuracy in predicting comorbidity of mental disorders and 88.8% accuracy in predicting factors that lead to mental health vulnerability.

Integrating ML and XAI enhances the accuracy and interpretability of mental health assessments, making it possible to intervene and support at-risk individuals more effectively. According to Saha et al. \[[18](https://arxiv.org/html/2606.24047#bib.bib22)\], dynamic ensemble selection methods, including KNORA, reach an accuracy of 81%, which underlines the value of combining multiple models to enhance predictive accuracy. Kanchon et al. \[[6](https://arxiv.org/html/2606.24047#bib.bib21)\] used SMOTE to overcome class imbalance and SHapley Additive exPlanations (SHAP) to explain model results. SHAP successfully described feature contributions, and the RFC model performed best (accuracy = 0.66, recall = 0.69). According to these studies, integrating ML and XAI enhances predictability and interpretability in practical applications related to mental health.

Existing research relies on a limited set of models and struggles with complex, interdependent risk factors. Reported performance is moderate, and the models generalize poorly on imbalanced data. Most studies also lack integrated feature selection, optimization, and XAI, which lowers both accuracy and interpretability. This study addresses these gaps with a unified hybrid ML and XAI framework.

## III Methodology

This section presents the overall methodological framework of the study. It covers data collection, preprocessing, feature selection, and the development of optimized models, followed by the use of XAI to extract the features that influence predictions positively and negatively. Fig. [1](https://arxiv.org/html/2606.24047#S3.F1) outlines the proposed framework for predicting mental health status among FSWs.

![Refer to caption](https://arxiv.org/html/2606.24047v1/x1.png)

Figure 1: Proposed Methodological Framework for Depression Prediction among Female Sex Workers.

### III-A Data Collection, Cleaning, and Preprocessing

The data in this study come from a large community-based cross-sectional survey of 3,005 adult FSWs carried out in all nine provinces of South Africa \[[12](https://arxiv.org/html/2606.24047#bib.bib16)\]. The survey aimed to capture important details about their lives, such as HIV status, mental health problems (including depression and PTSD), experience of violence, and work-related issues. The data were gathered in locations where sex worker support programs already exist, which made participants easier to reach through trusted networks. Responses were collected using structured questionnaires and recorded in real time using REDCap. FSWs were also engaged throughout the process, from survey design to data collection, which made the information more applicable, trustworthy, and realistic.

The dataset initially had 3,005 rows and 20 columns, seven of which contained missing values. In the numerical columns (Years worked as sex worker, Age of first sex, Number of clients in past day, and Earning potential per client), missing values were replaced with the column mean. Rows with missing values in categorical columns were removed. All categorical variables were label-encoded into numerical form so that they could be analyzed with ML models. The enrolment date column was split into day, month, and year columns. Superfluous or redundant columns were also eliminated in order to simplify the dataset. The final processed dataset has 2,911 rows and 19 feature columns.
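The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration rather than the authors' code: the column names (`years_worked`, `enrolment_date`, and so on) are hypothetical stand-ins for the survey fields, and treating every non-numeric column as categorical is an assumption.

```python
import pandas as pd

# Hypothetical names for the four numerical columns that were mean-imputed.
NUMERIC_COLS = ["years_worked", "age_first_sex", "clients_past_day", "earning_per_client"]

def preprocess(df, date_col="enrolment_date"):
    """Mean-impute numeric columns, split the date, drop incomplete rows, label-encode."""
    df = df.copy()
    # 1) Replace missing numeric values with the column mean.
    for col in NUMERIC_COLS:
        df[col] = df[col].fillna(df[col].mean())
    # 2) Split the enrolment date into day, month, and year (assumes no missing dates).
    dates = pd.to_datetime(df[date_col])
    df["enrol_day"] = dates.dt.day
    df["enrol_month"] = dates.dt.month
    df["enrol_year"] = dates.dt.year
    df = df.drop(columns=[date_col])
    # 3) Treat the remaining non-numeric columns as categorical; drop rows missing any.
    cat_cols = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    df = df.dropna(subset=cat_cols)
    # 4) Label-encode: each category becomes an integer code (alphabetical order).
    for col in cat_cols:
        df[col] = df[col].astype("category").cat.codes
    return df.reset_index(drop=True)
```

Dropping rows only for missing categorical values is what would reduce the row count (here from 3,005 to 2,911) while leaving the imputed numeric columns complete.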

### III-B Hybrid Ensemble Feature Selection Strategy

After preparing the data, we used feature selection to determine the most significant variables. We tested a variety of approaches, including chi-squared, ANOVA, MI, Boruta, tree-based algorithms, and ensembles. Among them, the ensemble ANOVA and MI technique combines a statistical analysis of variance with information gain to robustly detect the most significant features. The feature ranking produced by the ensemble feature selection method is shown in Figure [2](https://arxiv.org/html/2606.24047#S3.F2).

![Refer to caption](https://arxiv.org/html/2606.24047v1/x2.png)

Figure 2: Key Features Driving Depression Prediction Identified by Ensemble Feature Selection.

Figure [2](https://arxiv.org/html/2606.24047#S3.F2) presents the ranking of features influencing depression prediction among sex workers, as determined by the ensemble ANOVA and MI method. PTSD outcome was the most important (1.558), followed by Location (1.195) and Earning potential per client (0.710). Other key features included Client physical/sexual abuse (0.688), Province (0.682), and Condom use in the past month (0.347). Features such as years worked as a sex worker (0.218), childhood abuse (0.206), and number of clients in the past day (0.195) also played a role; the remaining features had lower scores (0.184 to 0.019), which means that they had less impact on the depression prediction.
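One way an ANOVA and MI ensemble score could be computed is sketched below. The text does not state how the two scores are combined, so the rule used here (min-max normalise each score, then add them, giving a value between 0 and 2) is an assumption made for illustration. `f_classif` and `mutual_info_classif` are scikit-learn's ANOVA F-test and mutual information estimators.

```python
import numpy as np
from sklearn.feature_selection import f_classif, mutual_info_classif

def minmax(v):
    """Rescale a score vector to the range [0, 1]."""
    v = np.asarray(v, dtype=float)
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)

def ensemble_anova_mi(X, y, names, seed=0):
    """Rank features by the sum of normalised ANOVA F and MI scores (assumed rule)."""
    f_scores, _ = f_classif(X, y)                              # ANOVA F-statistic per feature
    mi_scores = mutual_info_classif(X, y, random_state=seed)   # mutual information per feature
    combined = minmax(np.nan_to_num(f_scores)) + minmax(mi_scores)
    order = np.argsort(-combined)                              # highest combined score first
    return [(names[i], float(combined[i])) for i in order]
```

Keeping the top-scoring entries of the returned list would then give the reduced feature set.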

Based on the results of the ensemble feature selection, the least significant features were filtered out, resulting in an optimized set of 11 features and 2,911 observations. Location and Province capture regional structural factors (e.g., violence, access to care, and economic disadvantage) that are not directly observed but are relevant in South Africa. They are used only for contextual risk stratification and referral guidance, not as causal individual predictors. The target variable, Depression, consists of 1,530 positive cases and 1,381 negative cases, corresponding to labels 1 and 0, respectively. The data were split into 80% training and 20% testing, with 20% of the training portion held out for validation.
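The split sizes implied by this scheme can be worked out with a short sketch. The rounding rule and the plain random shuffle are assumptions, since the text does not say whether the split was stratified or how fractions were rounded.

```python
import numpy as np

def split_indices(n, test_frac=0.2, val_frac=0.2, seed=0):
    """Shuffle row indices and cut them into train / validation / test parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_test = int(round(test_frac * n))             # 20% of all rows
    n_val = int(round(val_frac * (n - n_test)))    # 20% of the remaining training rows
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test

train, val, test = split_indices(2911)
print(len(train), len(val), len(test))  # 1863 466 582
```

Under these assumptions, the 2,911 observations yield 582 test rows, which agrees with the 582 test instances mentioned in the LIME analysis.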

### III-C Model Selection and Hyperparameter Tuning

We tested several ML and deep learning (DL) models for depression prediction, tuning their hyperparameters for maximum performance. The Random Forest Classifier (RFC) is an ensemble of 100 decision trees with a maximum depth of 10 and a minimum of 2 samples per leaf, designed to reduce overfitting. k-Nearest Neighbors (kNN) classifies instances by the majority label of the 5 nearest neighbors under Euclidean distance. The Support Vector Classifier (SVC) separates classes using an RBF kernel with $C = 1.0$ and $\gamma = 0.1$ to find the optimal hyperplane. The Light Gradient Boosting Machine (LGBM) is a fast, gradient-boosted tree model using 200 estimators, a learning rate of 0.05, and a maximum depth of 7. An Artificial Neural Network (ANN) with three dense layers (64, 32, 16 neurons), ReLU activation, and the Adam optimizer (learning rate 0.001) was used to capture nonlinear relationships. Logistic Regression (LR) models the probability of a binary outcome using a logistic function, configured with L2 regularization, regularization strength $C = 1.0$, and the 'lbfgs' solver.
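For orientation, the stated hyperparameters map onto scikit-learn estimators roughly as below. This is a sketch, not the authors' setup: the text does not name the libraries used, `MLPClassifier` is a stand-in for the three-layer ANN, the `seed`, `max_iter` values are assumptions, and LGBM is left out to avoid an extra dependency.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

def build_baselines(seed=42):
    """Return baseline classifiers configured with the hyperparameters listed in the text."""
    return {
        "RFC": RandomForestClassifier(n_estimators=100, max_depth=10,
                                      min_samples_leaf=2, random_state=seed),
        "kNN": KNeighborsClassifier(n_neighbors=5, metric="euclidean"),
        "SVC": SVC(kernel="rbf", C=1.0, gamma=0.1),
        # Stand-in for the ANN: three hidden layers, ReLU, Adam, learning rate 0.001.
        "ANN": MLPClassifier(hidden_layer_sizes=(64, 32, 16), activation="relu",
                             solver="adam", learning_rate_init=0.001,
                             max_iter=500, random_state=seed),
        # L2 is scikit-learn's default penalty for LogisticRegression.
        "LR": LogisticRegression(C=1.0, solver="lbfgs", max_iter=1000),
    }
```

Each estimator exposes the same `fit` / `predict` interface, so the models can be compared in a single loop over the dictionary.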

Among all models, LR performed best, and its performance was further enhanced using metaheuristic optimization techniques, including Particle Swarm, Ant Colony, Genetic Algorithm, and Harris Hawks Optimization (HHO), with HHO-based LR achieving the highest overall performance. HHO-based LR enhances standard LR by using HHO to efficiently search the parameter space and identify optimal model weights. The model estimates the probability of the positive class as:

$$P(y=1 \mid \mathbf{X}) = \frac{1}{1 + e^{-(\mathbf{X}\mathbf{w} + b)}} \tag{1}$$

where $\mathbf{X}$ is the feature matrix and $\mathbf{w}$ and $b$ are the weights and bias. The HHO algorithm was configured with a population size of 30, a maximum of 50 iterations, and an escape energy parameter $E_0 = 2$, which together maximized predictive performance and effectively handled class imbalance.
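A compact sketch of how HHO might search the LR weights and bias of Eq. (1) is given below. It follows the commonly published HHO update rules (exploration, soft and hard besiege, and Lévy-flight rapid dives) and uses the stated population size of 30 and 50 iterations. The log-loss fitness, the search bound of ±5, the Lévy constants, and the reading of $E_0 = 2$ as the initial amplitude of the escaping energy are all assumptions, not details taken from the text.

```python
import math
import numpy as np

def log_loss(theta, X, y):
    """Mean binary cross-entropy of LR with theta = [w_1..w_d, b], as in Eq. (1)."""
    z = np.clip(X @ theta[:-1] + theta[-1], -500, 500)
    p = np.clip(1.0 / (1.0 + np.exp(-z)), 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def levy(dim, rng, beta=1.5):
    """Levy-flight step via Mantegna's method, used for the rapid dives."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return 0.01 * rng.normal(0, sigma, dim) / np.abs(rng.normal(0, 1, dim)) ** (1 / beta)

def hho_fit_lr(X, y, n_hawks=30, n_iter=50, bound=5.0, seed=0):
    """Search LR parameters with HHO; returns (best_theta, best-loss history)."""
    rng = np.random.default_rng(seed)
    dim = X.shape[1] + 1
    hawks = rng.uniform(-bound, bound, (n_hawks, dim))
    fit = np.array([log_loss(h, X, y) for h in hawks])
    best, best_fit = hawks[fit.argmin()].copy(), float(fit.min())  # the "rabbit"
    history = [best_fit]
    for t in range(n_iter):
        for i in range(n_hawks):
            x, mean = hawks[i].copy(), hawks.mean(axis=0)
            E = 2 * (1 - t / n_iter) * (2 * rng.random() - 1)  # escaping energy
            if abs(E) >= 1:                                    # exploration
                if rng.random() >= 0.5:
                    x_rand = hawks[rng.integers(n_hawks)]
                    new = x_rand - rng.random() * np.abs(x_rand - 2 * rng.random() * x)
                else:
                    new = (best - mean) - rng.random() * (-bound + rng.random() * 2 * bound)
            elif rng.random() >= 0.5:                          # besiege without dives
                if abs(E) >= 0.5:                              # soft besiege
                    J = 2 * (1 - rng.random())
                    new = (best - x) - E * np.abs(J * best - x)
                else:                                          # hard besiege
                    new = best - E * np.abs(best - x)
            else:                                              # besiege with rapid dives
                J = 2 * (1 - rng.random())
                ref = x if abs(E) >= 0.5 else mean
                Y = best - E * np.abs(J * best - ref)
                Z = Y + rng.random(dim) * levy(dim, rng)
                new = x                                        # keep position if no dive helps
                if log_loss(Y, X, y) < fit[i]:
                    new = Y
                elif log_loss(Z, X, y) < fit[i]:
                    new = Z
            hawks[i] = np.clip(new, -bound, bound)
        fit = np.array([log_loss(h, X, y) for h in hawks])
        if fit.min() < best_fit:
            best, best_fit = hawks[fit.argmin()].copy(), float(fit.min())
        history.append(best_fit)
    return best, history
```

Because the best solution is replaced only when a hawk improves on it, the returned loss history is non-increasing; predictions follow by thresholding the Eq. (1) probability at 0.5.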

### III-D Evaluation Metrics for Classification

In binary classification, accuracy is the overall proportion of correct predictions. Precision is the proportion of predicted positives that are actually positive, whereas recall (or sensitivity) is the proportion of actual positive cases that the model detects. The F1-score is the harmonic mean of precision and recall and so balances the two, and the AUC measures the model's capacity to distinguish the two classes, with higher values indicating better discrimination.
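These definitions can be made concrete with a small, dependency-free sketch. The function names are illustrative, and the AUC is computed by direct pairwise comparison (the probability that a random positive case scores higher than a random negative one), which is simple but quadratic in the number of samples.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}

def auc_score(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1, 0.2, 0.6, 0.7, 0.3]
print(classification_metrics(y_true, y_pred))  # all four metrics equal 0.75 here
print(auc_score(y_true, scores))               # 0.9375
```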

### III-E Explainable AI LIME

XAI assists in making ML models understandable, indicating why they make particular predictions\. We applied LIME in the study, which clarifies individual predictions by pointing to the contribution of each feature, which offers clear insights on the local level that complement global approaches such as SHAP and SHAPASH\[[19](https://arxiv.org/html/2606.24047#bib.bib8)\]\. The LIME explanatory model is represented as follows:

$$\hat{g}(x) = \arg\min_{g \in G} \; L(f, g, \pi_{x'}) + \Omega(g) \tag{2}$$

In LIME, the complex model $f$ is approximated locally by an interpretable model $\hat{g}(x) \in G$, which minimizes the loss function $L(f, g, \pi_{x'})$ to achieve a balance between fidelity to the original model and interpretability. Here $\pi_{x'}$ is a proximity kernel that weights perturbed samples by their closeness to the instance being explained, and $\Omega(g)$ penalizes the complexity of the surrogate $g$.
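The objective in Eq. (2) can be illustrated with a simplified local surrogate. This is not the `lime` package: it perturbs the instance with Gaussian noise, uses an exponential kernel for $\pi_{x'}$, takes a weighted squared error as $L$, and substitutes a small ridge penalty for $\Omega(g)$. All of these choices, and the parameter defaults, are assumptions made for the sketch.

```python
import numpy as np

def lime_like_explanation(predict_proba, x, n_samples=2000, scale=0.5,
                          kernel_width=0.75, ridge=1e-3, seed=0):
    """Fit a locally weighted linear surrogate g around x for a black-box f."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Perturbed neighbours x' sampled around the instance being explained.
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    target = predict_proba(Z)                        # black-box outputs f(x')
    # Proximity kernel pi: closer samples receive larger weights.
    dist = np.linalg.norm(Z - x, axis=1)
    pi = np.exp(-(dist ** 2) / kernel_width ** 2)
    # Weighted ridge regression on centred features plus an intercept column.
    A = np.hstack([Z - x, np.ones((n_samples, 1))])
    penalty = ridge * np.eye(x.size + 1)
    penalty[-1, -1] = 0.0                            # leave the intercept unpenalised
    beta = np.linalg.solve(A.T @ (pi[:, None] * A) + penalty, A.T @ (pi * target))
    return beta[:-1], beta[-1]                       # feature weights, local intercept
```

The sign of each returned weight indicates whether the feature pushes the local prediction toward or away from the positive class, which is the kind of reading used in the LIME figures.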

## IV Results and Discussion Analysis

All models were developed and tested using the free version of Google Colab, providing a simple and efficient platform for experimentation. Models were tuned using 5-fold cross-validation to optimize performance on the training data. We compared model performance before and after applying feature selection and correlation analysis to better understand their effects. To enhance interpretability, LIME was used to identify features that positively or negatively influenced predictions. The overall process and key findings are outlined step by step below.

### IV-A Baseline Model Performance

In Table [I](https://arxiv.org/html/2606.24047#S4.T1), the baseline model comparison shows that HHO-based LR achieved the best overall performance with 93.06% accuracy, 92.82% precision, 93.20% recall, 93.01% F1-score, and 0.94 AUC. Among standard models, LR performed strongest with 92.28% accuracy and 0.93 AUC, followed by ANN (91.35%, 0.92 AUC) and LGBM (90.42%, 0.91 AUC). k-NN showed moderate performance (89.22%, 0.89 AUC), while RFC (87.85%, 0.88 AUC) and SVC (87.36%, 0.87 AUC) achieved comparatively lower results. This highlights the effectiveness of HHO-based optimization in improving model performance.

TABLE I: Baseline Model Performance Comparison for Depression Prediction

### IV-B Performance After Feature Selection

After feature selection (Table [II](https://arxiv.org/html/2606.24047#S4.T2)), performance improved across all methods, with the proposed ANOVA + MI ensemble achieving the best results. HHO-based LR reached the highest performance with 95.78% accuracy, 95.60% precision, 95.95% recall, 95.77% F1-score, and 0.96 AUC, followed by LR (94.54%, 0.95 AUC). In comparison, the Boruta and ensemble (MI + RFE) methods showed slightly lower performance, with HHO-based LR achieving up to 94.31% accuracy and 0.95 AUC.

TABLE II: Performance Comparison After Feature Selection

HHO improved recall by 1.20% and F1-score by 1.22% over grid-search LR. While the absolute gain is modest, it corresponds to approximately 36 additional correct depression detections in the cohort, justifying its use in high-stakes screening settings where the cost of false negatives outweighs the minimal computational overhead.

Figure [3](https://arxiv.org/html/2606.24047#S4.F3) compares violence exposure between depressed and non-depressed groups, showing consistently higher prevalence among the depressed cohort. Client abuse shows the largest gap (68.6% vs. 44.5%, $\Delta = +24.0\%$), followed by intimate partner abuse (52.7% vs. 39.5%, $\Delta = +13.2\%$). Childhood abuse remains highest overall (92.6% vs. 83.6%, $\Delta = +8.9\%$), while police abuse is lower but still elevated (17.1% vs. 10.6%, $\Delta = +6.6\%$). On the whole, these findings suggest that depression is closely related to greater exposure to various types of violence, especially those related to clients and partners.

![Refer to caption](https://arxiv.org/html/2606.24047v1/Violence_Exposure.png)

Figure 3: Violence Exposure Prevalence Among Depressed and Non-Depressed Groups.

### IV-C Explaining Predictions Using LIME

We used LIME tabular and feature importance plots to highlight how individual features influenced the model's predictions. To assess generalizability beyond a single case, LIME explanations were aggregated across all 582 test instances. PTSD outcome and client abuse consistently emerged as the top two predictors in 94.3% and 87.1% of cases, respectively, and their importance closely aligned with the ANOVA+MI ranking (Spearman $\rho = 0.91$, $p < 0.001$), indicating strong population-level stability. Figure [4](https://arxiv.org/html/2606.24047#S4.F4) presents a tabular LIME explanation of a binary prediction that favors class 1 (0.57 vs. 0.43), which implies that the model predicts higher risk. PTSD outcome (+0.27) is the strongest contributor, followed by recent client abuse (+0.13) and years in sex work (+0.06). Other factors, including earning potential, client number, location, and province, show smaller effects (+0.02 to +0.03), while the remaining features contribute minimally. Overall, the prediction is primarily driven by trauma-related and occupational exposure factors.
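The aggregation described above (how often a feature ranks among the top predictors, and how well the aggregated importance agrees with another ranking) could be computed along the following lines. The function names and the toy weights are hypothetical, and this Spearman helper assumes there are no tied scores.

```python
import numpy as np

def top_k_frequency(weights, k=2):
    """Fraction of instances in which each feature is among the k largest |LIME weights|."""
    weights = np.abs(np.asarray(weights, dtype=float))
    top = np.argsort(-weights, axis=1)[:, :k]             # top-k feature indices per instance
    counts = np.bincount(top.ravel(), minlength=weights.shape[1])
    return counts / weights.shape[0]

def spearman_rho(a, b):
    """Spearman rank correlation of two score vectors (assumes no ties)."""
    ra = np.argsort(np.argsort(a))                        # rank of each entry in a
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])

# Toy example: rows are instances, columns are features.
lime_weights = np.array([[0.30, -0.20, 0.01],
                         [0.25, 0.02, -0.10],
                         [-0.40, 0.15, 0.05]])
print(top_k_frequency(lime_weights, k=2))   # feature 0 is in the top two for every instance
mean_importance = np.abs(lime_weights).mean(axis=0)
print(spearman_rho(mean_importance, [1.5, 0.7, 0.2]))
```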

![Refer to caption](https://arxiv.org/html/2606.24047v1/x3.png)

Figure 4: LIME Explanation of Key Features Driving Depression Prediction.

Figure [5](https://arxiv.org/html/2606.24047#S4.F5) presents a LIME feature importance plot for a correct prediction (actual = 1, predicted = 1). The top contributors driving the prediction toward class 1 are PTSD outcome $\leq$ 0.00 (+0.27), client abuse in the past 12 months (+0.13), and years worked as a sex worker (+0.06). Moderate positive effects come from earning potential (+0.03), number of clients (+0.02), location (+0.02), and province (+0.02), while intimate partner abuse (+0.01) and features like age of first sex and condom use (0.00) have minimal impact. Overall, trauma-related and occupational factors dominate, resulting in a confident classification into the higher-risk class.

![Refer to caption](https://arxiv.org/html/2606.24047v1/LIME_2.png)

Figure 5: LIME Feature Importance for Correct Depression Prediction.

Past research on mental health prediction for FSWs has shown moderate performance, limited by suboptimal feature selection, limited optimization, an inability to deal with high-dimensional interactions, a lack of XAI, and generalization issues \[[5](https://arxiv.org/html/2606.24047#bib.bib24),[17](https://arxiv.org/html/2606.24047#bib.bib13),[18](https://arxiv.org/html/2606.24047#bib.bib22),[6](https://arxiv.org/html/2606.24047#bib.bib21)\]. Our model, on the other hand, achieves high performance and is driven mostly by clinically important factors such as PTSD and violence exposure. Cross-validation supports internal validity, but external generalization is untested. Some differences from previous work (Zhang et al. \[[24](https://arxiv.org/html/2606.24047#bib.bib12)\]; Ndikumana et al. \[[16](https://arxiv.org/html/2606.24047#bib.bib20)\]) exist, which are due to the complexity of the datasets and tasks rather than the superiority of the models. The hybrid model combines ensemble feature selection, HHO-optimized LR, and XAI, which enhances both the accuracy and interpretability of mental health predictions. Its lightweight design allows it to run in real time and at low cost on devices without cloud support. The model can be integrated into any REDCap-based system to support community health workers (CHWs) in carrying out interpretable depression risk screening and providing referral support.

## V Conclusion and Future Work

This study proposed a novel hybrid approach to mental health prediction for FSWs based on the combination of ensemble feature selection and HHO-tuned LR. By combining ANOVA, MI, and swarm intelligence, the model achieved better performance than baseline models and other optimization-based models. Further, LIME increased the interpretability of the findings: it identified important trauma-related and socioeconomic factors that could influence depression in FSWs, which would allow for more explainable and data-based mental health interventions.

Future work will extend the proposed framework through mixed\-methods validation, fairness\-aware evaluation, multi\-country generalization, and real\-world deployment\. Qualitative follow\-up interviews with FSWs will confirm the congruence of the risk factors identified by LIME with the lived experience of FSWs and help to identify other structural determinants for improved modeling\. Future studies will test the framework with cohorts from East Africa and South Asia to address the geographic and selection biases of the current dataset from South Africa, and will also include fairness metrics across different subgroups defined by demographic characteristics and exposure to violence\. The modular HHO\-LR framework can be adapted to other mental health settings, but there is a need for rigorous external validation and benchmarking before claims to broader generalizability can be made\. Future studies will also explore the creation of an offline\-first mHealth system with encrypted on\-device data storage and privacy protection, along with testing in the field to assess usability, cultural acceptability of XAI outputs, and long\-term predictive stability in real\-world public health environments\.

## Data Availability

## References

- \[1\]M\. Abubakkar, K\. S\. Sharif, I\. Ahmad, D\. M\. Tabila, F\. A\. Alsaud, and S\. Debnath\(2025\)Explainable suicide risk prediction with deepfusion: a hybrid intelligence approach\.In2025 4th International Conference on Electronics Representation and Algorithm \(ICERA\),pp\. 455–460\.External Links:[Document](https://dx.doi.org/10.1109/ICERA66156.2025.11087321)Cited by:[§I](https://arxiv.org/html/2606.24047#S1.p3.1)\.
- \[2\]S\. S\. Ayon, A\. Al Mamun, M\. E\. Hossain, W\. Alamro, Y\. M\. Allawi, N\. N\. I\. Prova, M\. S\. U\. Miah, S\. M\. Sultan, and A\. Abadleh\(2026\)Explainable ai framework for improved thalassemia mental health classification and feature selection\.PLoS One21\(1\),pp\. e0341168\.External Links:[Document](https://dx.doi.org/10.1371/journal.pone.0341168)Cited by:[§I](https://arxiv.org/html/2606.24047#S1.p3.1)\.
- \[3\]E\. R\. Bernal\-Monroy, E\. D\. Castañeda\-Monroy, R\. R\. Rentería\-Ramos, S\. E\. Campaña\-Bastidas, J\. Barrera, T\. M\. Palacios\-Yampuezan, O\. L\. González Gustin, C\. F\. Tobar\-Torres, and Z\. R\. Ceballos\-Villada\(2025\)Detection of victimization patterns and risk of gender violence through machine learning algorithms\.InInformatics,Vol\.12,pp\. 21\.Cited by:[§I](https://arxiv.org/html/2606.24047#S1.p3.1)\.
- \[4\]J\. He, J\. Li, S\. Jiang, W\. Cheng, J\. Jiang, Y\. Xu, J\. Yang, X\. Zhou, C\. Chai, and C\. Wu\(2022\)Application of machine learning algorithms in predicting hiv infection among men who have sex with men: model development and validation\.Frontiers in Public Health10,pp\. 967681\.External Links:[Document](https://dx.doi.org/10.3389/fpubh.2022.967681),[Link](https://doi.org/10.3389/fpubh.2022.967681)Cited by:[§II](https://arxiv.org/html/2606.24047#S2.p3.1)\.
- \[5\]R\. Jewkes, M\. Milovanovic, K\. Otwombe, E\. Chirwa, K\. Hlongwane, N\. Hill, V\. Mbowane, M\. Matuludi, K\. Hopkins, G\. Gray, and J\. Coetzee\(2021\)Intersections of sex work, mental ill\-health, ipv and other violence experienced by female sex workers: findings from a cross\-sectional community\-centric national study in south africa\.International Journal of Environmental Research and Public Health18\(22\)\.External Links:[Link](https://www.mdpi.com/1660-4601/18/22/11971),ISSN 1660\-4601,[Document](https://dx.doi.org/10.3390/ijerph182211971)Cited by:[§II](https://arxiv.org/html/2606.24047#S2.p2.1),[§IV\-C](https://arxiv.org/html/2606.24047#S4.SS3.p3.1)\.
- \[6\] M\. R\. Kanchon, J\. Sani, T\. Ahmed, et al\. \(2026\-02\-11\) Interpretable machine learning for predicting early mental health care\-seeking among reproductive\-age women in bangladesh using bdhs 2022 data\. Research Square\. Note: Preprint, Version 1 External Links: [Document](https://dx.doi.org/10.21203/rs.3.rs-8595064/v1), [Link](https://doi.org/10.21203/rs.3.rs-8595064/v1) Cited by: [§II](https://arxiv.org/html/2606.24047#S2.p5.1), [§IV\-C](https://arxiv.org/html/2606.24047#S4.SS3.p3.1)\.
- \[7\] G\. Kaya, O\. Kalinowski, F\. Kroehn\-Liedtke, A\. Lotysh, H\. Mihaylova, L\. Zerbe, W\. Rössler, and M\. Schouler\-Ocak \(2025\-11\-13\) The impact of self\-stigmatization on the mental health of female sex workers \(fsws\)\. Frontiers in Public Health 13, pp\. 1679876\. Note: Impact Factor: 3\.4, Q1 External Links: [Document](https://dx.doi.org/10.3389/fpubh.2025.1679876) Cited by: [§I](https://arxiv.org/html/2606.24047#S1.p1.1)\.
- \[8\] A\. Kebede Kassaw, T\. Melese Yilma, Y\. Sebastian, A\. Yeneneh Birhanu, M\. Sharew Melaku, and S\. Surur Jemal \(2023\) Spatial distribution and machine learning prediction of sexually transmitted infections and associated factors among sexually active men and women in ethiopia, evidence from edhs 2016\. BMC Infectious Diseases 23\(1\), pp\. 49\. Cited by: [§I](https://arxiv.org/html/2606.24047#S1.p3.1)\.
- \[9\] F\. Kroehn\-Liedtke, O\. Kalinowski, G\. Kaya, A\. Lotysh, H\. Mihaylova, K\. Sipos, A\. Strunk, L\. Zerbe, W\. Rössler, and M\. Schouler\-Ocak \(2025\) A quantitative study on female sex workers’ mental health in germany\. Frontiers in Public Health, Volume 13 \- 2025\. External Links: [Link](https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2025.1590151), [Document](https://dx.doi.org/10.3389/fpubh.2025.1590151), ISSN 2296\-2565 Cited by: [§I](https://arxiv.org/html/2606.24047#S1.p3.1), [§II](https://arxiv.org/html/2606.24047#S2.p4.1)\.
- \[10\] M\. Leis, M\. McDermott, A\. Koziarz, L\. Szadkowski, A\. Kariri, T\. S\. Beattie, R\. Kaul, and J\. Kimani \(2021\) Intimate partner and client\-perpetrated violence are associated with reduced hiv pre\-exposure prophylaxis \(prep\) uptake, depression and generalized anxiety in a cross\-sectional study of female sex workers from nairobi, kenya\. Journal of the International AIDS Society 24, pp\. e25711\. Cited by: [§II](https://arxiv.org/html/2606.24047#S2.p1.1)\.
- \[11\] M\. T\. Machisa, E\. Chirwa, P\. Mahlangu, N\. Nunze, Y\. Sikweyiya, E\. Dartnall, M\. Pillay, and R\. Jewkes \(2022\) Suicidal thoughts, depression, post\-traumatic stress, and harmful alcohol use associated with intimate partner violence and rape exposures among female students in south africa\. International Journal of Environmental Research and Public Health 19\(13\)\. External Links: [Link](https://www.mdpi.com/1660-4601/19/13/7913), ISSN 1660\-4601, [Document](https://dx.doi.org/10.3390/ijerph19137913) Cited by: [§II](https://arxiv.org/html/2606.24047#S2.p2.1)\.
- \[12\] Cited by: [§III\-A](https://arxiv.org/html/2606.24047#S3.SS1.p1.1)\.
- \[13\] L\. Morgan, H\. R\. Welborn, G\. Feist\-Paz, et al\. \(2023\-11\-29\) Mental ill health experiences of female sex workers and their perceived risk factors: a systematic review of qualitative studies\. Research Square\. Note: Preprint, Version 1 External Links: [Document](https://dx.doi.org/10.21203/rs.3.rs-3578329/v1), [Link](https://doi.org/10.21203/rs.3.rs-3578329/v1) Cited by: [§II](https://arxiv.org/html/2606.24047#S2.p3.1)\.
- \[14\] A\. Mühlen, J\. Rudy, A\. Böckmann, and D\. Deimel \(2023\) Psychische gesundheit von sexarbeiter\* innen in europa: ein scoping\-review\. Das Gesundheitswesen 85\(06\), pp\. 561–567\. Cited by: [§II](https://arxiv.org/html/2606.24047#S2.p1.1)\.
- \[15\] National Alliance on Mental Illness \(NAMI\) \(2025\) Mental health by the numbers\. Note: Accessed: 2026\-04\-16 External Links: [Link](https://www.nami.org/mental-health-by-the-numbers/) Cited by: [§I](https://arxiv.org/html/2606.24047#S1.p2.1)\.
- \[16\] F\. Ndikumana, J\. Izabayo, J\. Kalisa, M\. Nemerimana, E\. C\. Nyabyenda, S\. H\. Muzungu, I\. Komezusenge, M\. Uwase, S\. Ndagijimana, C\. Twizere, and V\. Sezibera \(2025\-05\-08\) Machine learning\-based predictive modelling of mental health in rwandan youth\. Scientific Reports 15\(1\), pp\. 16032\. Note: Q1, Impact Factor: 3\.9 External Links: [Document](https://dx.doi.org/10.1038/s41598-025-00519-z) Cited by: [§II](https://arxiv.org/html/2606.24047#S2.p4.1), [§IV\-C](https://arxiv.org/html/2606.24047#S4.SS3.p3.1)\.
- \[17\] A\. K\. M\. S\. Nethi, M\. Karam, K\. S\. Alvarez, A\. E\. Luque, A\. E\. Nijhawan, E\. Adhikari, and H\. L\. King \(2024\-09\-01\) Using machine learning to identify patients at risk of acquiring hiv in an urban health system\. JAIDS Journal of Acquired Immune Deficiency Syndromes 97\(1\), pp\. 40–47\. External Links: [Document](https://dx.doi.org/10.1097/QAI.0000000000003464) Cited by: [§II](https://arxiv.org/html/2606.24047#S2.p3.1), [§IV\-C](https://arxiv.org/html/2606.24047#S4.SS3.p3.1)\.
- \[18\] Y\. Saha and H\. S \(2025\) Dynamic ensemble selection for mental health prediction: a path towards explainable, scalable and high\-impact ai solutions\. In 2025 International Conference on Intelligent Computing and Knowledge Extraction \(ICICKE\), pp\. 1–8\. External Links: [Document](https://dx.doi.org/10.1109/ICICKE65317.2025.11136272) Cited by: [§II](https://arxiv.org/html/2606.24047#S2.p5.1), [§IV\-C](https://arxiv.org/html/2606.24047#S4.SS3.p3.1)\.
- \[19\] S\. Siddique Ayon, M\. Ebrahim Hossain, M\. S\. Ullah Miah, M\. M\. Rahman, and M\. Mahmud \(2024\) Explainable ai in feature selection: improving classification performance on imbalanced datasets\. In International Conference on Neural Information Processing, pp\. 303–318\. External Links: [Document](https://dx.doi.org/10.1007/978-981-96-6606-5%5F21) Cited by: [§III\-E](https://arxiv.org/html/2606.24047#S3.SS5.p1.1)\.
- \[20\] C\. Tomko, R\. J\. Musci, M\. R\. Kaufman, C\. R\. Underwood, M\. R\. Decker, and S\. G\. Sherman \(2023\) Mental health and hiv risk differs by co\-occurring structural vulnerabilities among women who sell sex\. AIDS Care 35\(2\), pp\. 205–214\. External Links: [Document](https://dx.doi.org/10.1080/09540121.2022.2121374), [Link](https://doi.org/10.1080/09540121.2022.2121374) Cited by: [§II](https://arxiv.org/html/2606.24047#S2.p1.1)\.
- \[21\] N\. T\. Tutlam, S\. Kizito, P\. Nabunya, M\. Naseh, I\. Nabbosa, I\. Kwesiga, P\. Namatovu, O\. S\. Bahar, N\. Nakasujja, and F\. M\. Ssewamala \(2025\-11\) Social determinants of mental health outcomes among refugee adolescents and youth living with hiv in refugee settlements in uganda: a cross\-sectional analysis\. AIDS and Behavior 29\(11\), pp\. 3432–3443\. Note: Impact Factor: 2\.4, Q2\. Epub 2025 Jun 16 External Links: [Document](https://dx.doi.org/10.1007/s10461-025-04789-6) Cited by: [§I](https://arxiv.org/html/2606.24047#S1.p2.1)\.
- \[22\] World Health Organization \(2025\) Depressive disorder \(depression\)\. Note: Accessed: 2026\-04\-16 External Links: [Link](https://www.who.int/news-room/fact-sheets/detail/depression) Cited by: [§I](https://arxiv.org/html/2606.24047#S1.p1.1)\.
- \[23\] World Health Organization \(2025\) Mental disorders\. Note: Accessed: 2026\-04\-16 External Links: [Link](https://www.who.int/news-room/fact-sheets/detail/mental-disorders) Cited by: [§I](https://arxiv.org/html/2606.24047#S1.p1.1)\.
- \[24\] F\. Zhang, S\. Zhu, S\. Chen, Z\. Hao, Y\. Fang, H\. Zou, Y\. Cai, B\. Cao, K\. Zhang, H\. Cao, Y\. Chen, T\. Hu, and Z\. Wang \(2023\) Application of machine learning for risky sexual behavior interventions among factory workers in china\. Frontiers in Public Health 11, pp\. 1092018\. External Links: [Document](https://dx.doi.org/10.3389/fpubh.2023.1092018), [Link](https://doi.org/10.3389/fpubh.2023.1092018) Cited by: [§II](https://arxiv.org/html/2606.24047#S2.p3.1), [§IV\-C](https://arxiv.org/html/2606.24047#S4.SS3.p3.1)\.
- \[25\] Z\. Zhang, X\. Chen, S\. Wu, X\. Chen, X\. Wang, C\. Liu, N\. Zeng, Y\. Liu, T\. Huo, X\. Liu, et al\. \(2025\) Global, regional and national burden of anxiety and depression disorders from 1990 to 2021, and forecasts up to 2040\. Journal of Affective Disorders, pp\. 120299\. Cited by: [§I](https://arxiv.org/html/2606.24047#S1.p1.1)\.