Explainable Ensemble-Based Machine Learning Models for Detecting the Presence of Cirrhosis in Hepatitis C Patients
Summary
This paper applies four ensemble machine learning models (Random Forest, Gradient Boosting, XGBoost, Extra Trees) to detect cirrhosis in hepatitis C patients, using a UCI Machine Learning Repository dataset of 2038 Egyptian patients with 28 attributes. The Extra Trees model performed best, reaching 96.92% accuracy, 94.00% recall, 99.81% precision, and an area under the ROC curve of 96% with only 16 of the 28 features.
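The workflow summarized above (train an Extra Trees classifier, keep a 16-feature subset, report accuracy, recall, precision, and AUROC) can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the authors' pipeline: the data is a synthetic stand-in generated by `make_classification` with the same shape as the paper's dataset (2038 rows, 28 columns), and the feature selection method (ranking by impurity-based importance), the 80/20 split, and the hyperparameters are all assumptions, since the summary does not state how the 16 features were chosen.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in data with the same shape as the paper's
# dataset (2038 patients, 28 attributes). It is NOT the real patient data.
X, y = make_classification(n_samples=2038, n_features=28,
                           n_informative=10, random_state=0)

# ASSUMPTION: a stratified 80/20 hold-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Fit on all 28 features, then rank them by impurity-based importance.
full_model = ExtraTreesClassifier(n_estimators=200, random_state=0)
full_model.fit(X_train, y_train)

# ASSUMPTION: keep the 16 highest-ranked features; the paper's actual
# selection procedure may differ.
top16 = np.argsort(full_model.feature_importances_)[::-1][:16]

# Refit on the reduced feature set and evaluate on the held-out split.
reduced_model = ExtraTreesClassifier(n_estimators=200, random_state=0)
reduced_model.fit(X_train[:, top16], y_train)

pred = reduced_model.predict(X_test[:, top16])
proba = reduced_model.predict_proba(X_test[:, top16])[:, 1]

metrics = {
    "accuracy": accuracy_score(y_test, pred),
    "recall": recall_score(y_test, pred),
    "precision": precision_score(y_test, pred),
    "auroc": roc_auc_score(y_test, proba),
}
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The numbers this prints describe the synthetic data only and are not comparable to the figures reported in the paper.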
# Explainable Ensemble-Based Machine Learning Models for Detecting the Presence of Cirrhosis in Hepatitis C Patients

Source: [https://arxiv.org/abs/2606.26561](https://arxiv.org/abs/2606.26561)

[View PDF](https://arxiv.org/pdf/2606.26561)

> Abstract: Hepatitis C is a liver infection caused by a virus, which results in mild to severe inflammation of the liver. Over many years, hepatitis C gradually damages the liver, often leading to permanent scarring, known as cirrhosis. Patients sometimes have moderate or no symptoms of liver illness for decades before developing cirrhosis. Cirrhosis typically worsens to the point of liver failure. Patients with cirrhosis may also experience brain and nerve system damage, as well as gastrointestinal hemorrhage. Treatment for cirrhosis focuses on preventing further progression of the disease. Detecting cirrhosis earlier is therefore crucial for avoiding complications. Machine learning (ML) has been shown to be effective at providing precise and accurate information for use in diagnosing several diseases. Despite this, no studies have so far used ML to detect cirrhosis in patients with hepatitis C. This study obtained a dataset consisting of 28 attributes of 2038 Egyptian patients from the ML Repository of the University of California at Irvine. Four ML algorithms were trained on the dataset to diagnose cirrhosis in hepatitis C patients: a Random Forest, a Gradient Boosting Machine, an Extreme Gradient Boosting, and an Extra Trees model. The Extra Trees model outperformed the other models achieving an accuracy of 96.92%, a recall of 94.00%, a precision of 99.81%, and an area under the receiver operating characteristic curve of 96% using only 16 of the 28 features.

## Submission history

From: Abrar Alotaibi

**[v1]** Thu, 25 Jun 2026 03:13:37 UTC (3,313 KB)
Similar Articles
Conformal Risk Prediction for Non-Alcoholic Fatty Liver Disease Using Gradient Boosting with Distribution-Free Coverages
This paper presents LiverRisk, a machine learning framework for NAFLD risk prediction that combines gradient-boosted decision trees with conformal prediction to provide calibrated, distribution-free coverage guarantees on individual risk estimates, achieving high AUROC on internal and external cohorts.
Early Detection of Alzheimer's Disease Using Explainable Machine Learning on Clinical Biomarkers: A Multi-Class Classification Study Using the Alzheimer's Disease Neuroimaging Initiative (ADNI) Dataset
This study develops an XGBoost classifier using SHAP explainability on eight clinical biomarkers from the ADNI dataset to achieve three-class Alzheimer's disease detection (normal cognition, MCI, AD), reaching a macro AUC of 0.982 and Cohen's kappa of 0.909 on the held-out test set. SHAP analysis identifies CDR Global as the dominant predictor for NC and MCI, while CDR-SB and MMSE together drive AD classification.
Leveraging Physiological Signals to Predict Exam Outcomes with Machine Learning
This study investigates machine learning models to predict exam outcomes using physiological data such as electrodermal activity, heart rate, and skin temperature, finding that both deep learning approaches and simpler models like random forests can be effective.
Multi-Modal Machine Learning for Breast Cancer Recurrence Prediction
This paper examines the integration of multi-modal clinical data, including treatment records, pathology reports, and clinician notes, using rule-based extraction and machine learning to improve breast cancer recurrence prediction compared to single-modal approaches.
Machine learning prediction of obstructive coronary artery disease using opportunistic coronary calcium and epicardial fat assessments from CT calcium scoring scans
This paper presents a machine learning framework using CatBoost and SHAP to predict obstructive coronary artery disease from CT calcium scoring scans, achieving high accuracy by combining calcium-omics and epicardial fat features.