Development and Preliminary Evaluation of a Domain-Specific Large Language Model for Tuberculosis Care in South Africa

arXiv cs.CL Papers

Summary

Researchers fine-tuned BioMistral-7B with QLoRA and paired it with GraphRAG retrieval to create a TB-care LLM for South Africa, showing improved contextual alignment over the base model.


# Development and Preliminary Evaluation of a Domain-Specific Large Language Model for Tuberculosis Care in South Africa
Source: [https://arxiv.org/abs/2604.19776](https://arxiv.org/abs/2604.19776)
[View PDF](https://arxiv.org/pdf/2604.19776)

> Abstract: Tuberculosis (TB) is one of the world's deadliest infectious diseases, and in South Africa, it contributes a significant burden to the country's health care system. This paper presents an experimental study on the development of a domain-specific Large Language Model (DS-LLM) for TB care that can help to alleviate the burden on patients and healthcare providers. To achieve this, a literature review was conducted to understand current LLM development strategies, specifically in the medical domain. Thereafter, data were collected from South African TB guidelines, selected TB literature, and existing benchmark medical datasets. We performed LLM fine-tuning by using the Quantised Low-Rank Adaptation (QLoRA) algorithm on a medical LLM (BioMistral-7B), and also implemented Retrieval-Augmented Generation using GraphRAG. The developed DS-LLM was evaluated against the base BioMistral-7B model and a general-purpose LLM using a mix of automated metrics and quantitative ratings. The results show that the DS-LLM had better performance compared to the base model in terms of its contextual alignment (lexical, semantic, and knowledge) for TB care in South Africa.
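As a rough illustration of the fine-tuning approach the abstract names: in QLoRA, the frozen base weights are kept in 4-bit precision while only a pair of small rank-r matrices A and B is trained, so the effective weight becomes W + (alpha/r)·A·B. The plain-Python sketch below shows just that low-rank forward path with toy matrices; it is an illustrative assumption about the technique, not the authors' code.

```python
# Illustrative sketch of the low-rank adaptation idea behind QLoRA
# (not the paper's implementation): the base weight W stays frozen
# (in QLoRA, quantised to 4 bits) and only the small matrices A and B
# of rank r are trained. Effective weight: W + (alpha / r) * A @ B.

def matmul(X, Y):
    """Plain-Python matrix multiply for small demo matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_forward(x, W, A, B, alpha, r):
    """Linear layer with a LoRA update: x @ (W + (alpha / r) * A @ B)."""
    base = matmul(x, W)              # frozen base projection
    delta = matmul(matmul(x, A), B)  # trainable low-rank path
    scale = alpha / r
    return [[b + scale * d for b, d in zip(base_row, delta_row)]
            for base_row, delta_row in zip(base, delta)]
```

With A initialised to zeros (the usual LoRA initialisation), the adapted layer reproduces the frozen base layer exactly, and training only ever updates the few parameters in A and B.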

## Submission history

From: Prof. Olawande Daramola [[view email](https://arxiv.org/show-email/a3ae5965/2604.19776)] **[v1]** Sat, 28 Mar 2026 11:22:05 UTC (651 KB)

## Similar Articles

Can Language Models Identify Side Effects of Breast Cancer Radiation Treatments?

arXiv cs.CL

This paper presents a deployment-oriented stress-testing framework to evaluate how well large language models identify side effects of breast cancer radiation treatments. The study highlights limitations in LLM reliability, such as sensitivity to minor documentation changes and under-recall of rare side effects, suggesting that grounding outputs in clinician-curated lists improves robustness.

MedAction: Towards Active Multi-turn Clinical Diagnostic LLMs

arXiv cs.CL

This paper introduces MedAction, a framework for training LLMs on active, multi-turn clinical diagnosis by simulating iterative test ordering and hypothesis updates. It presents a new dataset, MedAction-32K, and demonstrates state-of-the-art performance for open-source models on medical benchmarks.

Injecting Structured Biomedical Knowledge into Language Models: Continual Pretraining vs. GraphRAG

arXiv cs.CL

This paper compares two strategies for injecting structured biomedical knowledge from the UMLS Metathesaurus into language models: continual pretraining (embedding knowledge into model parameters) and GraphRAG (querying a knowledge graph at inference time). Results show improvements on biomedical QA benchmarks, with GraphRAG on LLaMA 3-8B yielding gains of more than 3 and 5 accuracy points on PubMedQA and BioASQ, respectively, without any retraining.
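The GraphRAG strategy described in this blurb (and used in the main paper) queries a knowledge graph at inference time rather than baking facts into model weights. A minimal sketch of that pattern follows; the toy graph, the substring-based entity matching, and the TB facts are all illustrative assumptions, not drawn from any of the papers above.

```python
# Minimal, illustrative sketch of the GraphRAG pattern (not the papers'
# implementations): facts live in a knowledge graph, and at inference
# time the triples around entities mentioned in the question are
# retrieved and prepended to the prompt as grounding context.

# Toy knowledge graph: subject -> list of (relation, object) triples.
KG = {
    "tuberculosis": [("caused_by", "Mycobacterium tuberculosis"),
                     ("treated_with", "rifampicin")],
    "rifampicin": [("drug_class", "rifamycin")],
}

def retrieve_triples(question, graph, hops=1):
    """Collect triples reachable from entities named in the question."""
    # Naive entity linking: a graph node counts as mentioned if its
    # name appears as a substring of the lower-cased question.
    frontier = [e for e in graph if e in question.lower()]
    triples, seen = [], set()
    for _ in range(hops + 1):
        next_frontier = []
        for subj in frontier:
            if subj in seen or subj not in graph:
                continue
            seen.add(subj)
            for rel, obj in graph[subj]:
                triples.append((subj, rel, obj))
                next_frontier.append(obj.lower())
        frontier = next_frontier
    return triples

def build_prompt(question, graph):
    """Prepend retrieved graph facts to the question as context."""
    facts = "\n".join(f"{s} --{r}--> {o}"
                      for s, r, o in retrieve_triples(question, graph))
    return f"Context:\n{facts}\n\nQuestion: {question}"
```

The point of the sketch is the division of labour: the model stays frozen, and domain knowledge is updated by editing the graph, which is what lets GraphRAG improve accuracy "without any retraining".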