Development and Preliminary Evaluation of a Domain-Specific Large Language Model for Tuberculosis Care in South Africa

arXiv cs.CL Papers

Summary

Researchers fine-tuned BioMistral-7B with QLoRA and paired it with GraphRAG retrieval to create a TB-care LLM for South Africa, showing improved contextual alignment over the base model.


# Development and Preliminary Evaluation of a Domain-Specific Large Language Model for Tuberculosis Care in South Africa
Source: [https://arxiv.org/abs/2604.19776](https://arxiv.org/abs/2604.19776)
[View PDF](https://arxiv.org/pdf/2604.19776)

> Abstract: Tuberculosis (TB) is one of the world's deadliest infectious diseases, and in South Africa, it contributes a significant burden to the country's health care system. This paper presents an experimental study on the development of a domain-specific Large Language Model (DS-LLM) for TB care that can help to alleviate the burden on patients and healthcare providers. To achieve this, a literature review was conducted to understand current LLM development strategies, specifically in the medical domain. Thereafter, data were collected from South African TB guidelines, selected TB literature, and existing benchmark medical datasets. We performed LLM fine-tuning by using the Quantised Low-Rank Adaptation (QLoRA) algorithm on a medical LLM (BioMistral-7B), and also implemented Retrieval-Augmented Generation using GraphRAG. The developed DS-LLM was evaluated against the base BioMistral-7B model and a general-purpose LLM using a mix of automated metrics and quantitative ratings. The results show that the DS-LLM had better performance compared to the base model in terms of its contextual alignment (lexical, semantic, and knowledge) for TB care in South Africa.
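As a rough illustration of the fine-tuning approach the abstract names: in QLoRA, the frozen base weights are kept in 4-bit precision while only a pair of small rank-r matrices A and B is trained, so the effective weight becomes W + (alpha/r)·A·B. The plain-Python sketch below shows just that low-rank forward path with toy matrices; it is an illustrative assumption about the technique, not the authors' code.

```python
# Illustrative sketch of the low-rank adaptation idea behind QLoRA
# (not the paper's implementation): the base weight W stays frozen
# (in QLoRA, quantised to 4 bits) and only the small matrices A and B
# of rank r are trained. Effective weight: W + (alpha / r) * A @ B.

def matmul(X, Y):
    """Plain-Python matrix multiply for small demo matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_forward(x, W, A, B, alpha, r):
    """Linear layer with a LoRA update: x @ (W + (alpha / r) * A @ B)."""
    base = matmul(x, W)              # frozen base projection
    delta = matmul(matmul(x, A), B)  # trainable low-rank path
    scale = alpha / r
    return [[b + scale * d for b, d in zip(base_row, delta_row)]
            for base_row, delta_row in zip(base, delta)]
```

With A initialised to zeros (the usual LoRA initialisation), the adapted layer reproduces the frozen base layer exactly, and training only ever updates the few parameters in A and B.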

## Submission history

From: Prof. Olawande Daramola [[view email](https://arxiv.org/show-email/a3ae5965/2604.19776)] **[v1]** Sat, 28 Mar 2026 11:22:05 UTC (651 KB)

## Similar Articles

Can Language Models Identify Side Effects of Breast Cancer Radiation Treatments?

arXiv cs.CL

This paper presents a deployment-oriented stress-testing framework to evaluate how well large language models identify side effects of breast cancer radiation treatments. The study highlights limitations in LLM reliability, such as sensitivity to minor documentation changes and under-recall of rare side effects, suggesting that grounding outputs in clinician-curated lists improves robustness.

MedAction: Towards Active Multi-turn Clinical Diagnostic LLMs

arXiv cs.CL

This paper introduces MedAction, a framework for training LLMs on active, multi-turn clinical diagnosis by simulating iterative test ordering and hypothesis updates. It presents a new dataset, MedAction-32K, and demonstrates state-of-the-art performance for open-source models on medical benchmarks.

Injecting Structured Biomedical Knowledge into Language Models: Continual Pretraining vs. GraphRAG

arXiv cs.CL

This paper compares two strategies for injecting structured biomedical knowledge from the UMLS Metathesaurus into language models: continual pretraining (embedding knowledge into model parameters) and GraphRAG (querying a knowledge graph at inference time). Results show improvements on biomedical QA benchmarks, with GraphRAG on LLaMA 3-8B yielding gains of more than 3 and 5 accuracy points on PubMedQA and BioASQ, respectively, without any retraining.
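The GraphRAG strategy described in this blurb (and used in the main paper) queries a knowledge graph at inference time rather than baking facts into model weights. A minimal sketch of that pattern follows; the toy graph, the substring-based entity matching, and the TB facts are all illustrative assumptions, not drawn from any of the papers above.

```python
# Minimal, illustrative sketch of the GraphRAG pattern (not the papers'
# implementations): facts live in a knowledge graph, and at inference
# time the triples around entities mentioned in the question are
# retrieved and prepended to the prompt as grounding context.

# Toy knowledge graph: subject -> list of (relation, object) triples.
KG = {
    "tuberculosis": [("caused_by", "Mycobacterium tuberculosis"),
                     ("treated_with", "rifampicin")],
    "rifampicin": [("drug_class", "rifamycin")],
}

def retrieve_triples(question, graph, hops=1):
    """Collect triples reachable from entities named in the question."""
    # Naive entity linking: a graph node counts as mentioned if its
    # name appears as a substring of the lower-cased question.
    frontier = [e for e in graph if e in question.lower()]
    triples, seen = [], set()
    for _ in range(hops + 1):
        next_frontier = []
        for subj in frontier:
            if subj in seen or subj not in graph:
                continue
            seen.add(subj)
            for rel, obj in graph[subj]:
                triples.append((subj, rel, obj))
                next_frontier.append(obj.lower())
        frontier = next_frontier
    return triples

def build_prompt(question, graph):
    """Prepend retrieved graph facts to the question as context."""
    facts = "\n".join(f"{s} --{r}--> {o}"
                      for s, r, o in retrieve_triples(question, graph))
    return f"Context:\n{facts}\n\nQuestion: {question}"
```

The point of the sketch is the division of labour: the model stays frozen, and domain knowledge is updated by editing the graph, which is what lets GraphRAG improve accuracy "without any retraining".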