IndicMedDialog: A Parallel Multi-Turn Medical Dialogue Dataset for Accessible Healthcare in Indic Languages

Hugging Face Daily Papers 05/13/26, 12:00 AM Papers

medical-dialogue multi-turn synthetic-data multilingual indic-languages fine-tuning dataset

Summary

IndicMedDialog is a parallel multi-turn medical dialogue dataset spanning English and nine Indic languages, with a fine-tuned model for personalized symptom elicitation. The dataset is derived from MDDial, enhanced with LLM-generated synthetic consultations and expert verification, supporting multilingual healthcare AI.

Most existing medical dialogue systems operate in a single-turn question-answering paradigm or rely on template-based datasets, limiting conversational realism and multilingual applicability. We introduce IndicMedDialog, a parallel multi-turn medical dialogue dataset spanning English and nine Indic languages: Assamese, Bengali, Gujarati, Hindi, Marathi, Punjabi, Tamil, Telugu, and Urdu. The dataset extends MDDial with LLM-generated synthetic consultations, translated using TranslateGemma, verified by native speakers, and refined through a script-aware post-processing pipeline to correct phonetic, lexical, and character-spacing errors. Building on this dataset, we fine-tune IndicMedLM via parameter-efficient adaptation of a quantized small language model, incorporating optional patient pre-context to personalise multi-turn symptom elicitation. We evaluate against zero-shot multilingual baselines, conduct systematic error analysis across ten languages, and validate clinical plausibility through medical expert evaluation.
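The abstract mentions "optional patient pre-context" for personalising symptom elicitation but does not describe the prompt format. As a rough illustration only, the sketch below shows one way a dialogue history could be assembled into a prompt with or without such context. The function name `build_prompt`, the "Patient context:" label, and the speaker labels are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Tuple


def build_prompt(turns: List[Tuple[str, str]], pre_context: Optional[str] = None) -> str:
    """Join dialogue turns into a plain-text prompt.

    Hypothetical format: the paper does not specify how pre-context is encoded.
    """
    lines = []
    if pre_context:
        # Pre-context is optional; when absent the prompt starts with the first turn.
        lines.append(f"Patient context: {pre_context}")
    for speaker, utterance in turns:
        lines.append(f"{speaker}: {utterance}")
    # Leave the next doctor turn open for the model to complete.
    lines.append("Doctor:")
    return "\n".join(lines)


history = [
    ("Patient", "I have had a headache for two days."),
    ("Doctor", "Do you also have a fever?"),
    ("Patient", "Yes, a mild one."),
]
with_context = build_prompt(history, pre_context="Age 45, history of hypertension")
without_context = build_prompt(history)
```

Keeping the pre-context as an optional argument means the same function can serve both personalised and non-personalised consultations.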

Original Article

Cached at: 05/14/26, 08:20 PM

Source: https://huggingface.co/papers/2605.13292

Abstract

A parallel multi-turn medical dialogue dataset spanning English and nine Indic languages is introduced, along with a fine-tuned model using parameter-efficient adaptation for personalized symptom elicitation.

Most existing medical dialogue systems operate in a single-turn question-answering paradigm or rely on template-based datasets, limiting conversational realism and multilingual applicability. We introduce IndicMedDialog, a parallel multi-turn medical dialogue dataset spanning English and nine Indic languages: Assamese, Bengali, Gujarati, Hindi, Marathi, Punjabi, Tamil, Telugu, and Urdu. The dataset extends MDDial with LLM-generated synthetic consultations, translated using TranslateGemma, verified by native speakers, and refined through a script-aware post-processing pipeline to correct phonetic, lexical, and character-spacing errors. Building on this dataset, we fine-tune IndicMedLM via parameter-efficient adaptation of a quantized small language model, incorporating optional patient pre-context to personalise multi-turn symptom elicitation. We evaluate against zero-shot multilingual baselines, conduct systematic error analysis across ten languages, and validate clinical plausibility through medical expert evaluation.
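The abstract does not detail the script-aware post-processing pipeline. As a minimal sketch of what one character-spacing correction might look like, the function below removes a stray space that separates a base character from a following combining mark (Unicode categories Mn and Mc), a common artifact in Indic-script text. The name `fix_mark_spacing` and the rule itself are assumptions made for illustration, not the authors' method.

```python
import unicodedata


def fix_mark_spacing(text: str) -> str:
    """Drop a space that sits between a base character and a combining mark.

    Illustrative rule only; a real pipeline would likely need per-script handling.
    """
    out = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        is_stray = (
            ch == " "
            and nxt != ""
            and unicodedata.category(nxt) in ("Mn", "Mc")  # dependent vowel signs, virama, etc.
            and len(out) > 0
            and not out[-1].isspace()
        )
        if is_stray:
            continue  # skip the space so the mark reattaches to its base character
        out.append(ch)
    return "".join(out)


# Hindi word for "fever" with a stray space before the vowel sign U+0941.
broken = "\u092c \u0941\u0916\u093e\u0930"
fixed = fix_mark_spacing(broken)
```

Ordinary spaces between words are left alone because the character after them is a letter rather than a combining mark.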

Get this paper in your agent:

hf papers read 2605.13292

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

A Dataset of Robot-Patient and Doctor-Patient Medical Dialogues for Spoken Language Processing Tasks

Synthesis and Evaluation of Long-term History-aware Medical Dialogue

MedAction: Towards Active Multi-turn Clinical Diagnostic LLMs

HiMed: Incentivizing Hindi Reasoning in Medical LLMs

Human-LLM Dialogue Improves Diagnostic Accuracy in Emergency Care
