This paper proposes G-SPIN, a lightweight inference-time framework for correcting ASR errors that combines phonetic graph modeling with contextual language understanding: a GNN generates phonetically plausible candidate tokens, an MLM scores them locally, and an LLM performs the final re-ranking.
This paper proposes a dialect-aware phonetic framework for modeling phonetic variation in Vietnamese ASR, decomposing syllables into structured components and mapping them to dialect-specific IPA representations. On the UIT-ViMD multi-dialect dataset, the approach matches pretrained baselines while using fewer parameters and no external pretraining.