low-resource

#low-resource

Enhancing BiGRU with a KAN Block for Legal Document Classification and Summarization

arXiv cs.CL · 2d ago

This paper introduces a KAN-enhanced BiGRU architecture for classifying and summarizing multilingual legal documents from Bangladesh. Accuracy and ROUGE scores are modest, but the KAN block improves classification accuracy over the baseline BiGRU.

#low-resource

Adapting Multilingual Embedding Models to Turkish via Cross-Lingual Tokenizer Surgery and Offline Distillation

Hugging Face Daily Papers · 2026-05-28

This paper presents embeddingmagibu-200m, a Turkish-focused sentence embedding model built via cross-lingual tokenizer surgery and offline distillation, achieving strong performance on Turkish benchmarks with a cost-quality trade-off.

#low-resource

Phonetic Modeling of Dialectal Variation in Vietnamese Speech

arXiv cs.CL · 2026-05-26

This paper proposes a dialect-aware phonetic framework for modeling phonetic variation in Vietnamese ASR, decomposing syllables into structured components and mapping them to dialect-specific IPA representations. The approach matches pretrained baselines with fewer parameters and no external pretraining on the UIT-ViMD multi-dialect dataset.

#low-resource

Lost in Translation? Exploring the Shift in Grammatical Gender from Latin to Occitan

Hugging Face Daily Papers · 2026-05-26

A deep learning framework is developed to analyze grammatical gender evolution from Latin to Romance languages, focusing on low-resource historical settings using lexical and contextual analysis.

#low-resource

A Reproducible Universal Dependencies-Style Pipeline for Katharevousa Greek Parliamentary Text

arXiv cs.CL · 2026-05-25

This paper presents a reproducible pipeline for building Universal Dependencies-style parsing resources for Katharevousa Greek parliamentary text, including OCR reconstruction, LLM-assisted annotation, and evaluation of multiple parsers. The best model (XLM-R) achieves 0.8893 UPOS accuracy and 0.5162 LAS, significantly outperforming off-the-shelf baselines.

#low-resource

Knowledge Distillation for Low-Resource Open-source Text-to-SQL Model

arXiv cs.CL · 2026-05-25

This paper proposes a knowledge-aware Text-to-SQL framework that uses knowledge distillation to improve performance in low-resource settings by constructing task-specific knowledge bases and generating synthetic training data. Experiments on seven benchmarks show substantial improvements, especially for open-source models.

#low-resource

Polite on the Surface, Wrong in Practice: A Curated Dataset for Fixing Honorific Failures in Multilingual Bangla Generation

arXiv cs.CL · 2026-05-22

This paper introduces BLADE, a culturally aligned instruction-tuning dataset of 4,196 interaction pairs for fixing honorific failures and pragmatic gaps in multilingual Bangla generation. Fine-tuning models like DeepSeek-8B and LLaMA-3.2-3B on this dataset yields substantial improvements in structural fidelity and honorific alignment.

#low-resource

Convex Low-resource Accent-Robust Language Detection in Speech Recognition

Hugging Face Daily Papers · 2026-05-22

This paper introduces CLD, a lightweight convex optimization-based language detection head for ASR that achieves 97-98% accuracy with under 100 training samples while reducing compute costs by 13x, addressing accent and dialect robustness across 5 languages and 24 sub-dialects.

#low-resource

Divide-Prompt-Refine: a Training-Free, Structure-Aware Framework for Biomedical Abstract Generation

arXiv cs.CL · 2026-05-21

DPR-BAG is a training-free, zero-shot framework that generates coherent biomedical abstracts from full-text articles by decomposing them into rhetorical facets, summarizing each with an LLM, and refining for coherence, achieving better novelty than baselines while maintaining factual consistency.

#low-resource

Improving Quantized Model Performance in Qualitative Analysis with Multi-Pass Prompt Verification

arXiv cs.CL · 2026-05-21

This paper proposes a multi-pass prompt verification method to improve the performance of quantized LLMs (LLaMA-3.1 8B) in qualitative analysis, reducing hallucinations and increasing stability across different quantization levels (8-bit, 4-bit, 3-bit, 2-bit).

#low-resource

The Annotation Scarcity Paradox in Low-Resource NLP Evaluation: A Decade of Acceleration and Emerging Constraints

arXiv cs.CL · 2026-05-20

This critical survey examines the Annotation Scarcity Paradox in low-resource NLP evaluation, where rapid model scaling outpaces the human infrastructure needed for authentic evaluation, and discusses emerging responses with equity and validity trade-offs.

#low-resource

Multilingual and Multimodal LLMs in the Wild: Building for Low-Resource Languages

arXiv cs.CL · 2026-05-19

This tutorial paper provides an overview of building multilingual and multimodal LLMs for low-resource languages, covering data creation, model alignment, fine-tuning, and evaluation, with a focus on practical recipes and hands-on resources.

#low-resource

Adesua: Development and Feasibility Study of an AI WhatsApp Bot for Science Learning in West Africa

arXiv cs.CL · 2026-05-18

This paper presents Adesua, a WhatsApp-based AI teaching assistant for science education in West Africa that integrates retrieval-augmented generation with curated textbooks and exam questions. A 6-month feasibility study in Ghana showed high perceived usefulness (93.75% helpfulness), though the sample size was small.

#low-resource

Mitigating Data Scarcity in Psychological Defense Classification with Context-Aware Synthetic Augmentation

arXiv cs.CL · 2026-05-15

This paper proposes a context-aware synthetic augmentation framework combined with a hybrid classification model to address data scarcity and class imbalance in classifying psychological defense mechanisms from text. The method achieves significant improvements on the PsyDefDetect shared task benchmark.
