low-resource

#low-resource

When a Name Is Not a Name: A Benchmark Dataset and Distilled Reasoning for Culturally Entangled Bangla Homographs in Low-Resource LLMs

arXiv cs.CL · yesterday

This paper introduces a benchmark dataset of 1,516 expert-verified Bangla sentences for disambiguating culturally entangled homographs (words that are both names and common nouns). It shows that LLMs suffer from dominant-meaning bias and proposes contrastive chain-of-thought prompting and distillation to reduce this bias.

#low-resource

Safety That Does Not Transfer: Cross-Lingual Clinical Correctness Drift in Deployable Medical Language Models

arXiv cs.CL · yesterday

This paper investigates cross-lingual clinical correctness drift in medical language models. It finds that locally deployable models show significant safety degradation when queried in Hausa rather than English, while frontier models maintain competence, which highlights a critical gap in safety evaluation for low-resource settings.

#low-resource

BLAD: A Historically Contextualized, Multilingual Dataset of Bangladeshi Legal Acts (1799 to 2025)

arXiv cs.CL · yesterday

This paper introduces BLAD, a curated multilingual dataset of 1,484 Bangladeshi legal acts spanning 1799 to 2025, with structured metadata for temporal and cross-lingual legal NLP research.

#low-resource

Hybrid Continual Learning for Low-Resource Australian Aboriginal Language Identification

arXiv cs.CL · 2026-07-15

This paper proposes two hybrid continual learning methods, Replay Augmented Elastic Weight Consolidation and Constraint Guided Knowledge Distillation, to adapt pretrained speech models for identifying low-resource Australian Aboriginal languages while mitigating catastrophic forgetting.

#low-resource

Polarization Detection: A Hybrid Approach with AfroXLMR-Social and DeBERTa for Low- and High-Resource Settings

arXiv cs.CL · 2026-07-14

This paper presents a hybrid approach for detecting online polarization in English and Hausa. It uses DeBERTa for English and AfroXLMR-Social for Hausa and for the fine-grained subtasks, with LoRA and data augmentation to address computational and data constraints.

#low-resource

Which Languages Transfer Best to Warlpiri? A Similarity-Based Study for Low-Resource ASR

arXiv cs.CL · 2026-07-14

This paper investigates cross-lingual transfer for low-resource ASR in Warlpiri, proposing a similarity-based framework that combines acoustic and linguistic features to select optimal source languages. Experiments show that transferring from acoustically similar languages such as Assamese and Hindi significantly reduces word and character error rates.

#low-resource

Toward Real-Time Sentence-Level Sign Language Translation

arXiv cs.CL · 2026-07-13

This paper presents a sentence-level sign language translation system fine-tuned with QLoRA on a subset of How2Sign, achieving a BLEU score of 15.9. Its main contribution is a hardware-aware streaming pipeline that pairs a Raspberry Pi 4B client with a CPU/GPU backend, reducing mean latency by 27.71%.

#low-resource

Multi-Conditioned Diffusion Synthesis of Sand Boils for Low-Resource Earthen-Levee Inspection

arXiv cs.AI · 2026-07-13

This paper proposes a multi-conditioned diffusion-based synthesis pipeline using Stable Diffusion XL and ControlNet to generate synthetic sand boil imagery for low-resource earthen-levee inspection, addressing the scarcity of annotated defect examples.

#low-resource

@GPTWare: Uhhhh WTF is this???

X AI KOLs Timeline · 2026-07-11

Colibri runs the 744B-parameter GLM-5.2 MoE model on a laptop with 25 GB of RAM by activating only ~40B parameters per token and streaming experts from disk, all in a single 2,400-line C file with no GPU required.

#low-resource

Nigeria Machinery: A Low-Resource Industrial Dataset with a Domain-Grounded Reasoning Layer

arXiv cs.AI · 2026-07-10

This paper introduces the Nigeria Machinery Usage and Failures Dataset, which contains 89 records across 28 indicators covering Nigeria's manufacturing and oil/gas sectors from 2006 to 2025. It also presents a method for building domain-grounded chain-of-thought reasoning examples from sparse numeric values.

#low-resource

From Sinhala to Dhivehi: Cross-Lingual Transfer Learning for Low-Resource Speech Recognition

arXiv cs.CL · 2026-07-08

This research investigates cross-lingual transfer learning from Sinhala to Dhivehi for automatic speech recognition, achieving significant improvements in word error rate compared to Dhivehi-only baselines.

#low-resource

CoPiT: Cognitive Pivot Translation for Digraphic Low-Resource Mongolian in the Traditional Script

arXiv cs.CL · 2026-07-08

This paper proposes CoPiT, a cognitively motivated pivot-based translation pipeline for digraphic Mongolian that routes translation through the better-resourced Cyrillic script to improve translation from the low-resource Traditional script. It achieves significant BLEU and COMET gains, and the authors release a new multi-script parallel dataset.

#low-resource

Jointly Improving Dialect Identification and ASR in Indian Languages using Multimodal Feature Fusion

arXiv cs.CL · 2026-07-07

This paper proposes a multimodal framework that jointly improves Automatic Speech Recognition (ASR) and Dialect Identification (DID) for Indian languages, using a Bottleneck Encoder and RoBERTa with a gating mechanism. Evaluated on eight languages with 33 dialects, it achieves 81.63% DID accuracy and reduces CER/WER to 4.65%/17.73%.

#low-resource

Small AI Models Gain Traction in Places with Unreliable Networks

Hacker News Top · 2026-07-06

Small AI models are proving valuable in regions with unreliable networks, enabling life-saving applications like counterfeit drug detection and disease identification in crops without needing constant internet connectivity.

#low-resource

PAST-TIDE: Prototype-Anchored Statement Tuning with Topic-Invariant Normalization for Stance Detection

Hugging Face Daily Papers · 2026-07-06

PAST-TIDE is a stance detection system for the StanceNakba Shared Task. It uses statement tuning with cloze-style masked language modeling, prototypical contrastive learning, and topic-conditional layer normalization for cross-topic Arabic stance detection, achieving macro-F1 scores of 0.75 and 0.74 on subtasks A and B, respectively.

#low-resource

I built an open, from-scratch MT pipeline + parallel corpus for Tunisian Darija (Arabizi): early baseline, and I'm growing it into a curated community corpus [P]

Reddit r/MachineLearning · 2026-07-05

An 18-year-old Tunisian student introduces an open-source machine translation pipeline and parallel corpus for Tunisian Darija in Arabizi script. The pipeline is built from scratch around a small 15.6M-parameter Transformer and reports an honest baseline BLEU of 3.89; the author calls for contributors to help expand the corpus ethically.

#low-resource

Challenges and Recommendations for LLMs-as-a-Judge in Multilingual Settings and Low-Resource Languages

arXiv cs.CL · 2026-07-03

This paper analyzes the use of LLM-as-a-Judge in multilingual and low-resource settings, finding inconsistent evaluation outcomes and overtrust in LLM judgments, and provides recommendations for better practices.

#low-resource

SPARCLE: SPeaker-aware Aligned Representations via Contrastive Language Embeddings

arXiv cs.CL · 2026-07-03

SPARCLE is a speaker-aware grapheme representation model that uses contrastive learning to align grapheme embeddings with acoustic representations, improving text-to-speech quality especially in low-resource settings.

#low-resource

Cross-lingual Relation Extraction with Large Language Models: Zero-Shot, Few-Shot, and Fine-Tuned Evaluation on Romanian

arXiv cs.CL · 2026-07-01

This paper investigates cross-lingual relation extraction for Romanian by translating the SemEval-2010 Task 8 benchmark and evaluating Gemma 4 in zero-shot, few-shot, and QLoRA fine-tuning settings, and it compares the results with smaller encoder baselines.

#low-resource

Tone-Conditioned Curriculum Learning for Low-Resource Bantu Speech Recognition

arXiv cs.CL · 2026-07-01

This paper proposes a tone-conditioned curriculum learning framework for low-resource Bantu speech recognition, combining hybrid difficulty scoring, gated adapters, and staged curriculum training. Evaluations on six Southern Bantu languages show that W2V-BERT outperforms Whisper on Nguni languages while Whisper performs better on Sotho-Tswana languages.
