Angeliki Giannou, co-inventor of Looped Transformers, has successfully defended her PhD thesis and is set to begin a new role. Dimitris Papailiopoulos shared his congratulations on social media.
This paper presents a large-scale analysis of four harmful language detection datasets, examining how annotator characteristics and linguistic features interact to influence annotation variation. It highlights intersectional effects and warns against generalizing findings across different datasets.
This paper details the YEZE system for SemEval-2026 Task 9, which detects online polarization in 22 languages using a heterogeneous ensemble of XLM-RoBERTa and mDeBERTa models.
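A heterogeneous ensemble like YEZE's typically combines its members' per-class probabilities by soft voting. The paper does not publish its exact aggregation rule, so the sketch below shows the standard averaged-probability variant with illustrative names; it is an assumption, not the system's code.

```python
def soft_vote(model_probs):
    """Soft voting: average each class's probability across ensemble
    members and return the index of the highest-scoring class.

    model_probs: list of per-model probability vectors, e.g. one from
    an XLM-RoBERTa head and one from an mDeBERTa head (illustrative).
    """
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    # Mean probability per class over all models.
    avg = [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__)


# Two of three models lean toward class 1, so the ensemble picks it
# even though one confident model prefers class 0.
print(soft_vote([[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]))  # → 1
```

Soft voting lets a confident minority model be outvoted only when the majority's combined probability mass is larger, which is why it usually beats hard majority voting on closely related classifiers.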
This paper introduces IRC-Bench, a benchmark for recognizing implicit entities in first-person reminiscences using contextual cues rather than explicit mentions. It evaluates various LLM and retrieval configurations, finding QLoRA-adapted Llama 3.1 8B to be the top performer in open-world settings.
This paper proposes an evidence-based model to automatically generate query keywords from query-free summarization datasets, enabling the creation of query-focused summarization datasets. Experimental results show that summaries generated using evidence-based queries achieve competitive ROUGE scores compared to original queries.
A researcher named Hongcan Guo teases a brand-new approach to text modeling, but the tweet provides no technical details.
Hugging Face has released version 5.8.0 of the Transformers library, a widely used open-source framework for natural language processing and deep learning.
Researchers develop KokborokMT, a neural MT system for the low-resource Kokborok language, achieving BLEU scores of 17.30 en→trp and 38.56 trp→en by fine-tuning NLLB-200 on a 36k-sentence parallel corpus.
Hugging Face released version 5.6.0 of its popular transformers library.
This paper proposes Product-of-Experts (PoE) training to reduce dataset artifacts in Natural Language Inference, downweighting examples where biased models are overconfident. PoE nearly preserves accuracy on SNLI (89.10% vs. 89.30%) while reducing bias reliance by ~4.85 percentage points.
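The core of Product-of-Experts debiasing is that the main model's predictions are multiplied with a biased model's predictions (equivalently, their log-probabilities are summed) before computing the training loss, so examples the biased model already gets right contribute little gradient. A minimal sketch of that combination step, with function names that are illustrative rather than taken from the paper:

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def poe_logits(main_logits, bias_logits):
    """Product of Experts: multiply the two distributions by summing
    their log-probabilities. Training cross-entropy is then taken on
    these combined scores, with gradients flowing only into the main
    model (the biased model is frozen)."""
    return [a + b for a, b in zip(log_softmax(main_logits),
                                  log_softmax(bias_logits))]


# Main model is undecided, biased model is confident in class 0:
# the combined score is dominated by the bias, so the cross-entropy
# loss on a class-0 label is already low and the main model is
# pushed only weakly on this artifact-laden example.
combined = poe_logits([1.0, 1.0, 1.0], [3.0, 0.0, 0.0])
```

At test time the biased model is dropped and the main model is used alone, which is why accuracy stays close to the baseline while reliance on artifacts falls.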
University of Memphis researchers propose HAMR, a model-agnostic meta-learning framework that uses bi-level optimization and neighborhood-aware resampling to adaptively reweight hard examples and minority classes across six imbalanced NLP datasets.
Researchers from National Taiwan University propose replacing fixed translation-based prompting strategies in multilingual LLMs with lightweight learned classifiers that route each instance to either native or translation-based prompting. Their analysis across 10 languages and 4 benchmarks shows no single strategy is universally optimal, with translation benefiting low-resource languages most, and the learned routing achieving statistically significant improvements over fixed strategies.
MeasHalu is a novel framework for mitigating scientific measurement hallucinations in LLMs through a two-stage reasoning-aware fine-tuning strategy and progressive reward curriculum. It introduces a fine-grained taxonomy of measurement-specific hallucinations and demonstrates improved accuracy on the MeasEval benchmark.
Researchers from Tianjin University and Alibaba Group propose EA-RLVR, a reinforcement learning framework with verifiable rewards that improves cross-cultural entity translation in LLMs by activating parametric knowledge already encoded during pre-training, without relying on external knowledge bases. Training on 7k samples boosts Qwen3-14B's entity translation accuracy from 23.66% to 31.87% on unseen entities.
Researchers from the University of British Columbia propose an unsupervised graph-based system for organizing arguments from online debates by constructing interaction graphs and applying community detection to reveal diverse viewpoint distributions. The approach requires no training data and aims to help users navigate complex argumentative landscapes and combat filter bubbles.
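The paper does not specify which community-detection algorithm it applies to the interaction graphs, so as a hedged illustration of the general idea, here is the classic label-propagation heuristic in pure Python: every argument node starts in its own community and repeatedly adopts the label most common among its neighbours, so densely interacting clusters of arguments converge to shared labels without any training data.

```python
def label_propagation(adj, iters=10):
    """Asynchronous label propagation for community detection.

    adj maps each node to a list of neighbours (assumed symmetric).
    Nodes are assumed to be integers here purely so ties can be
    broken deterministically by the smallest label.
    """
    labels = {n: n for n in adj}
    for _ in range(iters):
        for n in sorted(adj):
            counts = {}
            for nb in adj[n]:
                counts[labels[nb]] = counts.get(labels[nb], 0) + 1
            if counts:
                # Most frequent neighbour label; smallest label wins ties.
                labels[n] = min(counts, key=lambda l: (-counts[l], l))
    return labels


# Two triangles joined by nothing: nodes 0-2 and 3-5 each collapse
# into a single community.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1],
         3: [4, 5], 4: [3, 5], 5: [3, 4]}
communities = label_propagation(graph)
```

Real systems typically use sturdier methods (e.g. Louvain-style modularity optimization), but the propagation loop above captures why no labels or training are needed: community structure emerges from the edge pattern alone.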
This paper investigates how informal text (slang, emoji, Gen-Z filler tokens) degrades NLI accuracy in ELECTRA-small and RoBERTa-large models, identifying two distinct failure mechanisms—tokenization failure (emoji mapped to [UNK]) and distribution shift (out-of-domain noise tokens)—and proposes targeted mitigations that recover accuracy without harming clean-text performance.
This position paper argues that audio misinformation on platforms like podcasts and WhatsApp voice notes is structurally different from text-based misinformation, carrying unique persuasive properties through prosody and conversational dynamics that existing fact-checking pipelines fail to address. The authors call for a rethinking of verification pipelines tailored to the spoken and conversational nature of audio media.
Researchers from Arizona State University present a framework for evaluating adaptive personalization of educational reading materials using theory-grounded simulated learners, incorporating memory models, misconception revision, and Bayesian Knowledge Tracing. Experiments across three subjects show adaptive reading significantly improved outcomes in computer science but had mixed results in chemistry and biology.
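Bayesian Knowledge Tracing, one component of the simulated learners above, maintains a single mastery probability per skill: after each response it conditions that probability on whether the answer was correct (accounting for slips and guesses), then applies a learning transition. A minimal sketch of one BKT step; the parameter values are illustrative defaults, not the paper's fitted values:

```python
def bkt_update(p_mastery, correct, p_transit=0.3, p_slip=0.1, p_guess=0.2):
    """One Bayesian Knowledge Tracing step.

    First apply Bayes' rule to condition mastery on the observed
    response, then add the probability of learning the skill on
    this opportunity (the transition step).
    """
    if correct:
        num = p_mastery * (1 - p_slip)              # mastered and no slip
        den = num + (1 - p_mastery) * p_guess       # ... or lucky guess
    else:
        num = p_mastery * p_slip                    # mastered but slipped
        den = num + (1 - p_mastery) * (1 - p_guess)
    p_cond = num / den
    return p_cond + (1 - p_cond) * p_transit


# A correct answer raises the mastery estimate; a wrong one lowers it,
# though the transition term keeps it from collapsing to zero.
after_correct = bkt_update(0.5, True)
after_wrong = bkt_update(0.5, False)
```

Running the update over a response sequence yields the per-skill mastery trajectory that an adaptive reader can use to decide when to advance or revisit material.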
This paper presents a hybrid framework for detecting alarming or distressed student verbal responses by combining a text classifier (content-based) and an audio classifier (prosodic features), aimed at expediting human review in Automated Verbal Response Scoring systems. The approach addresses a safety gap in automated scoring pipelines where at-risk student responses may otherwise go unnoticed.
Researchers from University of Utah and CMU propose FragMend, an interpretability-based approach for vocabulary expansion in LLMs that addresses token over-fragmentation in non-Latin script languages. Their method outperforms frequency-based vocabulary selection and baseline embedding initialization by ~20 points for several underrepresented languages.