Tag
Proposes OPI, an ontology-guided framework for multi-hop knowledge graph question answering that leverages a relation-centric ontology graph for bidirectional retrieval and iterative refinement, achieving state-of-the-art results on multiple benchmarks.
This paper introduces Complementary Action Modeling (CAM), a task that identifies or generates procedural counterparts of automotive maintenance instructions by modifying the action phrase while preserving context. Using a German automotive dataset, the authors examine candidate matching and controlled Seq2Seq generation to model these complementary instructions.
Proposes a term-centric framework for inducing hierarchical taxonomies from heterogeneous text sources, enabling cross-source alignment and interpretable hierarchies. Experiments on a multi-source benchmark demonstrate improved coherence and quality over text- and summary-based baselines.
This paper presents a systematic analysis of evaluation pitfalls in multimedia event extraction, identifying issues such as inconsistent data processing, inconsistent task assumptions, and overly relaxed evaluation settings that can lead to overestimated performance.
This paper proposes a framework for fallacy classification that uses LLMs to extract patterns from fallacious examples and their explanations, achieving statistically significant improvements over zero-shot baselines and demonstrating cross-dataset generalization.
This paper proposes KIRP, a zero-shot stance detection framework for tweets that integrates external knowledge with entity reorganization and reflective chain-of-thought reasoning, achieving state-of-the-art performance on multiple datasets including a newly constructed Japanese tweet dataset.
This paper investigates whether EEG signals can complement eye-tracking signals for automatic keyphrase extraction (AKE) from microblogs. Using the ZuCo corpus, the authors show that cognitive signals, especially EEG, improve AKE performance across different models.
Presents Tatoxa, a state-of-the-art system for text detoxification in the Tatar language that outperforms existing LLMs. Introduces a new dataset and shows that cross-lingual transfer performs worse than training on native data.
This paper proposes SARA, a framework that aligns routing distributions of multilingual inputs using Jensen-Shannon divergence to improve expert sharing for low-resource languages in sparse Mixture-of-Experts models. Experiments on Qwen3-30B-A3B and Phi-3.5-MoE-instruct show improvements on multilingual benchmarks.
This paper investigates prompt-based learning for automatically generating highlights of academic papers, using models like GPT-2, T5, and ChatGPT, and shows that ChatGPT with few-shot prompts achieves performance comparable to or better than supervised methods without requiring task-specific training data.
This paper develops a codebook for self-stigma among people who use drugs and analyzes 72,115 Reddit posts to examine prevalence, co-occurrence, and temporal patterns of cognitive, affective, and behavioral stigma indicators, finding that self-stigma is expressed as an integrated phenomenon with behavioral indicators often preceding core indicators.
This paper proposes a resource-light algorithm to automatically assign part-of-speech tags to senses in the Al-Mawrid Arabic-English bilingual dictionary by transferring tags from English WordNet after disambiguation, achieving high accuracy with minimal cost.
T2D-Bench is a benchmark for evaluating LLM outputs on Type 2 Diabetes using a multi-layer clinical-lifestyle knowledge graph. It reveals that current LLMs fail evidence-path checks in about a third of cases.
This paper constructs large-scale algorithm co-occurrence networks from the full text of academic papers to study the collective influence of algorithms in NLP, finding that classic, high-performing, and intersectional algorithms hold central network positions.
This paper introduces RASC+, a retrieval-constrained LLM adjudication method for clinical value set authoring that improves candidate-pool recall and selection precision over prior RASC baselines, demonstrating that blinded LLM adjudication with Qwen3-based retrieval significantly outperforms direct generation.
This paper presents a scalable framework using LLMs for implicit sentiment analysis of product desirability from qualitative feedback, achieving up to 0.97 Pearson correlation and 94% accuracy while providing explanations, with GPT-4o-mini offering similar performance at 94% lower cost.
Presents a systematic experimental analysis of eight state-of-the-art Diffusion Language Models across multiple benchmarks, examining trade-offs between generation quality and computational efficiency.
The article discusses why AI systems have difficulty interpreting uncertainty and ambiguity in human conversation, highlighting ongoing challenges in natural language understanding.
The Jan 6, 2026 draft of the 3rd edition of 'Speech and Language Processing' by Dan Jurafsky and James H. Martin is released, featuring a revised structure with a focus on large language models and updated chapters.
This paper introduces Approximate Structured Diffusion, a method that combines conditional random fields (CRFs) with discrete diffusion for sequence labelling. It uses a CRF conditioned on noisy label sequences and approximate mean-field inference, achieving a 16.5% error reduction on POS tagging.
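The SARA summary above refers to aligning routing distributions with Jensen-Shannon divergence. As a minimal sketch of that quantity only, and not of the paper's training objective, the following assumes each routing distribution is a plain list of expert probabilities that sums to 1. The function names and the example distributions are hypothetical, chosen for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint m."""
    # m_i > 0 whenever p_i > 0 or q_i > 0, so the KL terms are well defined.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical routing distributions over four experts for the same
# sentence in two languages (illustrative values, not from the paper).
route_high_resource = [0.70, 0.20, 0.05, 0.05]
route_low_resource = [0.10, 0.15, 0.40, 0.35]

divergence = js_divergence(route_high_resource, route_low_resource)
```

The divergence is symmetric, equals 0 for identical distributions, and is bounded above by ln 2 when natural logarithms are used, which makes it a convenient penalty for pulling two routing distributions toward each other.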