A study analyzing 142K NLP papers from 2010–2026 finds that both established and new NLP authors are increasingly publishing in general ML venues like NeurIPS and ICLR rather than core NLP conferences like ACL, with a significant citation premium favoring ML venues.
BamiBERT is a new BERT-based pre-trained language model for Vietnamese that addresses limitations of PhoBERT, supporting longer context and operating without word segmentation, achieving state-of-the-art results on multiple Vietnamese benchmarks.
Introduces RusFinChain, the first Russian-language symbolic benchmark for verifiable chain-of-thought reasoning in finance, spanning 17 domains with 5,280 parameterized examples and enhanced evaluation metrics including fuzzy numeric alignment.
Svarna is an open-source web-based corpus workbench for Modern Greek, integrating multiple databases with over 507 million words and providing various linguistic analysis tools, released under MIT license.
This paper compares top-down and bottom-up approaches for collecting text-based data about disasters from news articles, using German news about landslides as a case study.
This paper proposes an emotion analysis interface that uses Natural Semantic Metalanguage (NSM) to generate faithful, interpretable explanations for emotion classifications, trading a slight loss in accuracy for verifiability.
Introduces a comprehensive hate speech dataset for Turkish and Arabic, and develops state-of-the-art BERT-based models for hate speech analysis, including classification, intensity prediction, target identification, and span detection.
LabGuard is a framework that translates natural-language laboratory safety rules into executable runtime monitors for embodied agents, reducing unsafe events from 39.5% to 23.8% while maintaining task success.
This paper presents a benchmark and evaluation protocol for faithful natural-language-to-Lean statement formalization, revealing a 29-point gap between compile-pass and consensus-faithfulness rates, and decomposing the effects of expert drafting, context search, and elaboration feedback.
This study examines how team institutional composition (academic, industrial, or mixed) affects the novelty of academic papers in NLP, using fine-grained knowledge entities like methods and datasets to measure novelty.
This paper investigates linguistic distancing as an indicator of emotion regulation across age groups using social media text, finding that linguistic distancing increases with age, consistent with improved well-being in older adults.
This paper introduces a Bangla event detection benchmark with noisy text (ASR, orthographic corruption) and evaluates encoder-only and decoder-only LLMs, finding decoder models more robust to noise.
This paper introduces a hybrid framework for sentence-level emotion annotation of song lyrics that optimizes human and LLM collaboration by predicting misalignment, addressing subjectivity and scalability challenges in lyric emotion recognition.
This paper introduces BERTomelo, a next-generation monolingual encoder pre-trained for Portuguese using the ModernBERT architecture, achieving superior performance on downstream tasks like STS and NER compared to previous Portuguese and multilingual models.
Introduces SEATauBench, the first agent-focused evaluation framework for Southeast Asian languages, adapting τ²-Bench to Mandarin, Vietnamese, Thai, Indonesian, and Filipino, and reveals a significant capability gap when moving from English to localized settings.
Proposes OPI, an ontology-guided framework for multi-hop knowledge graph question answering that leverages a relation-centric ontology graph for bidirectional retrieval and iterative refinement, achieving state-of-the-art results on multiple benchmarks.
This paper introduces Complementary Action Modeling (CAM), a task that identifies or generates procedural counterparts of automotive maintenance instructions by modifying the action phrase while preserving context. Using a German automotive dataset, the authors examine candidate matching and controlled Seq2Seq generation to model these complementary instructions.
Proposes a term-centric framework for inducing hierarchical taxonomies from heterogeneous text sources, enabling cross-source alignment and interpretable hierarchies. Experiments on a multi-source benchmark demonstrate improved coherence and quality over text- and summary-based baselines.
This paper presents a systematic analysis of evaluation pitfalls in multimedia event extraction, identifying issues such as inconsistent data processing, inconsistent task assumptions, and overly relaxed evaluation settings that can lead to overestimated performance.
This paper proposes a framework for fallacy classification that uses LLMs to extract patterns from fallacious examples and their explanations, achieving statistically significant improvements over zero-shot baselines and demonstrating cross-dataset generalization.