Tag
This paper introduces two new Czech corpora, Hlava Cor and Hlava AD, designed to study human label variation in coreference and discourse relations. The corpora feature multiple annotations and annotator explanations, achieving 60-65% inter-annotator agreement and revealing systematic differences in interpretation.
We present the second consolidated version of the Prague Dependency Treebank, a manually annotated, 4-million-token multilingual resource covering morphology, syntax, semantics, coreference, and discourse, along with compatible lexicons.
This paper proposes demographic-conditioned fusion embeddings to model perspectivist social meaning in language, showing consistent improvements over text-only baselines by integrating annotator demographics into NLP systems.
A large-scale audit of ACL papers from 2018-2025 reveals that key annotation details (training, language proficiency, compensation, etc.) are often missing, threatening reproducibility. The authors propose a unified taxonomy and an LLM-assisted extraction pipeline evaluated on 2,667 annotation tasks.
Introduces ReasoningFlow, a framework that captures the discourse structures of large language model reasoning traces as directed acyclic graphs, enabling fine-grained analysis of reasoning behaviors like self-reflection and backtracking. Based on manual and automatic annotation of thousands of traces, it reveals structural similarities across models and shows that most erroneous steps do not contribute to final answers.
This paper investigates how LLMs' internal priors affect zero-shot annotation performance, finding that nearly two-thirds of errors resist prompt-based correction and introducing Definition-Specific Familiarity as a better predictor than memorization metrics.
This paper refines word-based grammatical error annotation for L2 Korean by addressing problems in existing resources, including surface target realization and single-reference evaluation, and demonstrates improvements using KoBART-based correction.
This paper introduces a bias-aware evaluation framework for detecting anti-autistic ableist language in LLMs, using psychometrically weighted ground truth based on annotator positionality. It finds that LLMs frequently misclassify community-reclaimed language as ableist and rely on surface-level keyword matching rather than context.
This paper introduces AraHopeCorpus, the first annotated dataset of hope speech in Arabic social media, collected from YouTube comments about the war on Gaza. It provides a detailed annotation framework and analysis, showing that hopeful language dominates crisis discourse.
This paper proposes an iterative moderation framework that refines and reuses annotation guidelines to improve LLM-based annotation performance, validated on biomedical NER tasks with GPT, Gemini, and DeepSeek models.
A blog post describing org-remark, an Emacs package for annotating files in place, which addresses the decoupling problem in digital note-taking by keeping notes attached to their source.
Presents DiscoExplorer, an open-source web interface for searching and visualizing discourse relation datasets across 16 languages, making DISRPT shared task data publicly accessible.
roboflow/supervision is an open-source Python toolkit for computer vision that provides reusable building blocks for data loading, annotation, and real-time processing, with model-agnostic support for popular libraries.