annotation

#annotation

Introducing corpora Hlava Cor and Hlava AD: Human Label Variation in Coreference and Discourse Relations

arXiv cs.CL · 2026-06-25

This paper introduces two new Czech corpora, Hlava Cor and Hlava AD, designed to study human label variation in coreference and discourse relations. The corpora feature multiple annotations and annotator explanations, achieving 60-65% inter-annotator agreement and revealing systematic differences in interpretation.

Prague Dependency Treebank -- Consolidated 2.0: Enriching a Complex Annotation Scheme

arXiv cs.CL · 2026-06-24

We present the second consolidated version of the Prague Dependency Treebank, a manually annotated multilingual resource of 4 million tokens covering morphology, syntax, semantics, coreference, and discourse, along with compatible lexicons.

Learning Perspectivist Social Meaning via Demographic-Conditioned Fusion Embeddings

arXiv cs.CL · 2026-06-08

This paper proposes demographic-conditioned fusion embeddings to model perspectivist social meaning in language, showing consistent improvements over text-only baselines by integrating annotator demographics into NLP systems.

@vintcessun: Do you really know who the annotators are in the NLP papers you read? An audit of ACL papers from 2018-2025 reveals that key details such as annotator training, language proficiency, and compensation are often missing, especially in model evaluation studies. This directly threatens research reproducibility and reliability. This paper proposes a unified taxonomy + LLM-assisted automatic extraction pipeline, evaluated on 2,667 annotation tasks…

X AI KOLs Timeline · 2026-06-08

A large-scale audit of ACL papers from 2018-2025 reveals that key annotation details (training, language proficiency, compensation, etc.) are often missing, threatening reproducibility. The authors propose a unified taxonomy and an LLM-assisted extraction pipeline evaluated on 2,667 annotation tasks.

ReasoningFlow: Discourse Structures for Understanding LLM Reasoning Traces

arXiv cs.CL · 2026-06-05

This paper introduces ReasoningFlow, a framework that captures the discourse structure of large language model reasoning traces as directed acyclic graphs, enabling fine-grained analysis of reasoning behaviors such as self-reflection and backtracking. Based on manual and automatic annotation of thousands of traces, it reveals structural similarities across models and shows that most erroneous steps do not contribute to final answers.

On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance

arXiv cs.CL · 2026-06-02

This paper investigates how LLMs' internal priors affect zero-shot annotation performance, finding that nearly two-thirds of errors resist prompt-based correction and introducing Definition-Specific Familiarity as a better predictor than memorization metrics.

Refining Word-Based Grammatical Error Annotation for L2 Korean

arXiv cs.CL · 2026-06-01

This paper refines word-based grammatical error annotation for L2 Korean by addressing problems in existing resources, including surface target realization and single-reference evaluation, and demonstrates improvements using KoBART-based correction.

Annotator Positionality as Signal: Psychometric Weighting for Anti-Autistic Ableism Detection

arXiv cs.CL · 2026-05-27

This paper introduces a bias-aware framework for evaluating how LLMs detect anti-autistic ableist language, using psychometrically weighted ground truth based on annotator positionality. It finds that LLMs frequently misclassify community-reclaimed language as ableist and rely on surface-level keyword matching rather than context.

AraHopeCorpus: Annotation Guidelines and Dataset for Hope Speech in Arabic Social Media Crisis Discourse

arXiv cs.CL · 2026-05-25

This paper introduces AraHopeCorpus, the first annotated dataset of hope speech in Arabic social media, collected from YouTube comments about the war on Gaza. It provides a detailed annotation framework and analysis, showing that hopeful language dominates crisis discourse.

Refining and Reusing Annotation Guidelines for LLM Annotation

arXiv cs.CL · 2026-05-21

This paper proposes an iterative moderation framework that refines and reuses annotation guidelines to improve LLM-based annotation performance, validated on biomedical NER tasks with GPT, Gemini, and DeepSeek models.

Annotate-in-Place Notes with Emacs and org-remark

Lobsters Hottest · 2026-05-20

A blog post describing org-remark, an Emacs package for annotating files in-place, addressing the decoupling problem in digital note-taking by keeping notes attached to their source.

DiscoExplorer: An Open Interface for the Study of Multilingual Discourse Relations

arXiv cs.CL · 2026-05-18

This paper presents DiscoExplorer, an open-source web interface for searching and visualizing discourse relation datasets across 16 languages, making DISRPT shared task data publicly accessible.

roboflow/supervision

GitHub Trending (daily) · 2026-05-14

roboflow/supervision is an open-source Python toolkit for computer vision that provides reusable building blocks for data loading, annotation, and real-time processing, with model-agnostic support for popular libraries.
