Tag
We present the second consolidated version of the Prague Dependency Treebank, a 4-million-token manual multilingual annotation resource covering morphology, syntax, semantics, coreference, and discourse, along with compatible lexicons.
This paper introduces AthDGC, the first openly licensed dependency-parsed treebank of Greek spanning eight diachronic periods, with verse-level cross-alignment to four ancient Indo-European languages using NLP tools like Stanza, LaBSE, and multilingual-BERT.
AfriSUD is a new dependency treebank collection for African languages, following the Surface-Syntactic Universal Dependencies (SUD) framework, designed to evaluate NLP models on languages like Naija, Wolof, and Yorùbá.
This paper presents a reproducible pipeline for building Universal Dependencies-style parsing resources for Katharevousa Greek parliamentary text, including OCR reconstruction, LLM-assisted annotation, and evaluation of multiple parsers. The best model (XLM-R) achieves 0.8893 UPOS accuracy and 0.5162 LAS, significantly outperforming off-the-shelf baselines.