This paper introduces two new Czech corpora, Hlava Cor and Hlava AD, designed to study human label variation in coreference and discourse relations. The corpora contain multiple annotations together with the annotators' explanations. Inter-annotator agreement reaches 60–65%, and the annotations reveal systematic differences in interpretation.
This paper presents MorfFlex, an architecture for morphological dictionaries of languages with rich inflection and derivation. It is exemplified by MorfFlex CZ for Czech, which contains over 100 million wordforms and supports both consistent annotation and NLP tools.
This paper introduces UD_Czech-PDTC, a large, genre-diverse Czech treebank in the Universal Dependencies framework, derived from the Prague Dependency Treebank-Consolidated. It describes the conversion process and the differences between the two annotation schemes.
We present the second consolidated version of the Prague Dependency Treebank. It is a manually annotated, multilingual resource of 4 million tokens covering morphology, syntax, semantics, coreference, and discourse, and it is accompanied by compatible lexicons.