CAIT: A Syntactic Parsing Toolkit for Child-Adult InTeractions

arXiv cs.CL Tools

Summary

CAIT is an open-source toolkit for syntactic parsing of child-adult interactions. It combines a dependency parser trained on the UD-English-CHILDES treebank, which outperforms general-purpose English parsers such as SpaCy and Stanza, with a part-of-speech (POS) tagger and an utterance-level construction tagger.

arXiv:2605.19718v1 (Announce Type: new)

Abstract: CHILDES is a paramount resource for language acquisition studies, yet computational tools for analyzing its syntactic structure remain limited. Leveraging the recent release of the UD-English-CHILDES treebank with gold-standard Universal Dependencies (UD) annotations, we train a state-of-the-art dependency parser specifically tailored to CHILDES. The parser more accurately captures syntactic patterns in child-adult interactions, outperforming widely used off-the-shelf English parsers, including SpaCy and Stanza. Alongside the parser, we also release a Part-of-Speech tagger and an utterance-level construction tagger, which together form the open-source Syntactic Parsing Toolkit for Child-Adult InTeractions (CAIT). Through a detailed error analysis and a case study tracking the distribution of syntactic constructions across developmental time in CHILDES, we demonstrate the practical utility of the toolkit for large-scale, reproducible research on language acquisition.
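The case study tracks how syntactic constructions are distributed across developmental time. The abstract does not describe the binning or the construction inventory, so this Python sketch simply assumes utterances already tagged with a construction label and the child's age in whole months. It groups them into fixed-width age bins and reports each label's relative frequency per bin. The labels (`declarative`, `question`, `imperative`), the 6-month bin width, the records, and the `construction_distribution` helper are hypothetical placeholders, not CAIT's output format.

```python
from collections import Counter, defaultdict


def construction_distribution(
    records: list[tuple[int, str]], bin_months: int = 6
) -> dict[int, dict[str, float]]:
    """Map each age bin (its lower bound, in months) to {label: proportion}."""
    counts: defaultdict[int, Counter[str]] = defaultdict(Counter)
    for age_months, label in records:
        bin_start = (age_months // bin_months) * bin_months
        counts[bin_start][label] += 1
    distribution: dict[int, dict[str, float]] = {}
    for bin_start in sorted(counts):
        bin_counts = counts[bin_start]
        n = sum(bin_counts.values())
        # Normalize within the bin so bins with more transcribed speech stay comparable.
        distribution[bin_start] = {
            label: k / n for label, k in sorted(bin_counts.items())
        }
    return distribution


# Hypothetical tagged utterances: (child age in months, construction label).
# CHAT headers give ages as years;months.days (e.g. 2;06.15); conversion is omitted here.
records = [
    (20, "imperative"),
    (22, "declarative"),
    (23, "declarative"),
    (27, "question"),
    (29, "declarative"),
]

for start, props in construction_distribution(records).items():
    print(start, {label: round(p, 2) for label, p in props.items()})
# 18 {'declarative': 0.67, 'imperative': 0.33}
# 24 {'declarative': 0.5, 'question': 0.5}
```

Reporting proportions rather than raw counts is a deliberate choice in this sketch: CHILDES corpora contain very different amounts of speech at different ages, so raw counts would mostly track sampling density.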

Cached at: 05/20/26, 08:26 AM

# CAIT: A Syntactic Parsing Toolkit for Child-Adult InTeractions
Source: [https://arxiv.org/abs/2605.19718](https://arxiv.org/abs/2605.19718)
[View PDF](https://arxiv.org/pdf/2605.19718)


## Submission history

From: Francesca Padovani

**[v1]** Tue, 19 May 2026 11:53:08 UTC (1,046 KB)

Similar Articles

ACAT: A Collaborative Platform for Efficient Aspect-Based Sentiment Dataset Annotation

arXiv cs.CL

ACAT is a web-based collaborative annotation platform supporting four Aspect-Based Sentiment Analysis (ABSA) workflows, featuring an automated ETL pipeline that computes Inter-Annotator Agreement metrics at export to produce training-ready datasets. Validated on 1,002 restaurant reviews, it achieves a median annotation time of 31.58 seconds and raw IAA up to 0.86.

Announcing BABLR

Lobsters Hottest

The post announces BABLR, a new generalized parser framework and API-based platform for software development that aims to shift the IDE paradigm from text file editing to code document editing. It includes a parser framework competing with Tree-sitter, a parse tree format agAST competing with ESTree, and a new data language CSTML.

DraDDP: A Multimodal Multi-Party Dialogue Discourse Parsing Dataset

arXiv cs.CL

This paper introduces DraDDP, the first publicly available English multimodal dataset for multi-party dialogue discourse parsing, built from American TV dramas with 495 segments, 6,374 utterances, and 9.1 hours of video. Benchmarks show multimodal information improves parsing of dialogue structures and relation types.