CAIT: A Syntactic Parsing Toolkit for Child-Adult InTeractions

arXiv cs.CL 05/20/26, 04:00 AM Tools

Summary

CAIT is an open-source toolkit for syntactic parsing of child-adult interactions. It comprises a dependency parser, a POS tagger, and an utterance-level construction tagger trained on the UD-English-CHILDES treebank. On child-adult interaction data, the parser outperforms general-purpose English parsers such as spaCy and Stanza.

arXiv:2605.19718v1 (announce type: new). Abstract: CHILDES is a paramount resource for language acquisition studies, yet computational tools for analyzing its syntactic structure remain limited. Leveraging the recent release of the UD-English-CHILDES treebank with gold-standard Universal Dependencies (UD) annotations, we train a state-of-the-art dependency parser specifically tailored to CHILDES. The parser more accurately captures syntactic patterns in child-adult interactions, outperforming widely used off-the-shelf English parsers, including SpaCy and Stanza. Alongside the parser, we also release a Part-of-Speech tagger and an utterance-level construction tagger, which together form the open-source Syntactic Parsing Toolkit for Child-Adult InTeractions (CAIT). Through a detailed error analysis and a case study tracking the distribution of syntactic constructions across developmental time in CHILDES, we demonstrate the practical utility of the toolkit for large-scale, reproducible research on language acquisition.
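The abstract does not say which metrics underlie the comparison with SpaCy and Stanza. Dependency parsers are conventionally compared by unlabeled and labeled attachment score (UAS and LAS), so the following is a minimal sketch of that computation, assuming those metrics. The function name, the token representation, and the example utterance are all hypothetical and are not taken from CAIT.

```python
from typing import List, Tuple

# One token's analysis: (index of its head, dependency relation).
# Heads are 1-based; 0 marks the root, as in CoNLL-U.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for one utterance or a concatenated corpus."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must have the same length")
    if not gold:
        raise ValueError("cannot score an empty analysis")
    # UAS counts tokens whose predicted head is correct.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS additionally requires the dependency relation to match.
    arc_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return head_hits / n, arc_hits / n


# Hypothetical three-token utterance "you want juice" (assumed annotation).
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod")]  # wrong label on "juice"
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")
```

In this example every head is correct but one label is wrong, so UAS is higher than LAS. A real evaluation would read gold and predicted analyses from CoNLL-U files and score the whole test set rather than a single utterance.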


# CAIT: A Syntactic Parsing Toolkit for Child-Adult InTeractions
Source: [https://arxiv.org/abs/2605.19718](https://arxiv.org/abs/2605.19718)
[View PDF](https://arxiv.org/pdf/2605.19718)


## Submission history

From: Francesca Padovani
**[v1]** Tue, 19 May 2026 11:53:08 UTC (1,046 KB)


