CAIT: A Syntactic Parsing Toolkit for Child-Adult InTeractions
Summary
CAIT is an open-source toolkit for syntactic parsing of child-adult interactions, featuring a dependency parser, POS tagger, and construction tagger trained on the UD-English-CHILDES treebank, outperforming general English parsers like SpaCy and Stanza.
# CAIT: A Syntactic Parsing Toolkit for Child-Adult InTeractions

Source: [https://arxiv.org/abs/2605.19718](https://arxiv.org/abs/2605.19718) [View PDF](https://arxiv.org/pdf/2605.19718)

> Abstract: CHILDES is a paramount resource for language acquisition studies -- yet computational tools for analyzing its syntactic structure remain limited. Leveraging the recent release of the UD-English-CHILDES treebank with gold-standard Universal Dependencies (UD) annotations, we train a state-of-the-art dependency parser specifically tailored to CHILDES. The parser more accurately captures syntactic patterns in child-adult interactions, outperforming widely used off-the-shelf English parsers, including SpaCy and Stanza. Alongside the parser, we also release a Part-of-Speech tagger and an utterance-level construction tagger, which together form the open-source Syntactic Parsing Toolkit for Child-Adult InTeractions (CAIT). Through a detailed error analysis and a case study tracking the distribution of syntactic constructions across developmental time in CHILDES, we demonstrate the practical utility of the toolkit for large-scale, reproducible research on language acquisition.

## Submission history

From: Francesca Padovani [[view email](https://arxiv.org/show-email/56363ed9/2605.19718)]

**[v1]** Tue, 19 May 2026 11:53:08 UTC (1,046 KB)
Similar Articles
ACAT: A Collaborative Platform for Efficient Aspect-Based Sentiment Dataset Annotation
ACAT is a web-based collaborative annotation platform supporting four Aspect-Based Sentiment Analysis (ABSA) workflows, featuring an automated ETL pipeline that computes Inter-Annotator Agreement metrics at export to produce training-ready datasets. Validated on 1,002 restaurant reviews, it achieves a median annotation time of 31.58 seconds and raw IAA up to 0.86.
Announcing BABLR
Announcing BABLR, a new generalized parser framework and API-based platform for software development that aims to shift the IDE paradigm from text file editing to code document editing. It includes a parser framework competing with Tree-sitter, a parse tree format agAST competing with ESTree, and a new data language CSTML.
Show HN: Anyone interested in a tool helps to explore C++ ASTs
ACAV is an interactive Abstract Syntax Tree visualization tool for C, C++, and Objective-C, built with Clang and Qt, that allows developers to explore ASTs from real codebases using compilation databases.
DraDDP: A Multimodal Multi-Party Dialogue Discourse Parsing Dataset
This paper introduces DraDDP, the first publicly available English multimodal dataset for multi-party dialogue discourse parsing, built from American TV dramas with 495 segments, 6,374 utterances, and 9.1 hours of video. Benchmarks show multimodal information improves parsing of dialogue structures and relation types.
COTCAgent: Preventive Consultation via Probabilistic Chain-of-Thought Completion
COTCAgent is a hierarchical reasoning framework for longitudinal electronic health records that uses a probabilistic chain-of-thought completion approach, achieving 90.47% Top-1 accuracy on a self-built dataset and outperforming existing medical agents.