The Grammar Does the Work: Functional vs. Lexical Dependency Length Minimization Across Universal Dependencies
Summary
This paper analyzes 122 languages to show that dependency length minimization operates differently for functional dependencies (short and invariant) versus lexical dependencies (longer and variable), suggesting that grammar provides local scaffolding for processing.
# The Grammar Does the Work: Functional vs. Lexical Dependency Length Minimization Across Universal Dependencies

Source: [https://arxiv.org/abs/2607.01899](https://arxiv.org/abs/2607.01899) [View PDF](https://arxiv.org/pdf/2607.01899)

> Abstract: Dependency length minimization (DLM) is a well-documented processing universal, but previous studies report a single mean dependency distance (MDD) per language, obscuring variation across syntactic relation types. We analyze 122 languages in UD and SUD (version 2.17), showing that DLM operates on two distinct levels. Grammar-driven optimization targets functional dependencies (det, case, aux), which are universally short (mean 1.71, $\sigma$ = 0.33) and invariant across typologically diverse languages. Processing-driven optimization operates on lexical dependencies (nsubj, obj, obl), which are longer (mean 2.87), highly variable ($\sigma$ = 0.63), and constrained by word-order typology. This asymmetry holds in SUD despite reversed head direction (r = 0.92). We conclude that "the grammar does the work" of minimization by scaffolding sentences with local functional attachments, leaving processing pressures to determine the ordering of lexical heads.

## Submission history

From: Kim Gerdes [via CCSD proxy]
**[v1]** Thu, 2 Jul 2026 08:55:07 UTC (1,602 KB)
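The per-relation measurement described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes dependency distance is the absolute difference between a token's index and its head's index, treats only the relations named in the abstract (det, case, aux and nsubj, obj, obl) as the two classes, and uses a hypothetical one-sentence CoNLL-U sample. The function names `dependency_distances` and `mean_distance_by_class` are illustrative.

```python
from collections import defaultdict

# Assumption: only the relations named in the abstract; the paper's full
# functional and lexical sets may be larger.
FUNCTIONAL = {"det", "case", "aux"}
LEXICAL = {"nsubj", "obj", "obl"}

# Hypothetical sample sentence in CoNLL-U (10 tab-separated columns per token).
_ROWS = [
    ("1", "The", "DET", "2", "det"),
    ("2", "dog", "NOUN", "3", "nsubj"),
    ("3", "chased", "VERB", "0", "root"),
    ("4", "the", "DET", "5", "det"),
    ("5", "cat", "NOUN", "3", "obj"),
    ("6", "in", "ADP", "8", "case"),
    ("7", "the", "DET", "8", "det"),
    ("8", "garden", "NOUN", "3", "obl"),
    ("9", ".", "PUNCT", "3", "punct"),
]
SAMPLE = "\n".join(
    "\t".join([i, form, "_", upos, "_", "_", head, rel, "_", "_"])
    for i, form, upos, head, rel in _ROWS
)


def dependency_distances(conllu_text):
    """Yield (relation, distance) for every non-root syntactic word."""
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank separator or sentence-level comment
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        tok_id, head, deprel = cols[0], cols[6], cols[7]
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
        if not tok_id.isdigit() or not head.isdigit():
            continue
        if int(head) == 0:
            continue  # the root has no head to measure against
        # Drop relation subtypes, e.g. "nsubj:pass" -> "nsubj".
        yield deprel.split(":")[0], abs(int(head) - int(tok_id))


def mean_distance_by_class(conllu_text):
    """Return mean dependency distance for functional and lexical relations."""
    totals = defaultdict(lambda: [0, 0])  # class -> [sum of distances, count]
    for rel, dist in dependency_distances(conllu_text):
        if rel in FUNCTIONAL:
            cls = "functional"
        elif rel in LEXICAL:
            cls = "lexical"
        else:
            continue
        totals[cls][0] += dist
        totals[cls][1] += 1
    return {cls: total / n for cls, (total, n) in totals.items()}


if __name__ == "__main__":
    print(mean_distance_by_class(SAMPLE))
```

On the sample sentence, the functional relations (three det, one case) are all one or two tokens from their heads, while the lexical relations span one, two, and five tokens. This is the kind of gap that a single per-language MDD would average away.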
Similar Articles
Probing Minimalist Phase Structure in LLMs: What Universal Dependencies Cannot Represent
This paper investigates whether large language models encode syntactic abstractions like phase boundaries that are not captured by Universal Dependencies, using structural probes on wh-movement stimuli with invariant UD distances, finding evidence across 13 LLMs for phase-structure representations that are causally active.
Syntax as a Rosetta Stone: Universal Dependencies for In-Context Coptic Translation
Georgetown researchers boost low-resource Coptic-to-English translation by augmenting in-context prompts with Universal Dependencies syntactic parses alongside bilingual glosses, setting a new state-of-the-art.
On the Persistent Effects of Lexicality in Large Language Mod
This paper investigates how lexical overlap, rather than semantic content, influences LLM representations across layers and architectures, and demonstrates that this lexical effect persists even in models trained for semantic similarity, leading to degraded performance on downstream tasks.
A Computational Operationalisation of Competing Maturational Theories of Syntactic Development via Statistical Grammar Induction
This paper presents a computational framework to test competing maturational theories of syntactic development in children, specifically comparing bottom-up versus inward accounts using statistical grammar induction.
Measuring language complexity from hierarchical reuse of recurring patterns
Introduces the ladderpath index as a measure of language complexity based on algorithmic information theory, applied to 21 parallel corpora. The index is approximately invariant across languages, supporting the equi-complexity hypothesis, and reveals trade-offs between character inventory and corpus length.