AfriSUD: A Dependency Treebank Collection for Evaluating Models on African Languages

arXiv cs.CL 06/12/26, 04:00 AM Papers
nlp african-languages dependency-parsing treebank syntactic-annotation evaluation language-models
Summary
AfriSUD is a new dependency treebank collection for African languages, following the Surface-Syntactic Universal Dependencies (SUD) framework, designed to evaluate NLP models on languages like Naija, Wolof, and Yorùbá.
arXiv:2606.12708v1 Announce Type: new Abstract: Despite their linguistic diversity and global significance, African languages remain underrepresented in NLP research and resources. We aim to bridge this gap by introducing AfriSUD, the first large-scale collection of syntactically annotated treebanks for nine diverse African languages spanning major language families and regions across Sub-Saharan Africa. Using the Surface-Syntactic Universal Dependencies (SUD) framework, our community-led effort provides high-quality, native-speaker-verified data that capture key typological features such as agglutination and tone. We evaluate a range of models on AfriSUD for part-of-speech tagging and dependency parsing, including non-transformer baselines, multilingual pretrained encoders, and LLMs. Our results reveal a significant syntax gap: models still show clear limitations across the nine languages, suggesting that existing architectures may not fully capture the structural diversity of African-language syntax.
# AfriSUD: A Dependency Treebank Collection for Evaluating Models on African Languages
Source: [https://arxiv.org/html/2606.12708](https://arxiv.org/html/2606.12708)
AfriSUD annotation follows the Surface\-Syntactic Universal Dependencies \(SUD\) framework \(Gerdes et al\., [2018](https://arxiv.org/html/2606.12708#bib.bib16)\), which represents syntactic relations close to surface structure\. This proves useful for the target languages, where auxiliaries and other functional elements often encode tense, aspect, mood, and agreement information\. The annotation pipeline includes lemmatization, Universal part\-of\-speech \(UPOS\) tagging, dependency head annotation, and dependency relation labeling, as illustrated with a Wolof example in Figure [3\.1](https://arxiv.org/html/2606.12708#S3.SS1)\. We use the 17 standard UPOS tags \(Petrov et al\., [2012](https://arxiv.org/html/2606.12708#bib.bib24)\) and an SUD relation set covering core relations such as subjects, complements, and modifiers, as well as constructions such as auxiliary complements \(comp:aux\), predicative complements \(comp:pred\), and serial verb constructions \(compound:svc\)\. Appendix Table [7](https://arxiv.org/html/2606.12708#A4.T7) provides the full set of POS tags and dependency relations used in our annotation, along with their definitions\.
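
The four annotation layers named above (lemma, UPOS, head, relation) correspond to columns of the CoNLL-U format that the paper later mentions for its gold annotations. The following is a minimal sketch of reading one token line; the `Token` class, the `parse_conllu_line` helper, and the placeholder token are illustrative assumptions, not part of the AfriSUD release.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int     # 1-based token id within the sentence
    form: str    # surface form
    lemma: str   # lemmatization layer
    upos: str    # Universal POS tag
    feats: str   # morphological features (UFeats), "_" when empty
    head: int    # id of the syntactic head, 0 for the root
    deprel: str  # SUD relation label


def parse_conllu_line(line: str) -> Token:
    """Parse one 10-column CoNLL-U token line (multiword ranges not handled)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 columns, got {len(cols)}")
    # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
    return Token(int(cols[0]), cols[1], cols[2], cols[3],
                 cols[5], int(cols[6]), cols[7])


# Placeholder token, not real AfriSUD data.
example = "2\tw2\tl2\tAUX\t_\t_\t0\troot\t_\t_"
tok = parse_conllu_line(example)
```

Here the auxiliary is the root, which matches the SUD convention of treating function words as heads.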

Annotation was conducted using ArboratorGrew \(Guibon et al\., [2020](https://arxiv.org/html/2606.12708#bib.bib64)\)\. Existing treebanks were available for Naija, Wolof, and Yorùbá\. We used the integrated parser interface in ArboratorGrew \(https://arborator\.grew\.fr\), based on BertForDeprel \(Guiller, [2020](https://arxiv.org/html/2606.12708#bib.bib65)\), to generate initial pre\-annotations from the available data\. The pre\-annotations were reviewed and manually corrected by annotators, while the remaining languages were annotated from scratch\. Across all languages, annotation follows SUD conventions: auxiliaries are treated as syntactic heads, noun class information is encoded as UFeats, serial verb constructions are annotated with compound:svc, and underspecified relations are marked with udep when necessary\.

### 3\.2 Quality Control

Each language was annotated by three native\-speaker linguists: a coordinator and two annotators\. All annotators completed training sessions on dependency grammar, the SUD framework, and language\-specific guidelines, followed by a pilot annotation exercise of ten sentences per language\. Since dependency annotation requires interdependent decisions about tokenization, heads, and relation labels, standard inter\-annotator agreement metrics such as Fleiss' kappa could not be straightforwardly computed or interpreted\. Following prior African\-language annotation work \(Dione et al\., [2023](https://arxiv.org/html/2606.12708#bib.bib9)\), we adopt a consensus\-based adjudication procedure to resolve disagreements\.

The language coordinators supervised the annotation and held regular discussions with the annotators to resolve ambiguous cases and disagreements, ensuring consistency with the SUD guidelines across languages\. After adjudication, each sentence received a single final annotation agreed upon by the language team\. We then applied automatic validation checks to detect malformed dependency structures, including missing part\-of\-speech or dependency relation values, absent or multiple roots, root\-label/head mismatches, and cycles in the dependency graph\. Annotators and coordinators were compensated for their work \(each annotator was paid US$750\)\.
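
The validation checks listed above can be sketched as a single pass over a sentence. This is an illustrative reconstruction under stated assumptions, not the authors' validation script: the `validate` function, the tuple layout, and the error messages are hypothetical.

```python
from typing import List, Optional, Tuple

# Each token is (upos, head, deprel); heads are 1-based and 0 marks the root.
Tok = Tuple[Optional[str], int, Optional[str]]


def validate(tokens: List[Tok]) -> List[str]:
    """Return a list of human-readable problems; an empty list means well-formed."""
    errors: List[str] = []
    n = len(tokens)
    roots = 0
    for i, (upos, head, deprel) in enumerate(tokens, start=1):
        if not upos or upos == "_":
            errors.append(f"token {i}: missing UPOS")
        if not deprel or deprel == "_":
            errors.append(f"token {i}: missing deprel")
        if head < 0 or head > n or head == i:
            errors.append(f"token {i}: invalid head {head}")
        if head == 0:
            roots += 1
        # The label "root" and head 0 must occur together.
        if (head == 0) != (deprel == "root"):
            errors.append(f"token {i}: root label/head mismatch")
    if roots != 1:
        errors.append(f"expected exactly one root, found {roots}")
    # Cycle check: following heads upward from any token must reach 0.
    for i in range(1, n + 1):
        seen = set()
        cur = i
        while cur != 0:
            if cur in seen:
                errors.append(f"token {i}: cycle in dependency graph")
                break
            seen.add(cur)
            head = tokens[cur - 1][1]
            if head < 0 or head > n:
                break  # out-of-range head already reported above
            cur = head
    return errors


well_formed = [("PRON", 2, "subj"), ("AUX", 0, "root"), ("VERB", 2, "comp:aux")]
cyclic = [("NOUN", 2, "subj"), ("VERB", 1, "comp:obj")]
```

Calling `validate(well_formed)` yields no errors, while `validate(cyclic)` reports both the missing root and the cycle.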

### 3\.3 Annotation Challenges

This section analyzes the annotation challenges encountered when applying the SUD formalism to our African languages\. The analysis is structured around three major issues: \(i\) clitics and morphological binding, \(ii\) ambiguity resolution, \(iii\) language\-specific challenges and SUD relations\.

#### 3\.3\.1 Clitics and morphological binding

A central difficulty arises from rich morphological structure, where grammatical information is encoded in affixes or clitic\-like elements that are tightly bound to lexical stems\. Agglutination was repeatedly identified as a critical issue in most of the languages\. Two distinct methodologies emerged from the data: morphological decomposition \(treating roots and their attached morphemes separately\) and single\-token preservation\. The Efik, Yorùbá, and Kinyarwanda teams opted to segment orthographic words into syntactic components\. Efik, for instance, is an agglutinating language with a one\-to\-one correspondence between morpheme and meaning, and its morphemes are often arranged in a specific linear order\. For example, the verb emetem \(“You had cooked it”\) is decomposed into the second\-person pronoun e\-, the past tense marker me\-, and the verb root tem\. Likewise, in Yorùbá, fused prepositional constructions such as ‘sílé’ and ‘níta’ were restored to their base forms ‘síilé’ \(in house\) and ‘níìta’ \(in outside\)\.

Conversely, the isiXhosa, Swahili, Hausa, and Igbo teams maintained lexical integrity\. In isiXhosa, complex verbs containing subject markers \(ndi\-, u\-\) and tense markers \(ya\-\) are treated as single tokens\. The Hausa team treated derived nouns, such as ma’aurata, “married couple” \(from aure, “marry”\), as a single unit rather than splitting off the nominalizing prefix ma\-\. This decision aligns with the surface\-oriented approach of SUD\. In practice, morphological information is encoded through features rather than syntactic dependencies, thereby preserving structural consistency while avoiding over\-segmentation\.

#### 3\.3\.2 Syntactic and Lexical Ambiguities

Ambiguity is a pervasive issue across the dataset on multiple levels, including lexical, morphological, and syntactic ambiguity\.

The dominant strategy for ambiguity resolution relied heavily on context\-driven analysis and established team\-level annotation conventions\. Annotators consistently favor interpretations that are semantically plausible and structurally coherent\.

For instance, in Yorùbá, the morpheme ní is multifunctional \(copula, main verb, or conjunction\)\. If ní follows a subject, it is annotated as a verb; if it heads a subordinate clause, it is treated as a subordinating conjunction\. In Kinyarwanda, ambiguity is especially prominent in verb morphology, where a single form may encode multiple grammatical functions\. Words like gukora can be a verb \(“to do”\) or a noun \(“the act of doing”\), requiring the examination of surrounding tense markers to determine the correct part of speech\. In Efik, ambiguity arises in complex sentence structures involving modifier clauses, cleft sentences, and multiple embedded subjects and objects\. Once the relevant patterns had been recognized, annotation often required identifying similar features in each successive sentence and annotating accordingly\. The annotators here relied on shared guidelines to ensure consistency and a uniform annotation pattern across the corpus\.

#### 3\.3\.3 Language\-Specific Challenges and SUD Relations

Beyond shared challenges, each language presents unique difficulties that impact the annotation process\. In addition, several SUD relations prove difficult to apply consistently across languages, particularly in contexts involving complex syntax or rich morphology\.

Some of the problems can be described as follows\. Head \(or root\) selection in a language like Yorùbá is not always easy, as the language employs multiple auxiliaries \(e\.g\., the aspectual markers ti and ǹ, or the future marker yóò\)\. \(Note that SUD is based on distributional criteria and treats function words as heads, unlike UD: adpositions \(ADP\) are heads of adpositional phrases, auxiliaries \(AUX\) are heads of complex verbal forms, and subordinating conjunctions \(SCONJ\) are heads of subordinated clauses\.\) The convention among the annotators was to establish a hierarchy for the auxiliaries and to select the one that precedes the others as the head\. In an isolating language like Yorùbá, where pluralization is not derived through inflection, the marker àwọn is added before a noun to mark the plural\. Annotating this syntactic relationship in SUD was not straightforward\. Annotators decided to use the relation “compound:prt”, which seems closest to the syntactic relationship between nouns and the plural marker\. Furthermore, problems arose with the underspecified udep relation, which covers both comp and mod in cases where a dependent cannot be clearly classified as an argument or a modifier\. Some languages show copula constructions without an overt copula verb like English “to be”\. For instance, in isiXhosa, the absence of an overt copula verb led to the treatment of the predicate as the head and the marker as a copula\.
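
The auxiliary convention described above (the first auxiliary becomes the head) can be sketched as follows. The `attach_aux_chain` helper is hypothetical. It also assumes that every later auxiliary and the lexical verb attach to the immediately preceding element as comp:aux, and that the clause is the sentence root. The paragraph states only the choice of the first auxiliary as head, so the cascading attachment is an assumption based on general SUD practice.

```python
from typing import List, Tuple

# An arc is (dependent id, head id, relation); head 0 marks the root.
Arc = Tuple[int, int, str]


def attach_aux_chain(aux_ids: List[int], verb_id: int) -> List[Arc]:
    """Build a head-first chain: aux1 -> aux2 -> ... -> lexical verb."""
    if not aux_ids:
        return [(verb_id, 0, "root")]
    chain = sorted(aux_ids) + [verb_id]
    arcs: List[Arc] = [(chain[0], 0, "root")]  # first auxiliary heads the clause
    for head, dep in zip(chain, chain[1:]):
        arcs.append((dep, head, "comp:aux"))  # each element governs the next
    return arcs


# Two auxiliaries at positions 2 and 3, lexical verb at position 4.
arcs = attach_aux_chain([2, 3], 4)
# arcs == [(2, 0, "root"), (3, 2, "comp:aux"), (4, 3, "comp:aux")]
```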

In summary, while the SUD framework provides a useful baseline for cross\-linguistic annotation, its application to morphologically rich and under\-resourced African languages requires careful adaptation\. Common challenges include the treatment of bound morphemes, pervasive ambiguity, and structural mismatches between linguistic phenomena and formal annotation schemes\. Addressing these challenges requires a combination of theoretical flexibility, empirical observation, and collaborative annotation practices\. Annotators must adopt a combination of normalization procedures and language\-specific adaptations of SUD guidelines\. Iterative annotation and validation cycles were frequently used to refine decisions\. In some cases, annotators explicitly acknowledged the need to deviate slightly from standard guidelines to better reflect the linguistic reality of the language\.

## 4 Experimental Setup

### 4\.1 Baseline Models

##### Stanza

We use Stanza \(Qi et al\., [2020](https://arxiv.org/html/2606.12708#bib.bib46)\), a neural dependency parsing pipeline, as a strong non\-transformer baseline\. We initialize the parser with pretrained fastText embeddings \(Grave et al\., [2018](https://arxiv.org/html/2606.12708#bib.bib67)\) when available\. For Runyankore \(nyn\) and Efik \(efi\), which are not covered by the pretrained embeddings, we train new embeddings from publicly available corpora: SALT \(Akera et al\., [2022](https://arxiv.org/html/2606.12708#bib.bib53)\) for nyn, and MT560 \(Gowda et al\., [2021](https://arxiv.org/html/2606.12708#bib.bib52)\) together with SIB\-200 Ibom \(Kalejaiye et al\., [2025](https://arxiv.org/html/2606.12708#bib.bib51)\) for efi\.

##### Multilingual encoders

For the transformer baselines, we fine\-tune an end\-to\-end biaffine dependency parser \(Dozat and Manning, [2016](https://arxiv.org/html/2606.12708#bib.bib41)\) with pretrained encoders: \(1\) the general multilingual models mBERT \(Devlin et al\., [2019](https://arxiv.org/html/2606.12708#bib.bib26)\) and XLM\-RoBERTa Large \(Conneau et al\., [2020](https://arxiv.org/html/2606.12708#bib.bib27)\), and \(2\) the Africa\-centric models AfriBERTa\-large \(Ogueji et al\., [2021](https://arxiv.org/html/2606.12708#bib.bib22)\), AfroXLMR\-large \(Alabi et al\., [2022](https://arxiv.org/html/2606.12708#bib.bib29)\), and AfroXLMR\-large\-76L \(Adelani et al\., [2024](https://arxiv.org/html/2606.12708#bib.bib66)\)\. In the experiments, each treebank is split 70/10/20 into train/dev/test partitions\. We report unlabeled attachment score \(UAS\) and labeled attachment score \(LAS\), which measure correct head assignment and correct head\-plus\-label prediction, respectively\. All transformer\-based models are fine\-tuned with HuggingFace Transformers \(Wolf et al\., [2020](https://arxiv.org/html/2606.12708#bib.bib45)\), using a maximum sequence length of 512, batch size 16, gradient accumulation 2, learning rate $5\times 10^{-5}$, and 50 epochs on a single NVIDIA A100 GPU\.
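
As a concrete reading of the two metrics, here is a minimal sketch that scores a list of predicted arcs against gold arcs. The `attachment_scores` function and the toy arcs are illustrative assumptions. The sketch scores tokens in one flat list and does not reproduce any evaluation-script details, such as punctuation handling, which the paper does not specify.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head id, relation label) for one token


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over aligned tokens."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and aligned")
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))  # head only
    las_hits = sum(g == p for g, p in zip(gold, pred))        # head and label
    n = len(gold)
    return 100.0 * uas_hits / n, 100.0 * las_hits / n


gold = [(2, "subj"), (0, "root"), (2, "comp:obj"), (3, "mod")]
pred = [(2, "subj"), (0, "root"), (2, "mod"), (2, "mod")]
uas, las = attachment_scores(gold, pred)  # 75.0 UAS, 50.0 LAS
```

Because a token counts toward LAS only if its head is also correct, LAS can never exceed UAS on the same predictions.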

### 4\.2 LLM Prompting

We evaluate widely used LLMs: Gemini\-3\.1\-Pro \(Gemini Team, Google DeepMind, [2026](https://arxiv.org/html/2606.12708#bib.bib68)\), GPT\-5\.2 \(https://developers\.openai\.com/api/docs/models/gpt\-5\.2\), GPT\-4o \(Hurst et al\., [2024](https://arxiv.org/html/2606.12708#bib.bib36)\), and Gemma\-3\-12B\-IT/27B\-IT \(Team et al\., [2025](https://arxiv.org/html/2606.12708#bib.bib69)\)\. All models are evaluated with deterministic decoding by setting the temperature to $\tau = 0$\. For open\-weight models, we use greedy decoding and set the maximum number of generated tokens to 2,048\. In our experiments, the task is formulated as structured generation: given the raw sentence text and pre\-segmented tokens \(id and surface form\), the model predicts each token’s lemma, UPOS tag, syntactic head, and dependency relation\. A single prompt template is used across all models and languages to ensure fair comparison\. The complete prompt and output schema are provided in Appendix [5](https://arxiv.org/html/2606.12708#A5.F5)\. We perform zero\-shot and few\-shot prompting with $K \in \{0, 1, 5\}$ demonstrations\. Few\-shot examples are sampled from a held\-out pool comprising 10% of each language’s data, which is reserved exclusively for demonstrations and excluded from evaluation\. For both the 1\-shot and 5\-shot settings, we use five different demonstration sets sampled with seeds 13–17 and report the mean and standard deviation across runs\.
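
The seeded demonstration sampling can be sketched as below. The `sample_demos` and `run_setting` helpers and the `score_fn` callback are hypothetical names. The sketch reports the sample standard deviation as an assumption, since the paper does not say which estimator it uses.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

SEEDS = (13, 14, 15, 16, 17)  # seeds 13-17, as stated in the text


def sample_demos(pool: Sequence[str], k: int, seed: int) -> List[str]:
    """Draw k demonstrations from the held-out pool, reproducibly per seed."""
    rng = random.Random(seed)
    return rng.sample(list(pool), k)


def run_setting(pool: Sequence[str], k: int,
                score_fn: Callable[[List[str]], float]) -> Tuple[float, float]:
    """Score one K-shot setting over all seeds; return (mean, std)."""
    scores = [score_fn(sample_demos(pool, k, seed)) for seed in SEEDS]
    return statistics.mean(scores), statistics.stdev(scores)


# Placeholder pool and a dummy scorer standing in for a real LLM evaluation.
pool = [f"demo-{i}" for i in range(20)]
mean_score, std_score = run_setting(pool, 5, lambda demos: float(len(demos)))
```

In practice `score_fn` would build the prompt from the sampled demonstrations, query the model on the test set, and return a metric such as LAS.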

In addition, we performed supervised fine\-tuning \(SFT\) of Gemma\-3\-12B for 5 epochs with a learning rate of $1\times 10^{-5}$\. The SFT dataset was obtained by aggregating training samples from all nine AfriSUD languages\. Each example is formatted as a prompt–completion pair, mapping a raw sentence to its gold CoNLL\-U/SUD annotation\. We use a fixed random seed \(42\) to assign an instruction template to each training sentence\. All instruction templates used are provided in Appendix [8](https://arxiv.org/html/2606.12708#A5.T8)\.
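
A minimal sketch of the prompt–completion formatting with seeded template assignment follows. The two templates and the `build_sft_examples` helper are hypothetical placeholders; the actual instruction templates are the ones given in the paper's appendix.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical instruction templates, NOT the ones used in the paper.
TEMPLATES = [
    "Annotate the following sentence in CoNLL-U format with SUD relations:\n{sentence}",
    "Produce the SUD dependency analysis (CoNLL-U) for this sentence:\n{sentence}",
]


def build_sft_examples(sentences: Sequence[str],
                       gold_conllu: Sequence[str],
                       seed: int = 42) -> List[Dict[str, str]]:
    """Pair each raw sentence with its gold annotation under a random template."""
    if len(sentences) != len(gold_conllu):
        raise ValueError("sentences and annotations must be aligned")
    rng = random.Random(seed)  # fixed seed makes the assignment reproducible
    examples = []
    for sent, gold in zip(sentences, gold_conllu):
        template = rng.choice(TEMPLATES)
        examples.append({"prompt": template.format(sentence=sent),
                         "completion": gold})
    return examples


# Placeholder data, not real AfriSUD sentences.
sents = ["w1 w2", "w3 w4 w5"]
golds = ["<conllu for sentence 1>", "<conllu for sentence 2>"]
examples = build_sft_examples(sents, golds)
```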

### 4\.3 Cross\-lingual Transfer

Cross\-lingual transfer depends on several factors, including model choice, transfer strategy, and selection of an appropriate source language\. Previous work on cross\-lingual dependency parsing shows that in the zero\-shot setting, the choice of the source language is important, especially when the source and target languages are typologically distant \(Tran and Bisazza, [2019](https://arxiv.org/html/2606.12708#bib.bib31); Agić, [2017](https://arxiv.org/html/2606.12708#bib.bib32)\)\. Although English is commonly used as the transfer source due to resource availability, evidence from cross\-lingual syntactic transfer indicates that better transfer can often be obtained from sources that are structurally closer to the target language \(Duong et al\., [2015](https://arxiv.org/html/2606.12708#bib.bib33)\)\. In addition, studies in cross\-lingual syntax indicate that transfer performance depends in part on the relationship between source and target languages, including their typological similarity and broader structural proximity \(Litschko et al\., [2020](https://arxiv.org/html/2606.12708#bib.bib34); Fisch et al\., [2019](https://arxiv.org/html/2606.12708#bib.bib35)\)\.

We consider seven source languages for cross\-lingual syntactic transfer: English \(eng\), French \(fra\), Afrikaans \(afr\), Arabic \(ara\), Romanian \(ron\), Naija \(pcm\), and Wolof \(wol\)\. These source languages were selected based on supervised SUD treebank availability and typological diversity, including variation in word order and the relative order of syntactic heads and their dependents, which are known to affect cross\-lingual dependency parsing \(Scholivet et al\., [2019](https://arxiv.org/html/2606.12708#bib.bib72); Liu et al\., [2020](https://arxiv.org/html/2606.12708#bib.bib71)\)\. For Wolof and Naija, we further evaluate augmented variants, denoted \+wtb and \+nsc, respectively\. In these settings, the source\-language training data are supplemented with pre\-existing SUD treebanks from the SUD release \(https://surfacesyntacticud\.org/data/\): SUD\_Wolof\-WTB@2\.17 and SUD\_Naija\-NSC@2\.17\. Romanian is included in light of previous work showing that the choice of source language can substantially affect cross\-lingual transfer performance in dependency parsing \(Agić, [2017](https://arxiv.org/html/2606.12708#bib.bib32); Dione et al\., [2023](https://arxiv.org/html/2606.12708#bib.bib9)\)\.

## 5 Results

### 5\.1 Baseline results

Table 2: UPOS tagging and dependency parsing performance on AfriSUD\. Scores are averaged over five runs; Avg reports the macro\-average across languages with mean per\-language standard deviation\. Bold indicates the best result for each language and metric\.

Table [2](https://arxiv.org/html/2606.12708#S5.T2) shows that transformer\-based encoders improve part\-of\-speech tagging, with AfroXLMR\-large\-76L achieving the best macro\-average accuracy of 90\.6 compared to 88\.4 for Stanza\. However, for dependency parsing, Stanza remains a competitive baseline with the highest average UAS score of 84\.4 and the highest average LAS of 77\.5\. Among encoder\-based models, Africa\-centric models outperform general multilingual encoders overall, with AfroXLMR\-large\-76L reaching 83\.1 UAS and 73\.9 LAS while also achieving the top LAS scores for several languages, including Hausa and Yorùbá\.
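
The "Avg" column described in the Table 2 caption can be read as follows: average the runs within each language, then average those per-language means, and report the mean of the per-language standard deviations alongside. This is a sketch of that reading; the `summarize` helper and the toy scores are illustrative assumptions, and the sample standard deviation is assumed.

```python
import statistics
from typing import Dict, List, Tuple


def summarize(runs_by_lang: Dict[str, List[float]]) -> Tuple[float, float]:
    """Return (macro-average of per-language means, mean per-language std)."""
    lang_means = [statistics.mean(runs) for runs in runs_by_lang.values()]
    lang_stds = [statistics.stdev(runs) for runs in runs_by_lang.values()]
    return statistics.mean(lang_means), statistics.mean(lang_stds)


# Toy scores for two placeholder languages, not results from the paper.
toy = {"lang_a": [80.0, 82.0], "lang_b": [90.0, 90.0]}
macro_avg, mean_std = summarize(toy)
```

A macro-average weights every language equally, regardless of treebank size.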

Across all models, LAS is consistently lower than UAS, indicating that relation labeling remains more difficult than identifying dependency heads\. Overall, Africa\-centric encoders provide the best transformer\-based results, particularly for part\-of\-speech tagging, but Stanza still achieves the best average dependency parsing performance, including the highest LAS, possibly reflecting the stability of parser\-specific architectures for relation labeling in low\-resource settings\.

### 5\.2 LLM Prompting results

Table 3: Average POS accuracy, UAS, and LAS across AfriSUD languages\. Scores are percentages averaged over five runs; the best result for each metric is shown in bold\.

Table [3](https://arxiv.org/html/2606.12708#S5.T3) reports the average scores for UPOS, UAS, and LAS across the nine languages\. In\-context demonstrations consistently improve performance across all models and metrics, with the largest gains observed for LAS\. The gains are particularly clear for models with lower zero\-shot LAS: for example, GPT\-5\.2 improves from 16\.5 to 52\.5 LAS, while Gemma\-3\-12B rises from 2\.9 to 33\.9\. The gains extend beyond LAS: additional demonstrations also improve UAS and UPOS accuracy across models\. The results show a performance gap between closed and open models\. Under the 5\-shot setting, Gemini\-3\.1\-Pro performs best overall, reaching 86\.7 UPOS, 73\.0 UAS, and 59\.2 LAS, followed by GPT\-5\.2, while GPT\-4o and the open\-weight Gemma models perform lower, particularly on LAS\. Across models, the consistent UAS–LAS gap indicates that models identify syntactic heads more reliably than they assign SUD relation labels\.

A similar trend is observed in the language\-level breakdown: Gemini\-3\.1\-Pro achieves the best 5\-shot performance across all languages, with Wolof, Runyankore, Swahili, and Yorùbá among the best\-performing languages, while Efik and isiXhosa remain the most challenging\. Overall, part\-of\-speech tagging achieves higher scores than dependency parsing, while SUD relation labeling remains more challenging\. Detailed per\-language results are provided in Appendix Table [6](https://arxiv.org/html/2606.12708#A3.T6)\.

##### Few\-shot prompting vs\. supervised fine\-tuning \(SFT\) for dependency parsing:

Although LLMs generally improve with additional shots, a large gap remains between few\-shot prompting and SFT\. Compared with 5\-shot Gemma\-3\-12B, SFT improves performance by \+14\.3 UPOS, \+18\.7 UAS, and \+24\.1 LAS\. Despite being based on a much smaller model, SFT Gemma\-3\-12B reduces the gap to 5\-shot Gemini\-3\.1\-Pro, with a remaining difference of 1\.2 LAS points\. We report the complete per\-language results in Appendix Table [6](https://arxiv.org/html/2606.12708#A3.T6)\.

### 5\.3 Cross\-lingual Transfer results

![Refer to caption](https://arxiv.org/html/2606.12708v1/x1.png)

Figure 2: Zero\-shot cross\-lingual transfer performance of AfroXLMR\-large\-76L on AfriSUD\. Rows denote the source language used for transfer and columns denote the target language\.

Figure [2](https://arxiv.org/html/2606.12708#S5.F2) summarizes the zero\-shot transfer with AfroXLMR\-large\-76L across source–target pairs\. pcm\+nsc performs best overall, while ron is the best among non\-African sources\. Among targets, swa is consistently easier, while xho and efi remain more difficult\. The bar charts for all shot settings are included in Appendix [4](https://arxiv.org/html/2606.12708#A2.F4)\.

## 6 Analysis

![Refer to caption](https://arxiv.org/html/2606.12708v1/x2.png)

Figure 3: Error breakdown for Gemini\-3\.1\-Pro \(0\-shot\) on four SUD\-distinctive relations\. Bars separate fully correct predictions, label\-only errors, and head\-plus\-label errors\.

##### Where Do Models Fail?

Although Gemini\-3\.1\-Pro achieves the best overall prompting scores, the aggregate metrics do not show which constructions remain difficult\. To better characterize these errors, we aggregate its 0\-shot predictions across all nine African languages in AfriSUD and analyze four SUD\-distinctive relations: serial\-verb constructions \(comp:svc\), possessive constructions \(comp:poss\), underspecified dependencies \(udep\), and Tense\-Aspect\-Mood \(TAM\) auxiliaries \(comp:aux\)\. Figure [3](https://arxiv.org/html/2606.12708#S6.F3) reveals a clear head–label mismatch: Gemini obtains 0% LAS for comp:svc and comp:poss, but predicts the correct head \(UAS\) in 46% and 65% of cases, respectively\. For udep, LAS remains low despite a UAS of 49%\. The model does not produce comp:svc or comp:poss in these cases, but instead assigns broader labels such as mod and comp:obj\. The label mod accounts for 45% of comp:svc errors and 14% of comp:poss errors, while comp:obj accounts for 23% and 42%, respectively\. On the other hand, comp:aux errors are more structural: head accuracy drops to 39%, and multi\-token TAM chains are often flattened\. For example, Naija auxiliary chains such as *don … dey … fit … go* are predicted as flat comp:obj attachments rather than cascading comp:aux dependencies\. Yorùbá constructions involving the focus marker *ni* show a similar pattern: the model promotes the main verb to root rather than attaching it to *ni* as comp:aux\. These errors indicate difficulty with TAM constructions beyond SUD label selection alone\.
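
The per-relation breakdown described above can be sketched by bucketing every token whose gold label is the relation of interest. The `error_breakdown` function and the toy arcs are illustrative assumptions. The `head_only` bucket (wrong head, correct label) is an addition for completeness and is not a category named in the figure caption.

```python
from collections import Counter
from typing import Iterable, Tuple

Arc = Tuple[int, str]  # (head id, relation label)


def error_breakdown(pairs: Iterable[Tuple[Arc, Arc]], relation: str) -> Counter:
    """Bucket (gold, predicted) arcs whose gold label equals `relation`."""
    counts: Counter = Counter()
    for (g_head, g_rel), (p_head, p_rel) in pairs:
        if g_rel != relation:
            continue
        head_ok = g_head == p_head
        label_ok = g_rel == p_rel
        if head_ok and label_ok:
            counts["correct"] += 1
        elif head_ok:
            counts["label_only"] += 1       # right head, wrong label
        elif label_ok:
            counts["head_only"] += 1        # wrong head, right label
        else:
            counts["head_plus_label"] += 1  # both wrong
    return counts


# Toy (gold, predicted) pairs, not data from the paper.
pairs = [
    ((2, "comp:aux"), (2, "comp:aux")),
    ((2, "comp:aux"), (2, "comp:obj")),
    ((3, "comp:aux"), (1, "comp:obj")),
    ((1, "subj"), (1, "subj")),
]
counts = error_breakdown(pairs, "comp:aux")
```

Per-relation UAS is then `(correct + label_only) / total`, and per-relation LAS is `correct / total`.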

Few\-shot prompting reduces some labeling errors: comp:poss LAS rises from 0% to 6\.8% with one example and 7\.5% with five, while udep improves from 4\.0% to 11\.3% and 18\.2%\. However, structurally complex relations such as comp:aux remain more challenging\.

## 7 Conclusion

We introduced AfriSUD, a large\-scale SUD resource for nine African languages, and provided baseline parsers using Stanza and fine\-tuned multilingual pretrained language models \(PLMs\)\. The results show that supervised parsers remain strong: Stanza achieves the best overall LAS, while Africa\-centric PLMs such as AfroXLMR are competitive with general multilingual encoders\. LLMs improve with in\-context examples, but their performance remains lower than that of supervised parsers, especially on LAS\. The persistent UAS–LAS gap shows that LLMs recover syntactic heads more reliably than SUD relation labels\. Overall, AfriSUD provides a foundation for more linguistically grounded evaluation and syntactic modeling of African languages\. Future work should expand language coverage, increase annotation size, and analyze a broader range of constructions and models\.

## Limitations

Our work illustrates some of the significant challenges of the annotation task, and the data cover a limited number of examples for some dependency relations\. Annotation guidelines need to be refined, especially in light of some underspecified relations\. Our LLM experiments are restricted to selected models and prompting settings; closed models may change over time, making exact reproducibility difficult\. Finally, the error analysis focuses on selected SUD\-specific relations and the best\-performing prompting model\.

## Ethics Statement or Broader Impact

Our work is intended to support syntactic NLP research for African languages through the new AfriSUD treebanks\. Most of the data come from publicly available sources, except the Efik data, which come from a closed source and may therefore be excluded from the public release\. We do not anticipate significant privacy risks, since the released materials are based on public texts\. The evaluated LLMs are used only for research purposes, and the resources we release are intended to support reproducible and inclusive NLP research\.

## Use of AI Assistants

We used Claude Code \(Anthropic\) for debugging and developing parts of the experimental codebase\. All scientific claims, experimental design, results, and conclusions were produced and verified by the authors\.

## Acknowledgments

This work was supported by the Princeton Language and Intelligence \(PLI\) Seed Grant Program\. The authors thank the Princeton Center for Digital Humanities for its support in preparing the grant application, and the Princeton Laboratory for Artificial Intelligence for providing compute resources\. We also thank the Masakhane Research Foundation for handling payments to annotators based in different parts of the world\. We are grateful to Joakim Nivre and Dan Zeman for leading the initial workshop\-style session on dependency relations, which helped launch the annotation training process, and to Khensa Amani Daoudi for her extensive support in resolving issues with the annotation tool\. Happy Buzaaba is supported by the Program in African Studies and the Africa World Initiative at Princeton\.

## References

- Towards afrocentric NLP for African languages: where we are and where we can go\. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\), Dublin, Ireland, pp\. 3814–3841\. External Links: [Link](https://aclanthology.org/2022.acl-long.265/), [Document](https://dx.doi.org/10.18653/v1/2022.acl-long.265)\. Cited by: [§3](https://arxiv.org/html/2606.12708#S3.p2.1)\.
- D\. I\. Adelani, J\. Abbott, G\. Neubig, D\. D’souza, J\. Kreutzer, C\. Lignos, C\. Palen\-Michel, H\. Buzaaba, S\. Rijhwani, S\. Ruder, S\. Mayhew, I\. A\. Azime, S\. H\. Muhammad, C\. C\. Emezue, J\. Nakatumba\-Nabende, P\. Ogayo, A\. Anuoluwapo, C\. Gitau, D\. Mbaye, J\. Alabi, S\. M\. Yimam, T\. R\. Gwadabe, I\. Ezeani, R\. A\. Niyongabo, J\. Mukiibi, V\. Otiende, I\. Orife, D\. David, S\. Ngom, T\. Adewumi, P\. Rayson, M\. Adeyemi, G\. Muriuki, E\. Anebi, C\. Chukwuneke, N\. Odu, E\. P\. Wairagala, S\. Oyerinde, C\. Siro, T\. S\. Bateesa, T\. Oloyede, Y\. Wambui, V\. Akinode, D\. Nabagereka, M\. Katusiime, A\. Awokoya, M\. MBOUP, D\. Gebreyohannes, H\. Tilaye, K\. Nwaike, D\. Wolde, A\. Faye, B\. Sibanda, O\. Ahia, B\. F\. P\. Dossou, K\. Ogueji, T\. I\. DIOP, A\. Diallo, A\. Akinfaderin, T\. Marengereke, and S\. Osei \(2021a\) MasakhaNER: named entity recognition for African languages\. Transactions of the Association for Computational Linguistics 9, pp\. 1116–1131\. External Links: [Link](https://aclanthology.org/2021.tacl-1.66/), [Document](https://dx.doi.org/10.1162/tacl%5Fa%5F00416)\. Cited by: [§2](https://arxiv.org/html/2606.12708#S2.p1.1), [§3](https://arxiv.org/html/2606.12708#S3.p1.1)\.
- D\. I\. Adelani, J\. O\. Alabi, A\. Fan, J\. Kreutzer, X\. Shen, M\. Reid, D\. Ruiter, D\. Klakow, P\. Nabende, E\. Chang, T\. Gwadabe, F\. Sackey, B\. F\. P\. Dossou, C\. Emezue, C\. Leong, M\. Beukman, S\. H\. Muhammad, G\. D\. Jarso, O\. Yousuf, A\. N\. Niyongabo Rubungo, G\. Hacheme, E\. P\. Wairagala, M\. U\. Nasir, B\. A\. Ajibade, T\. O\. Ajayi, Y\. W\. Gitau, J\. Abbott, M\. Ahmed, M\. Ochieng, A\. Aremu, P\. Ogayo, J\. Mukiibi, F\. Ouoba Kabore, G\. K\. Kalipe, D\. Mbaye, A\. A\. Tapo, V\. M\. Memdjokam Koagne, E\. Munkoh\-Buabeng, V\. Wagner, I\. Abdulmumin, A\. Awokoya, H\. Buzaaba, B\. Sibanda, A\. Bukula, and S\. Manthalu \(2022a\) A few thousand translations go a long way\! leveraging pre\-trained models for African news translation\. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, United States, pp\. 3053–3070\. External Links: [Link](https://aclanthology.org/2022.naacl-main.223/), [Document](https://dx.doi.org/10.18653/v1/2022.naacl-main.223)\. Cited by: [§1](https://arxiv.org/html/2606.12708#S1.p2.1)\.
- D\. I\. Adelani, H\. Liu, X\. Shen, N\. Vassilyev, J\. O\. Alabi, Y\. Mao, H\. Gao, and E\. A\. Lee \(2024\) SIB\-200: a simple, inclusive, and big evaluation dataset for topic classification in 200\+ languages and dialects\. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics \(Volume 1: Long Papers\), Y\. Graham and M\. Purver \(Eds\.\), St\. Julian’s, Malta, pp\. 226–245\. External Links: [Link](https://aclanthology.org/2024.eacl-long.14/), [Document](https://dx.doi.org/10.18653/v1/2024.eacl-long.14)\. Cited by: [§2](https://arxiv.org/html/2606.12708#S2.p1.1), [§4\.1](https://arxiv.org/html/2606.12708#S4.SS1.SSS0.Px2.p1.1)\.
- D\. I\. Adelani, G\. Neubig, S\. Ruder, S\. Rijhwani, M\. Beukman, C\. Palen\-Michel, C\. Lignos, J\. O\. Alabi, S\. H\. Muhammad, P\. Nabende, C\. M\. B\. Dione, A\. Bukula, R\. Mabuya, B\. F\. P\. Dossou, B\. Sibanda, H\. Buzaaba, J\. Mukiibi, G\. Kalipe, D\. Mbaye, A\. Taylor, F\. Kabore, C\. C\. Emezue, A\. Aremu, P\. Ogayo, C\. Gitau, E\. Munkoh\-Buabeng, V\. Memdjokam Koagne, A\. A\. Tapo, T\. Macucwa, V\. Marivate, E\. Mboning, T\. Gwadabe, T\. Adewumi, O\. Ahia, J\. Nakatumba\-Nabende, N\. L\. Mokono, I\. Ezeani, C\. Chukwuneke, M\. Adeyemi, G\. Q\. Hacheme, I\. Abdulmumin, O\. Ogundepo, O\. Yousuf, T\. Moteu Ngoli, and D\. Klakow \(2022b\) MasakhaNER 2\.0: Africa\-centric transfer learning for named entity recognition\. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, pp\. 4488–4508\. External Links: [Link](https://aclanthology.org/2022.emnlp-main.298/), [Document](https://dx.doi.org/10.18653/v1/2022.emnlp-main.298)\. Cited by: [§1](https://arxiv.org/html/2606.12708#S1.p2.1)\.
- D\. I\. Adelani, J\. Ojo, I\. A\. Azime, J\. Y\. Zhuang, J\. O\. Alabi, X\. He, M\. Ochieng, S\. Hooker, A\. Bukula, E\. A\. Lee, C\. I\. Chukwuneke, H\. Buzaaba, B\. K\. Sibanda, G\. K\. Kalipe, J\. Mukiibi, S\. Kabongo Kabenamualu, F\. Yuehgoh, M\. Setaka, L\. Ndolela, N\. Odu, R\. Mabuya, S\. Osei, S\. H\. Muhammad, S\. Samb, T\. K\. Guge, T\. V\. Sherman, and P\. Stenetorp \(2025\) IrokoBench: a new benchmark for African languages in the age of large language models\. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies \(Volume 1: Long Papers\), L\. Chiruzzo, A\. Ritter, and L\. Wang \(Eds\.\), Albuquerque, New Mexico, pp\. 2732–2757\. External Links: [Link](https://aclanthology.org/2025.naacl-long.139/), [Document](https://dx.doi.org/10.18653/v1/2025.naacl-long.139), ISBN 979\-8\-89176\-189\-6\. Cited by: [§2](https://arxiv.org/html/2606.12708#S2.p1.1), [§3](https://arxiv.org/html/2606.12708#S3.p1.1)\.
- D\. I\. Adelani, D\. Ruiter, J\. O\. Alabi, D\. Adebonojo, A\. Ayeni, M\. Adeyemi, A\. E\. Awokoya, and C\. España\-Bonet \(2021b\)The effect of domain and diacritics in Yoruba–English neural machine translation\.InProceedings of Machine Translation Summit XVIII: Research Track,K\. Duh and F\. Guzmán \(Eds\.\),Virtual,pp\. 61–75\.External Links:[Link](https://aclanthology.org/2021.mtsummit-research.6/)Cited by:[§3](https://arxiv.org/html/2606.12708#S3.p3.1)\.
- Ž\. Agić \(2017\)Cross\-lingual parser selection for low\-resource languages\.InProceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies \(UDW 2017\),M\. de Marneffe, J\. Nivre, and S\. Schuster \(Eds\.\),Gothenburg, Sweden,pp\. 1–10\.External Links:[Link](https://aclanthology.org/W17-0401/)Cited by:[§4\.3](https://arxiv.org/html/2606.12708#S4.SS3.p1.1),[§4\.3](https://arxiv.org/html/2606.12708#S4.SS3.p2.1)\.
- B\. Akera, J\. Mukiibi, L\. S\. Naggayi, C\. Babirye, I\. Owomugisha, S\. Nsumba, J\. Nakatumba\-Nabende, E\. Bainomugisha, E\. Mwebaze, and J\. Quinn \(2022\)Machine translation for african languages: community creation of datasets and models in uganda\.In3rd Workshop on African Natural Language Processing,Cited by:[§3](https://arxiv.org/html/2606.12708#S3.p3.1),[§4\.1](https://arxiv.org/html/2606.12708#S4.SS1.SSS0.Px1.p1.1)\.
- J\. O\. Alabi, D\. I\. Adelani, M\. Mosbach, and D\. Klakow \(2022\)Adapting pre\-trained language models to African languages via multilingual adaptive fine\-tuning\.InProceedings of the 29th International Conference on Computational Linguistics,N\. Calzolari, C\. Huang, H\. Kim, J\. Pustejovsky, L\. Wanner, K\. Choi, P\. Ryu, H\. Chen, L\. Donatelli, H\. Ji, S\. Kurohashi, P\. Paggio, N\. Xue, S\. Kim, Y\. Hahm, Z\. He, T\. K\. Lee, E\. Santus, F\. Bond, and S\. Na \(Eds\.\),Gyeongju, Republic of Korea,pp\. 4336–4349\.External Links:[Link](https://aclanthology.org/2022.coling-1.382/)Cited by:[§2](https://arxiv.org/html/2606.12708#S2.p1.1),[§4\.1](https://arxiv.org/html/2606.12708#S4.SS1.SSS0.Px2.p1.1)\.
- J\. O\. Alabi, M\. A\. Hedderich, D\. I\. Adelani, and D\. Klakow \(2025\)Charting the landscape of African NLP: mapping progress and shaping the road ahead\.InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing,Suzhou, China,pp\. 27807–27841\.External Links:[Link](https://aclanthology.org/2025.emnlp-main.1414/),[Document](https://dx.doi.org/10.18653/v1/2025.emnlp-main.1414),ISBN 979\-8\-89176\-332\-6Cited by:[§1](https://arxiv.org/html/2606.12708#S1.p2.1)\.
- E\. Aplonova and F\. M\. Tyers \(2017\)Towards a dependency\-annotated treebank for Bambara\.InProceedings of the 16th International Workshop on Treebanks and Linguistic Theories,Prague, Czech Republic,pp\. 138–145\.External Links:[Link](https://aclanthology.org/W17-7618/)Cited by:[§2](https://arxiv.org/html/2606.12708#S2.p1.1)\.
- C\. A\. Babou and M\. Loporcaro \(2016\)Noun classes and grammatical gender in wolof\.Journal of African Languages and Linguistics37\(1\),pp\. 1–57\.Cited by:[§3](https://arxiv.org/html/2606.12708#S3.p2.1)\.
- H\. Buzaaba, A\. Wettig, D\. I\. Adelani, and C\. Fellbaum \(2025\)Lugha\-llama: adapting large language models for african languages\.arXiv preprint arXiv:2504\.06536\.Cited by:[§2](https://arxiv.org/html/2606.12708#S2.p1.1)\.
- B\. Caron, M\. Courtin, K\. Gerdes, and S\. Kahane \(2019\)A surface\-syntactic ud treebank for naija\.InProceedings of the 18th International Workshop on Treebanks and Linguistic Theories \(TLT, SyntaxFest 2019\),pp\. 13–24\.Cited by:[§2](https://arxiv.org/html/2606.12708#S2.p1.1)\.
- A\. Conneau, K\. Khandelwal, N\. Goyal, V\. Chaudhary, G\. Wenzek, F\. Guzmán, E\. Grave, M\. Ott, L\. Zettlemoyer, and V\. Stoyanov \(2020\)Unsupervised cross\-lingual representation learning at scale\.InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics,D\. Jurafsky, J\. Chai, N\. Schluter, and J\. Tetreault \(Eds\.\),Online,pp\. 8440–8451\.External Links:[Link](https://aclanthology.org/2020.acl-main.747/),[Document](https://dx.doi.org/10.18653/v1/2020.acl-main.747)Cited by:[§4\.1](https://arxiv.org/html/2606.12708#S4.SS1.SSS0.Px2.p1.1)\.
- J\. Devlin, M\. Chang, K\. Lee, and K\. Toutanova \(2019\)BERT: pre\-training of deep bidirectional transformers for language understanding\.InProceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 \(Long and Short Papers\),J\. Burstein, C\. Doran, and T\. Solorio \(Eds\.\),Minneapolis, Minnesota,pp\. 4171–4186\.External Links:[Link](https://aclanthology.org/N19-1423/),[Document](https://dx.doi.org/10.18653/v1/N19-1423)Cited by:[§4\.1](https://arxiv.org/html/2606.12708#S4.SS1.SSS0.Px2.p1.1)\.
- C\. M\. B\. Dione \(2019\)Developing universal dependencies for wolof\.InProceedings of the Third Workshop on Universal Dependencies \(UDW, SyntaxFest 2019\),pp\. 12–23\.Cited by:[§2](https://arxiv.org/html/2606.12708#S2.p1.1)\.
- C\. M\. B\. Dione, D\. I\. Adelani, P\. Nabende, J\. Alabi, T\. Sindane, H\. Buzaaba, S\. H\. Muhammad, C\. C\. Emezue, P\. Ogayo, A\. Aremu, C\. Gitau, D\. Mbaye, J\. Mukiibi, B\. Sibanda, B\. F\. P\. Dossou, A\. Bukula, R\. Mabuya, A\. A\. Tapo, E\. Munkoh\-Buabeng, V\. Memdjokam Koagne, F\. Ouoba Kabore, A\. Taylor, G\. Kalipe, T\. Macucwa, V\. Marivate, T\. Gwadabe, M\. T\. Elvis, I\. Onyenwe, G\. Atindogbe, T\. Adelani, I\. Akinade, O\. Samuel, M\. Nahimana, T\. Musabeyezu, E\. Niyomutabazi, E\. Chimhenga, K\. Gotosa, P\. Mizha, A\. Agbolo, S\. Traore, C\. Uchechukwu, A\. Yusuf, M\. Abdullahi, and D\. Klakow \(2023\)MasakhaPOS: part\-of\-speech tagging for typologically diverse African languages\.InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),Toronto, Canada,pp\. 10883–10900\.External Links:[Link](https://aclanthology.org/2023.acl-long.609/),[Document](https://dx.doi.org/10.18653/v1/2023.acl-long.609)Cited by:[§1](https://arxiv.org/html/2606.12708#S1.p2.1),[§2](https://arxiv.org/html/2606.12708#S2.p1.1),[§3\.2](https://arxiv.org/html/2606.12708#S3.SS2.p1.1),[§3](https://arxiv.org/html/2606.12708#S3.p1.1),[§3](https://arxiv.org/html/2606.12708#S3.p2.1),[§3](https://arxiv.org/html/2606.12708#S3.p3.1),[§4\.3](https://arxiv.org/html/2606.12708#S4.SS3.p2.1)\.
- T\. Dozat and C\. D\. Manning \(2016\)Deep biaffine attention for neural dependency parsing\.arXiv preprint arXiv:1611\.01734\.Cited by:[§4\.1](https://arxiv.org/html/2606.12708#S4.SS1.SSS0.Px2.p1.1)\.
- S\. Duan, H\. Zhao, and D\. Zhang \(2023\)Syntax\-aware data augmentation for neural machine translation\.IEEE/ACM Transactions on Audio, Speech, and Language Processing31,pp\. 2988–2999\.Cited by:[§1](https://arxiv.org/html/2606.12708#S1.p1.1)\.
- L\. Duong, T\. Cohn, S\. Bird, and P\. Cook \(2015\)Cross\-lingual transfer for unsupervised dependency parsing without parallel data\.InProceedings of the Nineteenth Conference on Computational Natural Language Learning,Beijing, China,pp\. 113–122\.External Links:[Link](https://aclanthology.org/K15-1012/),[Document](https://dx.doi.org/10.18653/v1/K15-1012)Cited by:[§4\.3](https://arxiv.org/html/2606.12708#S4.SS3.p1.1)\.
- H\. Fei, M\. Zhang, M\. Zhang, and T\. Chua \(2023\)Constructing code\-mixed universal dependency forest for unbiased cross\-lingual relation extraction\.InFindings of the Association for Computational Linguistics: ACL 2023,pp\. 9395–9408\.Cited by:[§1](https://arxiv.org/html/2606.12708#S1.p1.1)\.
- A\. Fisch, J\. Guo, and R\. Barzilay \(2019\)Working hard or hardly working: challenges of integrating typology into neural dependency parsers\.InProceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing \(EMNLP\-IJCNLP\),K\. Inui, J\. Jiang, V\. Ng, and X\. Wan \(Eds\.\),Hong Kong, China,pp\. 5714–5720\.External Links:[Link](https://aclanthology.org/D19-1574/),[Document](https://dx.doi.org/10.18653/v1/D19-1574)Cited by:[§4\.3](https://arxiv.org/html/2606.12708#S4.SS3.p1.1)\.
- Gemini Team, Google DeepMind \(2026\)Gemini 3\.1 pro: technical report and model card\.Technical reportGoogle DeepMind\.Note:Accessed: 2026\-05\-16External Links:[Link](https://deepmind.google/models/model-cards/gemini-3-1-pro/)Cited by:[§4\.2](https://arxiv.org/html/2606.12708#S4.SS2.p1.2)\.
- K\. Gerdes, B\. Guillaume, S\. Kahane, and G\. Perrier \(2018\)SUD or surface\-syntactic Universal Dependencies: an annotation scheme near\-isomorphic to UD\.InProceedings of the Second Workshop on Universal Dependencies \(UDW 2018\),Brussels, Belgium,pp\. 66–74\.External Links:[Link](https://aclanthology.org/W18-6008/),[Document](https://dx.doi.org/10.18653/v1/W18-6008)Cited by:[§1](https://arxiv.org/html/2606.12708#S1.p3.1),[§1](https://arxiv.org/html/2606.12708#S1.p4.1),[§3\.1](https://arxiv.org/html/2606.12708#S3.SS1.1.6)\.
- K\. Gerdes, B\. Guillaume, S\. Kahane, and G\. Perrier \(2021\)Starting a new treebank? go sud\!\.InProceedings of the sixth international conference on dependency linguistics \(depling, syntaxfest 2021\),pp\. 35–46\.Cited by:[§1](https://arxiv.org/html/2606.12708#S1.p3.1)\.
- T\. Gowda, Z\. Zhang, C\. Mattmann, and J\. May \(2021\)Many\-to\-English machine translation tools, data, and pretrained models\.InProceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations,H\. Ji, J\. C\. Park, and R\. Xia \(Eds\.\),Online,pp\. 306–316\.External Links:[Link](https://aclanthology.org/2021.acl-demo.37/),[Document](https://dx.doi.org/10.18653/v1/2021.acl-demo.37)Cited by:[§4\.1](https://arxiv.org/html/2606.12708#S4.SS1.SSS0.Px1.p1.1)\.
- E\. Grave, P\. Bojanowski, P\. Gupta, A\. Joulin, and T\. Mikolov \(2018\)Learning word vectors for 157 languages\.InProceedings of the Eleventh International Conference on Language Resources and Evaluation \(LREC 2018\),N\. Calzolari, K\. Choukri, C\. Cieri, T\. Declerck, S\. Goggi, K\. Hasida, H\. Isahara, B\. Maegaard, J\. Mariani, H\. Mazo, A\. Moreno, J\. Odijk, S\. Piperidis, and T\. Tokunaga \(Eds\.\),Miyazaki, Japan\.External Links:[Link](https://aclanthology.org/L18-1550/)Cited by:[§4\.1](https://arxiv.org/html/2606.12708#S4.SS1.SSS0.Px1.p1.1)\.
- G\. Guibon, M\. Courtin, K\. Gerdes, and B\. Guillaume \(2020\)When collaborative treebank curation meets graph grammars\.InProceedings of the Twelfth Language Resources and Evaluation Conference,N\. Calzolari, F\. Béchet, P\. Blache, K\. Choukri, C\. Cieri, T\. Declerck, S\. Goggi, H\. Isahara, B\. Maegaard, J\. Mariani, H\. Mazo, A\. Moreno, J\. Odijk, and S\. Piperidis \(Eds\.\),Marseille, France,pp\. 5291–5300\(eng\)\.External Links:[Link](https://aclanthology.org/2020.lrec-1.651/),ISBN 979\-10\-95546\-34\-4Cited by:[§3\.1](https://arxiv.org/html/2606.12708#S3.SS1.1.7)\.
- K\. Guiller \(2020\)Analyse syntaxique automatique du pidgin\-créole du nigeria à l’aide d’un transformer \(BERT\) : méthodes et résultats\.Master’s thesis,Sorbonne Nouvelle\.Cited by:[§3\.1](https://arxiv.org/html/2606.12708#S3.SS1.1.7)\.
- A\. Hurst, A\. Lerer, A\. P\. Goucher, A\. Perelman, A\. Ramesh, A\. Clark, A\. Ostrow, A\. Welihinda, A\. Hayes, A\. Radford,et al\.\(2024\)Gpt\-4o system card\.arXiv preprint arXiv:2410\.21276\.Cited by:[§4\.2](https://arxiv.org/html/2606.12708#S4.SS2.p1.2)\.
- O\. Ishola and D\. Zeman \(2020\)Yorùbá dependency treebank \(YTB\)\.InProceedings of the Twelfth Language Resources and Evaluation Conference,Marseille, France,pp\. 5178–5186\(eng\)\.External Links:[Link](https://aclanthology.org/2020.lrec-1.637/),ISBN 979\-10\-95546\-34\-4Cited by:[§2](https://arxiv.org/html/2606.12708#S2.p1.1)\.
- D\. Joshi and I\. Rekik \(2025\)Dependency parsing\-based syntactic enhancement of relation extraction in scientific texts\.InFindings of the Association for Computational Linguistics: EMNLP 2025,pp\. 24888–24897\.Cited by:[§1](https://arxiv.org/html/2606.12708#S1.p1.1)\.
- S\. Kahane, M\. Vanhove, R\. Ziane, and B\. Guillaume \(2021\)A morph\-based and a word\-based treebank for beja\.InProceedings of the 20th International Workshop on Treebanks and Linguistic Theories \(TLT, SyntaxFest\),pp\. 48–60\.Cited by:[§2](https://arxiv.org/html/2606.12708#S2.p1.1)\.
- O\. Kalejaiye, L\. H\. Beyene, D\. I\. Adelani, M\. G\. Edet, A\. D\. Akpan, E\. Urua, and A\. Andy \(2025\)Ibom NLP: a step toward inclusive natural language processing for Nigeria’s minority languages\.InProceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia\-Pacific Chapter of the Association for Computational Linguistics,K\. Inui, S\. Sakti, H\. Wang, D\. F\. Wong, P\. Bhattacharyya, B\. Banerjee, A\. Ekbal, T\. Chakraborty, and D\. P\. Singh \(Eds\.\),Mumbai, India,pp\. 372–382\.External Links:[Link](https://aclanthology.org/2025.ijcnlp-long.22/),[Document](https://dx.doi.org/10.18653/v1/2025.ijcnlp-long.22),ISBN 979\-8\-89176\-298\-5Cited by:[§4\.1](https://arxiv.org/html/2606.12708#S4.SS1.SSS0.Px1.p1.1)\.
- F\. Katambaet al\.\(2003\)Bantu nominal morphology\.The bantu languages103,pp\. 120\.Cited by:[§3](https://arxiv.org/html/2606.12708#S3.p2.1)\.
- L\. Li, K\. Fan, L\. Yang, H\. Li, and C\. Yuan \(2023\)Neural machine translation with dynamic graph convolutional decoder\.arXiv preprint arXiv:2305\.17698\.Cited by:[§1](https://arxiv.org/html/2606.12708#S1.p1.1)\.
- R\. Litschko, I\. Vulić, Ž\. Agić, and G\. Glavaš \(2020\)Towards instance\-level parser selection for cross\-lingual transfer of dependency parsers\.InProceedings of the 28th International Conference on Computational Linguistics,D\. Scott, N\. Bel, and C\. Zong \(Eds\.\),Barcelona, Spain \(Online\),pp\. 3886–3898\.External Links:[Link](https://aclanthology.org/2020.coling-main.345/),[Document](https://dx.doi.org/10.18653/v1/2020.coling-main.345)Cited by:[§4\.3](https://arxiv.org/html/2606.12708#S4.SS3.p1.1)\.
- L\. Liu, Y\. Zhou, J\. Xu, X\. Zheng, K\. Chang, and X\. Huang \(2020\)Cross\-lingual dependency parsing by POS\-guided word reordering\.InFindings of the Association for Computational Linguistics: EMNLP 2020,T\. Cohn, Y\. He, and Y\. Liu \(Eds\.\),Online,pp\. 2938–2948\.External Links:[Link](https://aclanthology.org/2020.findings-emnlp.265/),[Document](https://dx.doi.org/10.18653/v1/2020.findings-emnlp.265)Cited by:[§4\.3](https://arxiv.org/html/2606.12708#S4.SS3.p2.1)\.
- J\. Nivre, M\. de Marneffe, F\. Ginter, Y\. Goldberg, J\. Hajič, C\. D\. Manning, R\. McDonald, S\. Petrov, S\. Pyysalo, N\. Silveira, R\. Tsarfaty, and D\. Zeman \(2016\)Universal Dependencies v1: a multilingual treebank collection\.InProceedings of the Tenth International Conference on Language Resources and Evaluation \(LREC’16\),Portorož, Slovenia,pp\. 1659–1666\.External Links:[Link](https://aclanthology.org/L16-1262/)Cited by:[§1](https://arxiv.org/html/2606.12708#S1.p1.1)\.
- J\. Nivre, M\. de Marneffe, F\. Ginter, J\. Hajič, C\. D\. Manning, S\. Pyysalo, S\. Schuster, F\. Tyers, and D\. Zeman \(2020\)Universal Dependencies v2: an evergrowing multilingual treebank collection\.InProceedings of the Twelfth Language Resources and Evaluation Conference,Marseille, France,pp\. 4034–4043\(eng\)\.External Links:[Link](https://aclanthology.org/2020.lrec-1.497/),ISBN 979\-10\-95546\-34\-4Cited by:[§1](https://arxiv.org/html/2606.12708#S1.p1.1)\.
- D\. Nurse and G\. Philippson \(2003\)Towards a historical classification of the bantu languages\.The Bantu Languages,pp\. 164–181\.Cited by:[§3](https://arxiv.org/html/2606.12708#S3.p1.1)\.
- K\. Ogueji, Y\. Zhu, and J\. Lin \(2021\)Small data? no problem\! exploring the viability of pretrained multilingual language models for low\-resourced languages\.InProceedings of the 1st Workshop on Multilingual Representation Learning,Punta Cana, Dominican Republic,pp\. 116–126\.External Links:[Link](https://aclanthology.org/2021.mrl-1.11/),[Document](https://dx.doi.org/10.18653/v1/2021.mrl-1.11)Cited by:[§2](https://arxiv.org/html/2606.12708#S2.p1.1),[§3](https://arxiv.org/html/2606.12708#S3.p2.1),[§4\.1](https://arxiv.org/html/2606.12708#S4.SS1.SSS0.Px2.p1.1)\.
- S\. Petrov, D\. Das, and R\. McDonald \(2012\)A universal part\-of\-speech tagset\.InProceedings of the Eighth International Conference on Language Resources and Evaluation \(LREC’12\),Istanbul, Turkey,pp\. 2089–2096\.External Links:[Link](https://aclanthology.org/L12-1115/)Cited by:[§3\.1](https://arxiv.org/html/2606.12708#S3.SS1.1.6)\.
- P\. Qi, Y\. Zhang, Y\. Zhang, J\. Bolton, and C\. D\. Manning \(2020\)Stanza: a python natural language processing toolkit for many human languages\.InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations,A\. Celikyilmaz and T\. Wen \(Eds\.\),Online,pp\. 101–108\.External Links:[Link](https://aclanthology.org/2020.acl-demos.14/),[Document](https://dx.doi.org/10.18653/v1/2020.acl-demos.14)Cited by:[§4\.1](https://arxiv.org/html/2606.12708#S4.SS1.SSS0.Px1.p1.1)\.
- P\. Roulon\-Doko, S\. Kahane, and B\. Guillaume \(2025\)A morpheme\-based treebank for gbaya, an ubanguian language of central africa\.InProceedings of the Eighth International Conference on Dependency Linguistics \(Depling, SyntaxFest\),pp\. 93–102\.Cited by:[§2](https://arxiv.org/html/2606.12708#S2.p1.1)\.
- M\. Scholivet, F\. Dary, A\. Nasr, B\. Favre, and C\. Ramisch \(2019\)Typological features for multilingual delexicalised dependency parsing\.InProceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 \(Long and Short Papers\),J\. Burstein, C\. Doran, and T\. Solorio \(Eds\.\),Minneapolis, Minnesota,pp\. 3919–3930\.External Links:[Link](https://aclanthology.org/N19-1393/),[Document](https://dx.doi.org/10.18653/v1/N19-1393)Cited by:[§4\.3](https://arxiv.org/html/2606.12708#S4.SS3.p2.1)\.
- B\. E\. Seyoum, Y\. Miyao, and B\. Y\. Mekonnen \(2018\)Universal Dependencies for Amharic\.InProceedings of the Eleventh International Conference on Language Resources and Evaluation \(LREC 2018\),Miyazaki, Japan\.External Links:[Link](https://aclanthology.org/L18-1350/)Cited by:[§2](https://arxiv.org/html/2606.12708#S2.p1.1)\.
- G\. Team, A\. Kamath, J\. Ferret, S\. Pathak, N\. Vieillard, R\. Merhej, S\. Perrin, T\. Matejovicova, A\. Ramé, M\. Rivière, L\. Rouillard, T\. Mesnard, G\. Cideron, J\. Grill, S\. Ramos, E\. Yvinec, M\. Casbon, E\. Pot, I\. Penchev, G\. Liu, F\. Visin, K\. Kenealy, L\. Beyer, X\. Zhai, A\. Tsitsulin, R\. Busa\-Fekete, A\. Feng, N\. Sachdeva, B\. Coleman, Y\. Gao, B\. Mustafa, I\. Barr, E\. Parisotto, D\. Tian, M\. Eyal, C\. Cherry, J\. Peter, D\. Sinopalnikov, S\. Bhupatiraju, R\. Agarwal, M\. Kazemi, D\. Malkin, R\. Kumar, D\. Vilar, I\. Brusilovsky, J\. Luo, A\. Steiner, A\. Friesen, A\. Sharma, A\. Sharma, A\. M\. Gilady, A\. Goedeckemeyer, A\. Saade, A\. Feng, A\. Kolesnikov, A\. Bendebury, A\. Abdagic, A\. Vadi, A\. György, A\. S\. Pinto, A\. Das, A\. Bapna, A\. Miech, A\. Yang, A\. Paterson, A\. Shenoy, A\. Chakrabarti, B\. Piot, B\. Wu, B\. Shahriari, B\. Petrini, C\. Chen, C\. L\. Lan, C\. A\. Choquette\-Choo, C\. Carey, C\. Brick, D\. Deutsch, D\. Eisenbud, D\. Cattle, D\. Cheng, D\. Paparas, D\. S\. Sreepathihalli, D\. Reid, D\. Tran, D\. Zelle, E\. Noland, E\. Huizenga, E\. Kharitonov, F\. Liu, G\. Amirkhanyan, G\. Cameron, H\. Hashemi, H\. Klimczak\-Plucińska, H\. Singh, H\. Mehta, H\. T\. Lehri, H\. Hazimeh, I\. Ballantyne, I\. Szpektor, I\. Nardini, J\. Pouget\-Abadie, J\. Chan, J\. Stanton, J\. Wieting, J\. Lai, J\. Orbay, J\. Fernandez, J\. Newlan, J\. Ji, J\. Singh, K\. Black, K\. Yu, K\. Hui, K\. Vodrahalli, K\. Greff, L\. Qiu, M\. Valentine, M\. Coelho, M\. Ritter, M\. Hoffman, M\. Watson, M\. Chaturvedi, M\. Moynihan, M\. Ma, N\. Babar, N\. Noy, N\. Byrd, N\. Roy, N\. Momchev, N\. Chauhan, N\. Sachdeva, O\. Bunyan, P\. Botarda, P\. Caron, P\. K\. Rubenstein, P\. Culliton, P\. Schmid, P\. G\. Sessa, P\. Xu, P\. Stanczyk, P\. Tafti, R\. Shivanna, R\. Wu, R\. Pan, R\. Rokni, R\. Willoughby, R\. Vallu, R\. Mullins, S\. Jerome, S\. Smoot, S\. Girgin, S\. Iqbal, S\. Reddy, S\. Sheth, S\. Põder, S\. Bhatnagar, S\. R\. Panyam, S\. Eiger, S\. 
Zhang, T\. Liu, T\. Yacovone, T\. Liechty, U\. Kalra, U\. Evci, V\. Misra, V\. Roseberry, V\. Feinberg, V\. Kolesnikov, W\. Han, W\. Kwon, X\. Chen, Y\. Chow, Y\. Zhu, Z\. Wei, Z\. Egyed, V\. Cotruta, M\. Giang, P\. Kirk, A\. Rao, K\. Black, N\. Babar, J\. Lo, E\. Moreira, L\. G\. Martins, O\. Sanseviero, L\. Gonzalez, Z\. Gleicher, T\. Warkentin, V\. Mirrokni, E\. Senter, E\. Collins, J\. Barral, Z\. Ghahramani, R\. Hadsell, Y\. Matias, D\. Sculley, S\. Petrov, N\. Fiedel, N\. Shazeer, O\. Vinyals, J\. Dean, D\. Hassabis, K\. Kavukcuoglu, C\. Farabet, E\. Buchatskaya, J\. Alayrac, R\. Anil, Dmitry, Lepikhin, S\. Borgeaud, O\. Bachem, A\. Joulin, A\. Andreev, C\. Hardin, R\. Dadashi, and L\. Hussenot \(2025\)Gemma 3 technical report\.External Links:2503\.19786,[Link](https://arxiv.org/abs/2503.19786)Cited by:[§4\.2](https://arxiv.org/html/2606.12708#S4.SS2.p1.2)\.
- K\. Tran and A\. Bisazza \(2019\)Zero\-shot dependency parsing with pre\-trained multilingual sentence representations\.InProceedings of the 2nd Workshop on Deep Learning Approaches for Low\-Resource NLP \(DeepLo 2019\),C\. Cherry, G\. Durrett, G\. Foster, R\. Haffari, S\. Khadivi, N\. Peng, X\. Ren, and S\. Swayamdipta \(Eds\.\),Hong Kong, China,pp\. 281–288\.External Links:[Link](https://aclanthology.org/D19-6132/),[Document](https://dx.doi.org/10.18653/v1/D19-6132)Cited by:[§4\.3](https://arxiv.org/html/2606.12708#S4.SS3.p1.1)\.
- K\. Tulchynska, S\. Job, and A\. Witzlack\-Makarevich \(2025\)Universal Dependencies treebank for Khoekhoe \(KDT\)\.InProceedings of the Eighth Workshop on Universal Dependencies \(UDW, SyntaxFest 2025\),G\. Bouma and Ç\. Çöltekin \(Eds\.\),Ljubljana, Slovenia,pp\. 119–128\.External Links:[Link](https://aclanthology.org/2025.udw-1.12/),ISBN 979\-8\-89176\-292\-3Cited by:[§2](https://arxiv.org/html/2606.12708#S2.p1.1)\.
- T\. Wolf, L\. Debut, V\. Sanh, J\. Chaumond, C\. Delangue, A\. Moi, P\. Cistac, T\. Rault, R\. Louf, M\. Funtowicz, J\. Davison, S\. Shleifer, P\. von Platen, C\. Ma, Y\. Jernite, J\. Plu, C\. Xu, T\. Le Scao, S\. Gugger, M\. Drame, Q\. Lhoest, and A\. Rush \(2020\)Transformers: state\-of\-the\-art natural language processing\.InProceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations,Q\. Liu and D\. Schlangen \(Eds\.\),Online,pp\. 38–45\.External Links:[Link](https://aclanthology.org/2020.emnlp-demos.6/),[Document](https://dx.doi.org/10.18653/v1/2020.emnlp-demos.6)Cited by:[§4\.1](https://arxiv.org/html/2606.12708#S4.SS1.SSS0.Px2.p1.1)\.
- H\. Yu, J\. O\. Alabi, A\. Bukula, J\. Y\. Zhuang, E\. A\. Lee, T\. K\. Guge, I\. A\. Azime, H\. Buzaaba, B\. K\. Sibanda, G\. K\. Kalipe, J\. Mukiibi, S\. Kabongo Kabenamualu, M\. Setaka, L\. Ndolela, N\. Odu, R\. Mabuya, S\. H\. Muhammad, S\. Osei, S\. Samb, D\. Klakow, and D\. I\. Adelani \(2025\)INJONGO: a multicultural intent detection and slot\-filling dataset for 16 African languages\.InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),W\. Che, J\. Nabende, E\. Shutova, and M\. T\. Pilehvar \(Eds\.\),Vienna, Austria,pp\. 9429–9452\.External Links:[Link](https://aclanthology.org/2025.acl-long.464/),[Document](https://dx.doi.org/10.18653/v1/2025.acl-long.464),ISBN 979\-8\-89176\-251\-0Cited by:[§2](https://arxiv.org/html/2606.12708#S2.p1.1)\.
- H\. Yu, T\. Xu, M\. A\. Hedderich, W\. Hamidouche, S\. W\. Zamir, and D\. I\. Adelani \(2026\)AfriqueLLM: how data mixing and model architecture impact continued pre\-training for african languages\.arXiv preprint arXiv:2601\.06395\.Cited by:[§2](https://arxiv.org/html/2606.12708#S2.p1.1)\.

## 8 Appendix

## Appendix A An example CoNLL\-U SUD annotation

[Table 4](https://arxiv.org/html/2606.12708#A1.T4) shows the CoNLL\-U format of the annotated Wolof sentence\. The token counts in Table [1](https://arxiv.org/html/2606.12708#S3.T1) include only regular CoNLL\-U syntactic tokens, i\.e\., lines with integer IDs such as those in Appendix Table [4](https://arxiv.org/html/2606.12708#A1.T4)\. Multiword\-token lines, empty nodes, comments, and blank lines are excluded\. The counts therefore reflect the decomposed tokens used for dependency annotation rather than undecomposed surface forms\.
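
The counting rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' script: the function name `count_syntactic_tokens` and the toy sentence are hypothetical, and the only assumption about the data is the standard CoNLL\-U convention that multiword\-token IDs are ranges such as `1-2` and empty\-node IDs are decimals such as `2.1`.

```python
def count_syntactic_tokens(conllu_text: str) -> int:
    """Count regular syntactic tokens: lines whose ID column is a plain integer."""
    count = 0
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        token_id = line.split("\t", 1)[0]
        # "1-2" (multiword token) and "2.1" (empty node) are not plain integers
        if token_id.isascii() and token_id.isdigit():
            count += 1
    return count


# Toy input with placeholder forms (hypothetical, not taken from AfriSUD).
SAMPLE = (
    "# sent_id = toy-1\n"
    "# text = xy z\n"
    "1-2\txy\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "1\tx\tx\tVERB\t_\t_\t0\troot\t_\t_\n"
    "2\ty\ty\tPRON\t_\t_\t1\tsubj\t_\t_\n"
    "2.1\t_\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "3\tz\tz\tNOUN\t_\t_\t1\tcomp:obj\t_\t_\n"
    "\n"
)

print(count_syntactic_tokens(SAMPLE))
```

On the toy input, the multiword\-token line `1-2`, the empty node `2.1`, the two comments, and the blank line are skipped, so only the lines with IDs 1, 2, and 3 are counted.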

Table 4: CoNLL\-U/SUD annotation for the Wolof sentence "Askan wi dañoo war a sàkku ñu woo Karaa bala ñoo diig" \(‘The people must demand that Karaa be called in before the team goes under\.’\)\. UPOS tag colours are consistent with the dependency tree visualizations throughout the paper\.
## Appendix B Cross\-lingual transfer across all source–target pairs

![Refer to caption](https://arxiv.org/html/2606.12708v1/x3.png)

Figure 4: Cross\-lingual transfer across shot settings\. Average UAS and LAS of AfroXLMR\-large\-76L across source languages under 0\-, 1\-, and 2\-shot transfer\.

[Table 5](https://arxiv.org/html/2606.12708#A2.T5) provides details of the cross\-lingual transfer between all source–target pairs\.

Table 5: UAS and LAS for cross\-lingual transfer across all source–target pairs using AfroXLMR\-large\-76L\. Scores are percentages\. Avg reports the macro\-average across languages with mean per\-language standard deviation\. Bold indicates the best result per metric and target language\.
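
For readers less familiar with the metrics, the sketch below shows the standard definitions of UAS (share of tokens with the correct head) and LAS (share of tokens with both the correct head and the correct relation label), together with an unweighted macro\-average over languages. It is an illustration under assumptions, not the paper's evaluation code: the function names and the toy values are hypothetical, and the sketch scores every token, whereas this excerpt does not state whether punctuation is excluded.

```python
from statistics import mean


def uas_las(gold, pred):
    """Return (UAS, LAS) as percentages.

    gold and pred are aligned lists of (head, deprel) pairs, one per token.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and aligned")
    n = len(gold)
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))  # head only
    full_hits = sum(g == p for g, p in zip(gold, pred))        # head and label
    return 100.0 * head_hits / n, 100.0 * full_hits / n


def macro_average(per_language):
    """Unweighted mean over languages, so each language counts equally."""
    return mean(per_language.values())


# Toy sentence with four tokens (hypothetical values, not AfriSUD results).
gold = [(2, "subj"), (0, "root"), (2, "comp:obj"), (2, "punct")]
pred = [(2, "subj"), (0, "root"), (2, "mod"), (3, "punct")]
uas, las = uas_las(gold, pred)
print(uas, las)
print(macro_average({"lang_a": 60.0, "lang_b": 80.0}))
```

In the toy sentence, three of four heads are correct and two of four tokens have both head and label correct, which gives a UAS of 75 and a LAS of 50. The macro\-average weights each language equally regardless of treebank size.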
## Appendix C POS, UAS, and LAS scores for LLMs across all languages

[Table 6](https://arxiv.org/html/2606.12708#A3.T6) provides POS tagging accuracy, UAS, and LAS scores for all LLMs across all languages\.

Table 6: Few\-shot POS tagging and dependency parsing performance of LLMs across AfriSUD languages\. POS tagging accuracy, UAS, and LAS are reported for prompting models \(0/1/5\-shot\) and a supervised fine\-tuned model \(ft\)\. Each prompted cell is the mean over 5 runs; ft results are the mean over 5 prompt templates\. Bold indicates the best result per language and metric\.
## Appendix D POS tags and dependency relations

[Table 7](https://arxiv.org/html/2606.12708#A4.T7) shows the part\-of\-speech tags and dependency relations used in the annotation\.

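| Type | Label | Subtype | Description |
| --- | --- | --- | --- |
| Universal POS tags | | | |
| UPOS | ADJ | – | adjective |
| UPOS | ADP | – | adposition |
| UPOS | ADV | – | adverb |
| UPOS | AUX | – | auxiliary |
| UPOS | CCONJ | – | coordinating conjunction |
| UPOS | DET | – | determiner |
| UPOS | INTJ | – | interjection |
| UPOS | NOUN | – | noun |
| UPOS | NUM | – | numeral |
| UPOS | PART | – | particle |
| UPOS | PRON | – | pronoun |
| UPOS | PROPN | – | proper noun |
| UPOS | PUNCT | – | punctuation |
| UPOS | SCONJ | – | subordinating conjunction |
| UPOS | SYM | – | symbol |
| UPOS | VERB | – | verb |
| UPOS | X | – | other |
| SUD dependency relations | | | |
| SUD | root | – | root of the sentence |
| SUD | subj | – | subject |
| SUD | comp | comp:aux | auxiliary complement |
| SUD | comp | comp:obj | object complement |
| SUD | comp | comp:obl | oblique complement |
| SUD | comp | comp:pred | predicative complement |
| SUD | comp | comp:cleft | cleft complement |
| SUD | mod | – | modifier |
| SUD | udep | – | underspecified dependency used for cases ambiguous between mod and comp:obl |
| SUD | compound | compound | regular compound |
| SUD | compound | compound:prt | verb\-particle compound |
| SUD | compound | compound:svc | serial verb compound |
| SUD | appos | – | appositional modifier |
| SUD | conj | – | coordinate conjunct |
| SUD | cc | – | coordinating conjunction |
| SUD | flat | – | name or flat expression |
| SUD | fixed | – | fixed grammatical expression |
| SUD | dislocated | – | dislocated element |
| SUD | punct | – | punctuation |

Table 7: Annotation labels used in AfriSUD, including UPOS tags and SUD dependency relations\.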
## Appendix E LLM evaluation protocol

We formulate Surface\-Syntactic Universal Dependencies \(SUD\) annotation as a constrained JSON generation task\. Given a target\-language sentence and its gold token list \(token id and surface form only\), the model outputs a single JSON object containing lemma, upos, head, and deprel for each token\. No English translation or interlinear gloss is provided\. [Figure 5](https://arxiv.org/html/2606.12708#A5.F5) shows the exact prompts used for LLM parsing\.

System Prompt

You are an expert linguist producing Surface\-Syntactic Universal Dependencies \(SUD\) annotations\. Return strict JSON only\. Do not use markdown\. Do not add extra text\.

User Prompt

Task: Annotate the provided target\-language sentence in Surface\-Syntactic Universal Dependencies \(SUD\)\.

Rules:

- Use only the provided token list \(id \+ form\)\.
- Do not add, remove, merge, reorder, or split tokens\.
- Keep punctuation tokens\.
- Keep ids and forms exactly as provided\.
- Provide one output token object per input token\.
- head must be an integer \(root=0\)\.
- Use SUD\-style surface\-syntactic heads\.
- Return exactly one JSON object matching the schema\.
- Do not use translations or metadata not shown below\.

SUD\-specific guidance:

- copulas may serve as heads where required by SUD
- adpositions/prepositions may serve as heads where required by SUD
- auxiliaries may serve as heads according to SUD conventions

Additional instruction: "gloss" is optional and may be omitted\.

Figure 5: Prompt template used for LLM\-based SUD annotation\.

Table 8: Instruction templates used for SFT training\. All templates request token\-level CoNLL\-U output with no metadata\.
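
The rules in the prompt lend themselves to an automatic check of each model response before scoring. The sketch below is one possible validator and not the authors' pipeline. The JSON schema is not reproduced in this excerpt, so the top\-level `"tokens"` key, the function name `validate_annotation`, and the toy tokens are all assumptions made for illustration. The UPOS inventory is the 17\-tag set listed in Table 7.

```python
import json

# The 17 UPOS tags listed in Table 7.
UPOS = {"ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
        "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X"}


def validate_annotation(raw, tokens):
    """Return a list of rule violations (empty if the response passes).

    raw: the model response as a string.
    tokens: gold list of (id, form) pairs given to the model.
    Assumes a hypothetical schema: {"tokens": [{"id", "form", "lemma",
    "upos", "head", "deprel"}, ...]}.
    """
    try:
        data = json.loads(raw)  # "strict JSON only": markdown fences fail here
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    out = data.get("tokens") if isinstance(data, dict) else None
    if not isinstance(out, list):
        return ["expected a JSON object with a 'tokens' list"]
    if len(out) != len(tokens):  # no added, removed, merged, or split tokens
        return [f"expected {len(tokens)} tokens, got {len(out)}"]
    errors = []
    valid_heads = {0} | {tid for tid, _ in tokens}
    for (tid, form), tok in zip(tokens, out):
        if not isinstance(tok, dict):
            errors.append(f"token {tid}: not a JSON object")
            continue
        if tok.get("id") != tid or tok.get("form") != form:
            errors.append(f"token {tid}: id or form was changed")
        head = tok.get("head")
        if isinstance(head, bool) or not isinstance(head, int):
            errors.append(f"token {tid}: head must be an integer")
        elif head not in valid_heads or head == tid:
            errors.append(f"token {tid}: head {head} is not a valid token id")
        upos = tok.get("upos")
        if not isinstance(upos, str) or upos not in UPOS:
            errors.append(f"token {tid}: unknown UPOS tag")
        for field in ("lemma", "deprel"):
            if not isinstance(tok.get(field), str):
                errors.append(f"token {tid}: missing {field}")
    return errors


# Toy example with placeholder forms (hypothetical).
gold_tokens = [(1, "x"), (2, ".")]
response = json.dumps({"tokens": [
    {"id": 1, "form": "x", "lemma": "x", "upos": "VERB", "head": 0, "deprel": "root"},
    {"id": 2, "form": ".", "lemma": ".", "upos": "PUNCT", "head": 1, "deprel": "punct"},
]})
print(validate_annotation(response, gold_tokens))
```

Returning a list of violations rather than raising on the first one makes it easier to log why a response was rejected. Whether rejected responses are retried or scored as errors is a protocol decision that this sketch leaves open.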