Towards Structuring an Arabic-English Machine-Readable Dictionary Using Parsing Expression Grammars
Summary
This paper presents a method to structure the Arabic-English Al-Mawrid dictionary using parsing expression grammars, converting entries into hierarchical structures for NLP applications.
# Towards Structuring an Arabic-English Machine-Readable Dictionary Using Parsing Expression Grammars

Source: [https://arxiv.org/abs/2606.25231](https://arxiv.org/abs/2606.25231) [View PDF](https://arxiv.org/pdf/2606.25231)

> Abstract: Dictionaries are rich sources of lexical information about words that is required for many applications of natural language processing and human language technology. However, publishers prepare printed dictionaries for human usage not for machine processing. This paper presented a method to structure partly a machine-readable version of the Arabic-English Al-Mawrid dictionary. The method converted the entries of Al-Mawrid from a stream of words and punctuation marks into hierarchical structures. The hierarchical structure expresses the components of each dictionary entry in explicit format. A dictionary entry is composed of subentries and each subentry consists of defining phrases, domain labels, cross-references, and translation equivalences. We designed the proposed method as cascaded steps where parsing is the main step. We implemented the parser using the parsing expression grammars formalism. In conclusion, although Arabic dictionaries do not have microstructure standardization, this study demonstrated that it is possible to structure them automatically or semi-automatically with plausible accuracy after inducing their microstructure.

## Submission history

From: Diaa Fayed
**[v1]** Tue, 23 Jun 2026 23:17:51 UTC (1,029 KB)
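
The abstract describes turning an entry from a flat stream of words and punctuation into a hierarchy of subentries, each holding domain labels, cross-references, and translation equivalents. The sketch below illustrates that idea with a small hand-written parser in the style of a parsing expression grammar (ordered choice, no ambiguity). It is not the authors' grammar: the entry syntax (`;` between subentries, `,` between items, `(label)` for domain labels, `see X` for cross-references) and the names `Subentry`, `parse_item`, `parse_subentry`, and `parse_entry` are assumptions made for illustration, and defining phrases are left out to keep it short.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy grammar (NOT Al-Mawrid's actual microstructure):
#   entry    <- subentry (';' subentry)* EOF
#   subentry <- label* item (',' item)*
#   item     <- xref / equiv            # ordered choice: xref is tried first
#   label    <- '(' [^)]+ ')'
#   xref     <- 'see' WS text
#   equiv    <- text
LABEL = re.compile(r"\s*\(([^)]+)\)\s*")
XREF = re.compile(r"\s*see\s+([^,;\s][^,;]*)")
EQUIV = re.compile(r"\s*([^,;\s][^,;]*)")


@dataclass
class Subentry:
    labels: list = field(default_factory=list)
    xrefs: list = field(default_factory=list)
    equivalents: list = field(default_factory=list)


def parse_item(text, pos, sub):
    """item <- xref / equiv. Return the new offset, or None on failure."""
    for pattern, bucket in ((XREF, sub.xrefs), (EQUIV, sub.equivalents)):
        match = pattern.match(text, pos)
        if match:
            bucket.append(match.group(1).strip())
            return match.end()
    return None


def parse_subentry(text, pos):
    """subentry <- label* item (',' item)*. Return (Subentry, offset) or None."""
    sub = Subentry()
    # label*: consume zero or more leading domain labels.
    while (match := LABEL.match(text, pos)):
        sub.labels.append(match.group(1).strip())
        pos = match.end()
    # The first item is mandatory.
    pos = parse_item(text, pos, sub)
    if pos is None:
        return None
    # (',' item)*: stop before a comma that is not followed by an item.
    while pos < len(text) and text[pos] == ",":
        nxt = parse_item(text, pos + 1, sub)
        if nxt is None:
            break
        pos = nxt
    return sub, pos


def parse_entry(text):
    """entry <- subentry (';' subentry)* EOF. Return a list of Subentry."""
    subentries, pos = [], 0
    while True:
        result = parse_subentry(text, pos)
        if result is None:
            raise ValueError(f"cannot parse subentry at offset {pos}")
        sub, pos = result
        subentries.append(sub)
        if pos == len(text):
            return subentries
        if text[pos] != ";":
            raise ValueError(f"unexpected character at offset {pos}")
        pos += 1


if __name__ == "__main__":
    for sub in parse_entry("(law) contract, agreement; see covenant"):
        print(sub)
```

Trying `xref` before `equiv` is what makes the choice ordered: a segment beginning with `see ` is always read as a cross-reference, and only otherwise as a translation equivalent. A real dictionary would need a much richer grammar induced from its own entries, as the abstract notes.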
Similar Articles
Analyzing and Encoding the Al-Mawrid Arabic-English Dictionary with the ISO Language Markup Framework and TEI Lex-0
This paper presents a methodology for digitizing the Al-Mawrid Arabic-English dictionary using ISO LMF and TEI Lex-0 standards, achieving high parsing accuracy and precision, and addressing gaps in Arabic lexical infrastructure.
Automatic Part-of-Speech Tagging of Arabic-English Dictionary Senses through WordNet
This paper proposes a resource-light algorithm to automatically assign part-of-speech tags to senses in the Al-Mawrid Arabic-English bilingual dictionary by transferring tags from English WordNet after disambiguation, achieving high accuracy with minimal cost.
ArabiGEE: A Hierarchical Taxonomy for Arabic Grammatical Error Explanation
Introduces ArabiGEE, the first comprehensive Arabic grammatical error explanation taxonomy with a hierarchical structure spanning orthographic, morphological, syntactic, and lexical dimensions, comprising 27 error types, 140 correction types, and 324 explanations.
Automated Scoring of Arabic Text Using Large Language Models: A Literature Review
A literature review examining LLM-based approaches for automatic scoring of Arabic text, covering short answer grading and essay scoring, with a proposed taxonomy and comparative analysis.
Building Arabic NLP from the Ground Up: Twenty Years of Lessons, Failures, and Open Problems
A comprehensive overview of twenty years of Arabic NLP research, discussing lessons, failures, and open problems in the field.