Automatic Part-of-Speech Tagging of Arabic-English Dictionary Senses through WordNet

arXiv cs.CL 06/24/26, 04:00 AM Papers

natural-language-processing pos-tagging bilingual-dictionary wordnet arabic-english resource-light algorithm

Summary

This paper proposes a resource-light algorithm to automatically assign part-of-speech tags to senses in the Al-Mawrid Arabic-English bilingual dictionary by transferring tags from English WordNet after disambiguation, achieving high accuracy with minimal cost.

arXiv:2606.24359v1. Abstract: This paper proposes an algorithm for part-of-speech (POS) tagging of the senses of a bilingual dictionary. The algorithm is applied to the Al-Mawrid Arabic-English dictionary. Tagging is accomplished by transferring the POS tags of the English translation equivalents (TEs) to the dictionary senses after a disambiguation process. The English POS tags are acquired from the Princeton WordNet. POS tagging of bilingual dictionary senses is a prerequisite for linking a bilingual dictionary to WordNet and/or standardizing that dictionary into the WordNet-LMF format, where the synset (set of synonyms), not the word, is the basic building block. The reported accuracy is high while the cost is low. Building NLP/HLT tools requires linguistic experts, large investments, and a long time. A statistical approach needs large annotated corpora, and a rule-based approach needs a large lexicon containing rich linguistic and world knowledge. This motivates so-called resource-light approaches to developing natural language processing (NLP) tools for low-resource languages.
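The tag-transfer idea in the abstract can be illustrated with a minimal sketch. The abstract does not describe the disambiguation process itself, so the version below substitutes a simple majority vote across a sense's TEs; that choice is an assumption, not the authors' method. The `TOY_WORDNET_POS` table and the `tag_sense` function are hypothetical names, and the table is hand-written illustrative data standing in for a real Princeton WordNet lookup.

```python
from collections import Counter

# Hypothetical stand-in for Princeton WordNet: English lemma -> POS tags it can take.
# Illustrative data only; a real run would query WordNet for each lemma.
TOY_WORDNET_POS = {
    "book": {"n", "v"},
    "volume": {"n"},
    "write": {"v"},
    "compose": {"v"},
}


def tag_sense(translation_equivalents, lexicon=TOY_WORDNET_POS):
    """Return the POS tag(s) best supported by the TEs of one dictionary sense.

    Assumption: disambiguation is a majority vote. Each TE votes for every
    POS it can take, and the tag(s) with the most votes win. An empty set
    means no TE was found in the lexicon.
    """
    votes = Counter()
    for te in translation_equivalents:
        for pos in lexicon.get(te.lower(), ()):
            votes[pos] += 1
    if not votes:
        return set()
    best = max(votes.values())
    return {pos for pos, count in votes.items() if count == best}


# "book" alone is ambiguous (noun or verb); a second TE resolves it.
print(tag_sense(["book"]))
print(tag_sense(["book", "volume"]))
print(tag_sense(["write", "book"]))
```

In this sketch a sense with a single ambiguous TE keeps every candidate tag, which leaves the decision to a later step or a human reviewer rather than guessing.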


# Automatic Part-of-Speech Tagging of Arabic-English Dictionary Senses through WordNet
Source: [https://arxiv.org/abs/2606.24359](https://arxiv.org/abs/2606.24359)
[View PDF](https://arxiv.org/pdf/2606.24359)


## Submission history

From: Diaa Fayed. **[v1]** Tue, 23 Jun 2026 09:49:26 UTC (629 KB)


Similar Articles

Towards Structuring an Arabic-English Machine-Readable Dictionary Using Parsing Expression Grammars

Analyzing and Encoding the Al-Mawrid Arabic-English Dictionary with the ISO Language Markup Framework and TEI Lex-0

Automated Scoring of Arabic Text Using Large Language Models: A Literature Review

MentalMARBERT: Domain-Adaptive Pre-training and Two-Stage Fine-Tuning for Arabic Mental Health Disorders Detection

Linear Semantic Segmentation for Low-Resource Spoken Dialects
