Tag
This paper presents QuechuaTok, a benchmark for evaluating tokenization strategies for Southern Quechua, and introduces Morphological Boundary Accuracy (MorphAcc) as a necessary metric. It shows that BPE achieves low fertility (few subword tokens per word) but poor morphological accuracy, while a morphology-aware PRPE tokenizer achieves 83% MorphAcc. The authors conclude that fertility alone is insufficient for evaluating tokenizers for agglutinative languages.
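To make the two metrics concrete, here is a minimal Python sketch. Fertility is computed in the usual way, as subword tokens per word. The summary does not define MorphAcc, so the sketch assumes a boundary-level F1 between a tokenizer's split points and gold morpheme boundaries; the paper's exact formula may differ. The functions `boundary_offsets`, `fertility`, and `morph_boundary_f1` and both toy tokenizations are hypothetical; the gold segmentations follow standard Southern Quechua morphology (wasi-kuna-pi "in the houses", mikhu-ni "I eat").

```python
from typing import List, Set


def boundary_offsets(segments: List[str]) -> Set[int]:
    """Character offsets inside a word where one segment ends and the next begins."""
    offsets, pos = set(), 0
    for seg in segments[:-1]:
        pos += len(seg)
        offsets.add(pos)
    return offsets


def fertility(tokenized_words: List[List[str]]) -> float:
    """Average number of subword tokens per word."""
    return sum(len(toks) for toks in tokenized_words) / len(tokenized_words)


def morph_boundary_f1(tokenized_words: List[List[str]], gold_words: List[List[str]]) -> float:
    """Assumed stand-in for MorphAcc: F1 of predicted split points vs. gold morpheme boundaries."""
    tp = n_pred = n_gold = 0
    for toks, gold in zip(tokenized_words, gold_words):
        # Assumes tokens are plain substrings (markers such as "▁" already stripped).
        if "".join(toks) != "".join(gold):
            raise ValueError(f"tokens {toks} do not reassemble {''.join(gold)!r}")
        pred, ref = boundary_offsets(toks), boundary_offsets(gold)
        tp += len(pred & ref)
        n_pred += len(pred)
        n_gold += len(ref)
    if n_pred == 0 and n_gold == 0:
        return 1.0  # nothing should be split and nothing was split
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [["wasi", "kuna", "pi"], ["mikhu", "ni"]]  # "in the houses", "I eat"
bpe_like = [["wasiku", "napi"], ["mikhuni"]]  # hypothetical frequency-driven splits
morph_aware = [["wasi", "kuna", "pi"], ["mikhu", "ni"]]  # hypothetical morphology-aligned splits

for name, toks in [("bpe_like", bpe_like), ("morph_aware", morph_aware)]:
    print(name, fertility(toks), morph_boundary_f1(toks, gold))
# bpe_like 1.5 0.0
# morph_aware 2.5 1.0
```

The toy BPE-style output has the lower fertility (1.5 vs. 2.5) yet recovers none of the gold boundaries, which is the trade-off that a fertility-only evaluation would miss.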
This paper presents a deep-learning chatbot that answers university frequently asked questions (FAQs) in Amharic; built as a neural network with TensorFlow and Keras, it achieves 91.55% accuracy. The system addresses Amharic-specific linguistic challenges, including morphological variation and lexical gaps, and was deployed on Facebook Messenger via Heroku.
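The summary gives no architectural details, so the following is only a minimal sketch of the FAQ intent-classification step such a chatbot performs, written in pure Python rather than TensorFlow/Keras so it runs without dependencies. It assumes bag-of-words input and a single softmax layer trained with stochastic gradient descent on cross-entropy. The class `SoftmaxFAQClassifier`, the `tokenize` helper, the intent labels, and the toy Amharic questions are all hypothetical, and the whitespace tokenizer deliberately ignores the morphological variation the paper addresses.

```python
import math
from collections import defaultdict


def tokenize(text):
    # Hypothetical preprocessing: drop ASCII "?" and Ethiopic "፧", then split on whitespace.
    return text.replace("?", " ").replace("፧", " ").split()


class SoftmaxFAQClassifier:
    """Single softmax layer over bag-of-words counts, trained with per-example SGD."""

    def __init__(self, lr=0.5, epochs=200):
        self.lr = lr
        self.epochs = epochs
        self.labels = []
        self.weights = defaultdict(lambda: defaultdict(float))  # weights[label][token]
        self.bias = defaultdict(float)  # bias[label]

    def _probs(self, tokens):
        # Logit per intent = bias + sum of token weights (repeated tokens count repeatedly).
        logits = [self.bias[y] + sum(self.weights[y][t] for t in tokens) for y in self.labels]
        top = max(logits)  # subtract the max logit for numerical stability
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def fit(self, examples):
        self.labels = sorted({label for _, label in examples})
        for _ in range(self.epochs):
            for text, label in examples:
                tokens = tokenize(text)
                probs = self._probs(tokens)
                for y, p in zip(self.labels, probs):
                    # Cross-entropy gradient w.r.t. this intent's logit: p - 1[y is the gold label]
                    grad = p - (1.0 if y == label else 0.0)
                    self.bias[y] -= self.lr * grad
                    for t in tokens:
                        self.weights[y][t] -= self.lr * grad
        return self

    def predict(self, text):
        probs = self._probs(tokenize(text))
        best = max(range(len(self.labels)), key=probs.__getitem__)
        return self.labels[best], probs[best]


# Toy FAQ data (hypothetical intents; English glosses in comments).
faq = [
    ("ምዝገባ መቼ ይጀምራል?", "registration"),  # When does registration start?
    ("ምዝገባ የት ነው?", "registration"),  # Where is registration?
    ("ቤተ መጻሕፍት መቼ ይከፈታል?", "library"),  # When does the library open?
    ("ቤተ መጻሕፍት የት ነው?", "library"),  # Where is the library?
]
clf = SoftmaxFAQClassifier().fit(faq)
print(clf.predict("ምዝገባ መቼ ነው?"))  # an unseen question: "When is registration?"
```

A deployed bot would map each predicted intent to a stored answer and fall back to a default reply when the top probability is low. Whitespace tokens also treat a prefixed form such as የምዝገባ ("of registration") as unrelated to ምዝገባ, which illustrates why Amharic morphology needs explicit handling.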