Developing a Hybrid Morphological Analyzer for Low-Resource Languages
Document Type
Article
Publication Title
Applied Sciences (Switzerland)
Abstract
Morphological analysis is a fundamental, preliminary task for Natural Language Processing (NLP) applications involving speech and language. Kannada is a low-resource language of the Dravidian family. It is highly agglutinative and morphologically rich, and dataset development for it is progressing rapidly in response to the growing demand for NLP tools. This study presents a hybrid approach that integrates rule-based and Transformer-based techniques to exploit the strengths of each while minimizing their respective limitations. The morphological richness of Kannada makes the analysis of inflections challenging. To address this, 85 paradigms were created using Apertium's Lttoolbox. A Transformer model was then trained on the nominal data generated from these paradigms to produce morphological analyses for out-of-vocabulary inflections. The hybrid approach can easily be extended to new words as they are added to the dictionary. On a test set of Kannada inflections, the approach achieved a precision of 0.924, a recall of 0.925, and an F1 score of 0.925. The main contributions are rule extraction for word-level paradigm design; morphological analysis of nouns, verbs, adjectives, pronouns, and indeclinables on a benchmark dataset; and generation of morphological analyses using the Transformer architecture.
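The hybrid pipeline described in the abstract can be sketched in Python as follows. A paradigm-driven lexicon is consulted first, and a learned model covers out-of-vocabulary forms. This is a minimal illustration under stated assumptions, not the authors' implementation. The paradigm `NOUN_E_PARADIGM`, the loosely romanized example words, the tag strings, and all helper names are hypothetical. `suffix_guesser` is a simple longest-suffix heuristic that stands in for the trained Transformer, which this sketch does not include.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy paradigm: (suffix, tag string) pairs, loosely modeled on an
# lttoolbox paradigm definition. It is NOT one of the paper's 85 paradigms.
NOUN_E_PARADIGM: List[Tuple[str, str]] = [
    ("", "<n><sg><nom>"),
    ("galu", "<n><pl><nom>"),
    ("yalli", "<n><sg><loc>"),
]


def build_lexicon(stems: Dict[str, List[Tuple[str, str]]]) -> Dict[str, List[str]]:
    """Expand every stem through its paradigm into a surface-form -> analyses map."""
    lexicon: Dict[str, List[str]] = {}
    for stem, paradigm in stems.items():
        for suffix, tags in paradigm:
            lexicon.setdefault(stem + suffix, []).append(stem + tags)
    return lexicon


def suffix_guesser(word: str) -> str:
    """Hypothetical stand-in for the Transformer: longest matching suffix wins.

    A real system would decode the analysis with a trained seq2seq model.
    """
    for suffix, tags in sorted(NOUN_E_PARADIGM, key=lambda p: len(p[0]), reverse=True):
        if suffix and word.endswith(suffix):
            return word[: -len(suffix)] + tags
    return word + "<n><sg><nom>"  # default guess when no suffix matches


def analyze(
    word: str, lexicon: Dict[str, List[str]], fallback: Callable[[str], str]
) -> Tuple[List[str], str]:
    """Rule-based lookup first; defer to the fallback only for unknown forms."""
    analyses = lexicon.get(word)
    if analyses:
        return analyses, "rule"
    return [fallback(word)], "fallback"


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    lex = build_lexicon({"mane": NOUN_E_PARADIGM})
    print(analyze("manegalu", lex, suffix_guesser))    # found in the lexicon
    print(analyze("shaalegalu", lex, suffix_guesser))  # unseen stem, uses fallback
```

Because stems share paradigms, the rule-based side grows just by adding dictionary entries. The fallback then only has to handle forms the lexicon has never seen, which mirrors the extensibility claim in the abstract.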
DOI
10.3390/app15105682
Publication Date
5-1-2025
Recommended Citation
Supriya, Musica; Acharya Udupi, Dinesh; Nayak, Ashalatha; and Srirangapatna Raghavendra, Arjuna, "Developing a Hybrid Morphological Analyzer for Low-Resource Languages" (2025). Open Access archive. 13261.
https://impressions.manipal.edu/open-access-archive/13261