KannadaLex: A lexical database with psycholinguistic information

Document Type

Article

Publication Title

ACM Transactions on Asian and Low-Resource Language Information Processing

Abstract

Databases of lexical properties are of primary importance to psycholinguistic research and speech-language therapy. Several lexical databases for different languages have been developed in the recent past, but Kannada, a language spoken by 50.8 million people, does not yet have a comprehensive lexical database. To address this, KannadaLex, a Kannada lexical database, is built as a language resource that contains orthographic, phonological, and syllabic information about words sourced from newspaper articles from the past decade. Along with these, vital statistics such as the phonological neighborhood, syllable complexity, summed syllable and bigram syllable frequencies, and lemma and inflectional family information are stored. The database is validated by correlating frequency, a well-established psycholinguistic feature, with other numerical features. The developed lexical database contains 170K words from varied disciplines, complete with psycholinguistic features. KannadaLex is a comprehensive resource for psycholinguists, speech therapists, and linguistic researchers analyzing Kannada and other similar languages. Psycholinguists require lexical data to choose stimuli for experiments that study the factors that enable humans to acquire, use, comprehend, and produce language. Speech and language therapists query these databases to develop the most efficient stimuli for evaluating, diagnosing, and treating communication disorders and for rehabilitating speech after brain injuries.

DOI

10.1145/3670688

Publication Date

7-12-2024
