My repo "Advancements in Manipuri NLP" will provide a comprehensive study of research papers regarding Natural Language Processing (NLP) applications and developments of Manipuri language.


galax19ksh/Advancements-in-Manipuri-NLP

Advancements in NLP for Manipuri Language

This repository, "Advancements in Manipuri NLP", provides a comprehensive survey of research papers on Natural Language Processing (NLP) applications and developments for the Manipuri language. It can be regarded as one of those "awesome" GitHub repos that curate resources on a topic.

Note: This is a work in progress and will be updated from time to time. Suggestions are always welcome! If you are willing to contribute to this repo or collaborate on Manipuri NLP research in general, please feel free to reach me at galaxkshetrimayum16@gmail.com

Introduction

Manipuri (Meiteilon), a language of Tibeto-Burman origin, is the official language and lingua franca of the northeast Indian state of Manipur. Despite its status as one of India's 22 Scheduled languages and its use by over 1.5 million speakers across states like Manipur, Assam and Tripura, Manipuri remains a low-resource language, held back by sparse annotated data and limited technological support. Its rich morphology, complex agglutinative structure, and SOV word order present a thrilling challenge for NLP researchers, further compounded by its monosyllabic and compounding nature. Despite these complexities, researchers have made commendable strides in NLP applications like Part-of-Speech tagging, Named Entity Recognition, and machine translation.

Interestingly, Manipuri uses two writing systems: the borrowed Bengali script and its own indigenous Meitei Mayek. While most research has focused on the Bengali script, Meitei Mayek holds immense potential for future exploration. Significant progress is needed for both scripts to unlock the full potential of Manipuri NLP. This GitHub repo aims to serve as a go-to destination for analysis of the NLP applications, approaches, challenges and future directions of Manipuri. I will post papers and updates from this area, each with a summary and analysis.

Index

  1. Morphological Analysis
  2. Syllabification, Stemming, Chunking
  3. POS Tagging
  4. Named Entity Recognition
  5. Word Sense Disambiguation
  6. RMWE
  7. Corpus Creation & E-Dictionary
  8. Parsing
  9. Machine Translation
  10. Transliteration
  11. Sentiment Analysis
  12. Speech Technologies

Morphological Analysis

Author & Date | Paper | Summary
Choudhury et al., 2004 | Morphological Analyzer for Manipuri: Design and Implementation | This paper implements a Manipuri morphological analyzer by employing morphographemics and morphotactics, alongside a model addressing orthographic variations and morphosyntactic feature combinations.
Singh et al., 2005 | Manipuri Morphological Analyzer | This paper utilizes a Manipuri-English dictionary for identifying word morphemes and an affix dictionary for categorizing affix types.
Singh et al., 2006 | Word Class and Sentence Type Identification in Manipuri Morphological Analyzer | The Manipuri morphological analyzer developed here uses a Manipuri-English dictionary for root words and an affix dictionary for affix types. It yields surface-level word analysis that aids the development of a Manipuri-English machine translation system, with promising results.
Nongmeikapam et al., 2012 | Manipuri Morpheme Identification | The approach segments Manipuri words into syllables and applies 2-gram analysis with a standard deviation technique for morpheme identification, achieving a recall of 59.80%, precision of 83.02%, and F-score of 69.52%.
Singha et al., 2012 | Morphological Analysis for Manipuri Nominal Category Words with Finite State Techniques | The paper proposes a suffix stripping approach for analyzing nominal category Manipuri words, using finite state machines to model morphotactics and converting non-deterministic finite automata to deterministic finite automata, which allows morphological analysis without a lexicon.
Singha et al., 2013 | Morphotactics of Manipuri Adjectives: A Finite-State Approach | This paper presents a constrained finite-state model of the morphotactic rules of Manipuri adjective word forms, which are derived from verb roots using specific affixes. Rules are composed to describe both simple agglutinative morphology and more complex structures, resulting in a system that analyzes and recognizes adjectives through finite-state networks, using a root lexicon and an affix dictionary.
Devi et al., 2015 | Manipuri morphological generator | The paper presents a morphological generator for Manipuri, focusing on inflecting root words to cater to the complex morphology of Manipuri nouns and verbs, with coverage directly tied to the size of the root list.
Bablu et al., 2017 | Morphological Analysis of Manipuri Language | This morphological analyzer applies computational morphology principles to analyze Manipuri word forms, achieving 80% accuracy when tested on 3500 Manipuri words in Shakti Standard Format, using Meitei Mayek script as the source.
Devi et al., 2020 | Morphotactics of Manipuri Verbs: A Finite State Approach | The paper investigates the morphotactics of Manipuri verbs using a finite state approach. Verbs are central to word formation in Manipuri, with other word forms derived from them through affixation, so the analysis provides essential insights for NLP applications.
Bablu et al., 2020 | Manipuri Morphological Analysis | This morphological analyzer, tested on 4500 Manipuri lexicons using Meitei Mayek Unicode as the source, achieves 84% accuracy and provides grammatical information associated with each lexicon for NLP applications.
Devi et al., 2022 | Allomorphs in Meeteilon (Manipuri) Morphology | The paper studies the distribution of phonologically conditioned allomorphs in Meeteilon morphology to understand its morphosyntactic nature, facilitating morpheme segmentation, identification, and part-of-speech tagging for NLP, and introduces an optimality theory approach to syllable-final devoicing.
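
Many entries in this survey report recall, precision, and an F-score (or F-measure). A minimal sketch of how the three relate, assuming the papers use the balanced F-measure (the harmonic mean of precision and recall); the function name `f_score` is illustrative and not taken from any of the papers:

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) as listed for "Manipuri Morpheme Identification".
print(round(f_score(83.02, 59.80), 2))  # 69.52
```

Under that assumption the result matches the F-score listed for the paper. Figures described as overall averages may not reproduce exactly from the averaged precision and recall.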


Syllabification, Stemming, Chunking

Author & Date | Paper | Summary
Nongmeikapam et al., 2012 | Automatic Segmentation of Manipuri (Meiteilon) Word into Syllabic Units | This paper presents an algorithmic approach for automatic segmentation of Manipuri words into syllabic units, achieving a recall of 74.77%, precision of 91.21%, and F-score of 82.18%.
Nandakishor et al., 2015 | An HMM based semi-automatic syllable labeling system for Manipuri language | This paper develops a semi-automatic syllable labeling system for Manipuri using the HMM toolkit (HTK) and WaveSurfer, achieving an average deviation of 25 ms, with detection rates for syllable segmentation based on time deviations.
Gyanendro et al., 2016 | Automatic Syllabification for Manipuri language | This paper introduces a data-driven method for automatic syllabification, employing entropy-based phonotactic segmentation, sequence labeling approaches, and a hybrid method, achieving up to 98% word accuracy.
Devi et al., 2017 | Automatic Syllabification Rules for Manipuri Language | This paper introduces an algorithmic approach for automatic syllabification of Manipuri, achieving 99.8% accuracy compared to manual syllabification, which matters for tasks like text-to-speech conversion and speech recognition.
Nongmeikapam et al., 2014 | Chunking in Manipuri Using CRF | The paper presents a chunking approach for Manipuri that uses Conditional Random Field (CRF) for Part of Speech (POS) tagging, achieving a recall of 71.30%, precision of 77.36%, and F-measure of 74.21%.
Nongmeikapam et al., 2014 | Manipuri Chunking: An Incremental Model with POS and RMWE | This paper uses Support Vector Machine (SVM) for chunking, incorporating POS tagging and Reduplicated Multiword Expression (RMWE) features, achieving a final chunking recall of 70.45%, precision of 86.11%, and F-measure of 77.50%.
Meetei et al., 2015 | Development of a Manipuri stemmer: A hybrid approach | This paper introduces a brute force stemming algorithm for Manipuri that incorporates a suffix stripping technique, aimed at enhancing information retrieval systems for Manipuri.
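
Suffix stripping, mentioned in the stemmer entry and in several morphology entries, can be sketched as follows. This is a toy illustration and not the algorithm of any paper listed here: the romanized suffix list, the example word, and the minimum stem length are all assumptions chosen for demonstration.

```python
# Illustrative romanized suffixes only; a real stemmer would use a curated affix list.
SUFFIXES = ["sing", "gi", "da", "na"]


def strip_suffixes(word: str, suffixes=SUFFIXES, min_stem_len: int = 2) -> str:
    """Repeatedly remove the longest matching suffix while the stem stays long enough."""
    ordered = sorted(suffixes, key=len, reverse=True)
    changed = True
    while changed:
        changed = False
        for suffix in ordered:
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
                word = word[: -len(suffix)]
                changed = True
                break  # restart so longer suffixes are tried first again
    return word


print(strip_suffixes("yumsingda"))  # removes "da", then "sing", leaving "yum"
```

The minimum stem length guards against stripping a word down to nothing. An agglutinative language stacks several suffixes on one root, which is why the loop repeats until no suffix matches.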


POS Tagging

Author & Date | Paper | Summary
Doren et al., 2008 | Morphology Driven Manipuri POS Tagger | This paper introduces a morphology-driven POS tagger for Manipuri, using dictionaries of root words, prefixes, and suffixes to achieve an accuracy of 69% on 3784 sentences containing 10917 unique words.
Doren et al., 2008 | Manipuri POS Tagging using CRF and SVM: A Language Independent Approach | This paper improves on the morphology-driven Manipuri POS tagger using Conditional Random Field (CRF) and Support Vector Machine (SVM), achieving accuracies of 72.04% and 74.38% respectively.
Nongmeikapam et al., 2010 | CRF Based POS Tagging of Manipuri | This CRF approach achieves a recall of 70.00%, precision of 77.78%, and F-measure of 73.68%.
Singha et al., 2012 | Part of Speech Tagging in Manipuri: A Rule-based Approach | This paper presents a rule-based POS tagger for Manipuri, employing hand-written linguistic rules and an affix stripping technique to handle the challenges of classifying lexical categories.
Nongmeikapam et al., 2012 | Improvement of CRF Based Manipuri POS Tagger by Using Reduplicated MWE (RMWE) | This paper presents a modified feature selection approach for CRF-based Manipuri POS tagging, achieving improved performance with a recall of 80.20%, precision of 74.31%, and F-measure of 77.14% by incorporating Reduplicated Multiword Expressions (RMWE) as an additional feature.
Nongmeikapam et al., 2012 | A Transliteration of CRF based Manipuri POS Tagging | This paper employs CRF for POS tagging of Bengali script Manipuri text, followed by transliteration to Meitei Mayek script.
Nongmeikapam et al., 2012 | Transliterated SVM Based Manipuri POS Tagging | This paper presents an SVM approach for POS tagging of Bengali script Manipuri text, followed by transliteration to Meitei Mayek.
Nongmeikapam et al., 2012 | SVM based Manipuri POS tagging using SVM based identified reduplicated MWE (RMWE) | This paper employs SVM for identifying RMWEs in Manipuri, achieving a recall of 86.11%, precision of 92.08%, and F-measure of 88.99%. It then uses the identified RMWEs as features in SVM-based POS tagging, yielding a recall of 71.15%, precision of 83.15%, and F-measure of 76.68%.
Nongmeikapam et al., 2012 | Will the Identification of Reduplicated Multiword Expression (RMWE) Improve the Performance of SVM Based Manipuri POS Tagging? | This paper examines whether SVM-based Manipuri POS tagging improves when identified RMWEs are added as a feature, reporting an F-score increase from 77.67% to 79.61%.
Singha et al., 2012 | Part of Speech Tagging in Manipuri with Hidden Markov Model | This paper employs a stochastic model, the Hidden Markov Model, for POS tagging in Manipuri, using the tagged output of the Manipuri rule-based tagger as the tagged corpus.


Named Entity Recognition

Author & Date | Paper | Summary
Doren et al., 2009 | Named Entity Recognition for Manipuri Using Support Vector Machine | This paper presents the development of a Manipuri Named Entity Recognition (NER) system using Support Vector Machine (SVM) and active learning techniques, achieving an overall average recall of 93.91%, precision of 95.32%, and F-score of 94.59%.
Doren et al., 2010 | Web Based Manipuri Corpus for Multiword NER and Reduplicated MWEs Identification using SVM | This paper describes the development of a web-based Manipuri corpus for identifying reduplicated multiword expressions (MWE) and multiword named entities (NE) using an SVM learning technique, achieving recall, precision, and F-score values of 94.62%, 93.53%, and 94.07% respectively for reduplicated MWEs, and 94.82%, 93.12%, and 93.96% respectively for multiword NEs.
Nongmeikapam et al., 2011 | CRF based Name Entity Recognition (NER) in Manipuri: A highly agglutinative Indian Language | This paper employs Conditional Random Field (CRF) for Manipuri Named Entity Recognition, achieving a recall of 81.12%, precision of 85.67%, and F-score of 83.33%.
Jimmy et al., 2013 | Named Entity Recognition in Manipuri: A Hybrid Approach | This paper introduces a hybrid approach to NER in Manipuri, combining a statistical CRF approach with rule-based techniques, achieving recall, precision, and F-score of 92.26%, 94.27%, and 93.3% respectively.
Jimmy et al., 2020 | Deep Neural Model for Manipuri Multiword Named Entity Recognition with Unsupervised Cluster Feature | This paper presents an approach for recognizing Multi-Word Named Entities (MNEs) in Manipuri using a Long Short Term Memory (LSTM) recurrent neural network model augmented with Part Of Speech (POS) embeddings and word cluster information obtained through K-means clustering, and compares its performance with other machine learning-based models.
Jimmy et al., 2022 | BiLSTM-CRF Manipuri NER with Character-Level Word Representation | This paper proposes a Manipuri NER model employing a Bidirectional Long Short Term Memory (BiLSTM) deep neural network with character-level word representation and word embedding, augmented by a CRF classifier, achieving an F-score of approximately 98.19% with the RMSprop Gradient Descent (GD) optimizer, and an average clustering accuracy of 88.14% for all NE classes.


Word Sense Disambiguation

Author & Date | Paper | Summary
Singh et al., 2014 | Word Sense Disambiguation | This paper introduces a word sense disambiguation system for Manipuri, employing conventional positional and context-based features to predict the senses of polysemous words with an accuracy of 71.75%.


RMWE

Author & Date | Paper | Summary
Nongmeikapam et al., 2011 | Identification of Reduplicated MWEs in Manipuri: A Rule Based Approach | This paper develops a rule-based model to identify reduplicated Multiword Expressions (MWEs) in Manipuri texts, achieving an overall average recall of 94.24%, precision of 82.27%, and F-score of 87.68%.
Nongmeikapam et al., 2010 | Identification of MWEs Using CRF in Manipuri and Improvement Using Reduplicated MWEs | This paper employs Conditional Random Field (CRF) machine learning techniques for identifying MWEs in Manipuri text, achieving a recall of 62.24%, precision of 86.06%, and F-measure of 72.24% after accounting for reduplicated MWEs.
Doren et al., 2010 | Web Based Manipuri Corpus for Multiword NER and Reduplicated MWEs Identification using SVM | The paper uses a support vector machine (SVM) learning technique to identify reduplicated MWEs and multiword named entities in a web-based Manipuri corpus, achieving recall, precision, and F-score values of 94.62%, 93.53%, and 94.07% respectively for reduplicated MWEs.
Nongmeikapam et al., 2011 | Transliteration of CRF Based Multiword Expression (MWE) in Manipuri | The study focuses on the transliteration of MWEs identified in Manipuri using CRF, achieving a recall of 64.08%, precision of 86.84%, and F-measure of 73.74%, with an accuracy of 90.01% when comparing the transliterated output with both Meitei script and Bengali script Manipuri.
Nongmeikapam et al., 2011 | Identification of Reduplicated Multiword Expressions Using CRF | The study focuses on identifying Reduplicated Multiword Expressions (RMWEs) in Manipuri texts using a CRF tool, achieving overall average recall, precision, and F-score values of 92.91%, 91.90%, and 92.40% respectively.
Nongmeikapam et al., 2011 | Genetic Algorithm (GA) in Feature Selection for CRF Based Manipuri Multiword Expression (MWE) Identification | The paper presents a feature selection approach using a Genetic Algorithm (GA) to enhance CRF-based identification of MWEs in Manipuri, achieving a recall of 64.08%, precision of 86.84%, and F-measure of 73.74%, an improvement over plain CRF-based MWE identification.
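
To make the idea of a reduplicated multiword expression concrete, here is a minimal rule-based sketch. It handles only the simplest case, a token that is immediately repeated; the rule sets and learned models in the papers above are presumably richer, and the function name and placeholder tokens are assumptions for illustration.

```python
def find_complete_reduplication(tokens):
    """Return (index, token) pairs where a token is immediately followed by itself."""
    hits = []
    i = 0
    while i < len(tokens) - 1:
        if tokens[i] == tokens[i + 1]:
            hits.append((i, tokens[i]))
            i += 2  # skip the repeated token so each pair is reported once
        else:
            i += 1
    return hits


# Placeholder tokens standing in for a tokenized sentence.
print(find_complete_reduplication(["w1", "w2", "w2", "w3"]))  # [(1, 'w2')]
```

A flag derived from such a detector is the kind of extra feature that the POS tagging and chunking entries in this survey describe feeding into CRF or SVM models.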


Corpus Creation and E-Dictionary

Author & Date | Paper | Summary
Doren et al., 2010 | Semi–Automatic Parallel Corpora Extraction from Comparable News Corpora | The paper introduces a technique for extracting a parallel corpus between Manipuri and English from web-collected news corpora, leveraging morphological information to improve alignment quality, and demonstrates its effectiveness for resource-constrained, agglutinative, and inflective Indian languages.
Doren et al., 2012 | Building Parallel Corpora for SMT System: A Case Study of English-Manipuri | The paper presents a technique for extracting a parallel corpus between Manipuri and English from web-based comparable news corpora to improve translation quality in Statistical Machine Translation (SMT) systems for resource-constrained language pairs.
Moirangthem et al., 2022 | Embeddings-Based Parallel Corpus Creation for English-Manipuri | The paper introduces an efficient embeddings-based automatic sentence aligner for creating an English-Manipuri parallel corpus, reducing manual alignment effort by 47.72% and facilitating neural machine translation for low-resource languages.
Huidrom et al., 2021 | EM Corpus: a comparable corpus for a less-resourced language pair Manipuri-English | The paper presents a sentence-level comparable text corpus for the Manipuri-English language pair, consisting of 1.88 million Manipuri sentences, 1.45 million English sentences, and 124,975 Manipuri-English sentence pairs crawled from 'The Sangai Express' website, aimed at supporting MT/NLP tasks for low-resourced languages.
Laitonjam et al., 2022 | Manipuri–English comparable corpus for cross-lingual studies | The paper introduces Mni-EnCC, a Manipuri–English comparable corpus created by collating text from the Sangai Express and Poknapham news sources and verified through a semi-automated process, aiming to facilitate cross-lingual studies between Manipuri and English.
Ningombam et al., 2011 | Building Manipuri-English Machine Readable Dictionary by Implementing Ontology | The paper outlines the development of a Manipuri-English machine-readable dictionary using ontology, aiming to combine traditional bilingual lexicographic information with the conceptual knowledge needed for NLP applications.
Meitei et al., 2012 | An Analysis towards the Development of Electronic Bilingual Dictionary (Manipuri-English) - A Report | The paper outlines the ongoing development of an electronic Manipuri bilingual dictionary, aiming to combine traditional English-Manipuri bilingual lexicographic information with the conceptual knowledge needed for NLP applications.
Meitei et al., 2012 | Word Search in a WWW Manipuri-English Electronic Dictionary | The paper introduces an online Manipuri-English bilingual dictionary, emphasizing its word search functionality, which is categorized into simple word search, wild card search, and search by lexical item, contributing to language learning and NLP.
Meitei et al., 2017 | Development of English to Manipuri Electronic Dictionary: A database approach | The paper outlines the development of an electronic English to Manipuri dictionary based on a database model, providing an alternative to traditional paper dictionaries for language learning and accessibility across digital platforms.
Singh et al., 2017 | Corpus & Wordnet Based MMD (Multilingual Manipuri Dictionary) | The paper describes the development of MMD (Multilingual Manipuri Dictionary), which employs a trie (M-ary tree) data structure, to facilitate language learning and NLP tasks including machine translation and corpus-based language processing.
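
The dictionary entries above mention a trie data structure and prefix-style (wild card) search. A minimal sketch of how a trie supports both exact lookup and prefix search; the class and method names, headwords, and glosses are placeholders and are not taken from MMD or any other dictionary listed here:

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # maps one character to the next node
        self.meaning = None  # set only on nodes that end a headword


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, meaning):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.meaning = meaning

    def _walk(self, text):
        """Follow `text` from the root; return the final node or None."""
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def lookup(self, word):
        node = self._walk(word)
        return node.meaning if node else None

    def with_prefix(self, prefix):
        """Return all stored headwords that start with `prefix`, sorted."""
        start = self._walk(prefix)
        if start is None:
            return []
        results, stack = [], [(prefix, start)]
        while stack:
            text, node = stack.pop()
            if node.meaning is not None:
                results.append(text)
            for ch, child in node.children.items():
                stack.append((text + ch, child))
        return sorted(results)


# Placeholder headwords and glosses for illustration only.
d = Trie()
for headword, gloss in [("ma", "gloss-1"), ("mai", "gloss-2"), ("pa", "gloss-3")]:
    d.insert(headword, gloss)
print(d.lookup("mai"))       # gloss-2
print(d.with_prefix("ma"))   # ['ma', 'mai']
```

Because lookup cost depends on the length of the word rather than the size of the dictionary, the same structure works unchanged for Bengali script or Meitei Mayek headwords, since Python strings iterate over Unicode code points.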


Parsing

Author & Date | Paper | Summary
Nirmal et al., 2018 | Problems and Issues in Parsing Manipuri Text | The paper addresses the parsing challenges encountered in Manipuri text, highlighting lexical and attachment ambiguities as well as word order variations, which are crucial for developing parsing systems in low-resource languages like Manipuri.
Nirmal et al., 2019 | A Grammar-Driven Approach for Parsing Manipuri Language | The paper employs context-free grammar (CFG) and Earley's parsing algorithm for parsing Manipuri, achieving a recall of 81.71%, precision of 72.38%, and F-measure of 76.76%.
Nirmal et al., 2021 | A Context-Free Grammar for Parsing Manipuri Language | The study uses a context-free grammar (CFG) approach for parsing Manipuri sentences, achieving a recognition rate of 83.20% with an Earley parser.
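
The parsing entries above pair a context-free grammar with Earley's algorithm. A minimal recognizer sketch, assuming a toy SOV grammar and placeholder words that are not taken from the papers, and assuming the grammar contains no empty productions:

```python
def earley_recognize(tokens, grammar, lexicon, start="S"):
    """Return True if `tokens` can be derived from `start`.

    grammar: nonterminal -> list of right-hand sides (tuples of symbols).
    lexicon: word -> set of preterminal tags the word may carry.
    """
    # A chart item is (lhs, rhs, dot position, origin index).
    chart = [set() for _ in range(len(tokens) + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))

    for i in range(len(tokens) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in grammar:
                    # Predictor: expand the expected nonterminal at position i.
                    for prod in grammar[nxt]:
                        item = (nxt, prod, 0, i)
                        if item not in chart[i]:
                            chart[i].add(item)
                            agenda.append(item)
                elif i < len(tokens) and nxt in lexicon.get(tokens[i], ()):
                    # Scanner: the next word carries the expected tag.
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Completer: advance every item that was waiting for `lhs`.
                for plhs, prhs, pdot, porigin in list(chart[origin]):
                    if pdot < len(prhs) and prhs[pdot] == lhs:
                        item = (plhs, prhs, pdot + 1, porigin)
                        if item not in chart[i]:
                            chart[i].add(item)
                            agenda.append(item)

    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[len(tokens)])


# Toy SOV grammar: subject noun phrase, then object noun phrase, then verb.
GRAMMAR = {"S": [("NP", "VP")], "VP": [("NP", "V")], "NP": [("N",)]}
LEXICON = {"noun1": {"N"}, "noun2": {"N"}, "verb1": {"V"}}

print(earley_recognize(["noun1", "noun2", "verb1"], GRAMMAR, LEXICON))  # True
print(earley_recognize(["noun1", "verb1", "noun2"], GRAMMAR, LEXICON))  # False
```

This only answers whether a sentence is accepted. A parser in the sense of the papers would also keep back-pointers to recover the tree, and a realistic grammar would need rules for the word order variations noted in the first entry.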


Machine Translation

Author & Date | Paper | Summary
Doren et al., 2010 | Manipuri-English Example Based Machine Translation System | The paper presents a Manipuri-English example-based machine translation system that uses parallel corpus alignment techniques including POS tagging, morphological analysis, NER, and chunking, achieving BLEU and NIST scores of 0.137 and 3.361 respectively and outperforming a baseline SMT system with the same training and test data.
Doren et al., 2010 | Statistical Machine Translation of English-Manipuri using Morpho-syntactic and Semantic Information | The paper introduces a factored Statistical Machine Translation (SMT) system for the English-Manipuri language pair, highlighting the significance of suffixes, dependency relations, and case markers in translation, with improved translation quality shown by both BLEU score and subjective evaluation.
Doren et al., 2010 | Manipuri-English Bidirectional Statistical Machine Translation Systems using Morphology and Dependency Relations | The paper presents bidirectional Manipuri-English statistical machine translation systems, highlighting the importance of suffixes, dependency relations, and case markers. The factored models improve BLEU from 13.045 to 16.873 for English-Manipuri and from 13.452 to 17.573 for Manipuri-English, and subjective evaluation shows better fluency and adequacy than the baseline systems.
Doren et al., 2011 | Integration of Reduplicated Multiword Expressions and Named Entities in a Phrase Based Statistical Machine Translation System | The paper integrates reduplicated multiword expressions (RMWEs) and Multiword Named Entities (MNEs) into a Manipuri-English Phrase Based Statistical Machine Translation (PBSMT) system, using SVM-based machine learning and GIZA++ alignment techniques, resulting in improved BLEU and NIST scores over baseline systems and subjective evaluation indicating enhanced adequacy.
Doren et al., 2012 | Addressing some Issues of Data Sparsity towards Improving English-Manipuri SMT using Morphological Information | The paper explores enriching parallel corpus resources for morphologically rich languages like Manipuri, improving SMT performance over baseline systems in both automatic scoring and subjective evaluation by mapping from source to target side using a factored model.
Doren et al., 2013 | Taste of Two Different Flavours: Which Manipuri Script Works Better for English-Manipuri Language Pair SMT Systems? | The paper compares phrase-based statistical machine translation (PBSMT) systems for the English-Manipuri language pair using Bengali script and transliterated Meitei Mayek script, showing that the Bengali script-based PBSMT scores higher on BLEU and NIST, although subjective evaluation varies slightly from the automatic scores.
Islam et al., 2017 | A Review on Electronic Dictionary and Machine Translation System Developed in North-East India | The paper discusses the importance of, and approaches to, Electronic Dictionary (E-dictionary) and Machine Translation (MT) systems in NLP, particularly in multilingual regions like North-East (NE) India, where few such systems have been developed, underscoring the growing demand for research in this area.
Michael et al., 2020 | Unsupervised Neural Machine Translation for English and Manipuri | The paper introduces an unsupervised neural machine translation (UNMT) system for the low-resource English-Manipuri language pair, achieving BLEU scores of 3.1 for en → mni and 2.7 for mni → en, with subjective evaluation yielding encouraging results on the translated output.
Meetei et al., 2020 | English to Manipuri and Mizo Post-Editing Effort and its Impact on Low Resource Machine Translation | The paper studies post-editing effort in building a parallel dataset for English-Manipuri and English-Mizo. It finds positive correlations between technical effort and function words for both language pairs, negative correlations between technical effort and noun words for English-Mizo, and an increase in HBLEU of up to 4.6 for English-Manipuri when the post-edited dataset is used for incremental training.
Huidrom et al., 2020 | Zero-shot translation among Indian languages | The paper explores zero-shot translation on low-resource Indian languages and reports an increase in translation accuracy, with the score under balanced data settings increasing sevenfold for Manipuri to Hindi during Round-III of zero-shot translation.
Laitonjam et al., 2021 | Manipuri-English Machine Translation using Comparable Corpus | The paper explores the effectiveness of unsupervised Machine Translation (MT) models over a Manipuri-English comparable corpus, demonstrating feasibility and identifying future directions for developing effective MT for the Manipuri-English language pair under unsupervised scenarios.
Rahul et al., 2021 | Statistical and Neural Machine Translation for Manipuri-English on Intelligence Domain | The paper presents a Manipuri-English machine translation system for the intelligence domain, using 56,678 parallel corpora from open-source intelligence (OSINT) sources. Statistical machine translation (SMT) achieves a BLEU score of 23.91 and neural machine translation (NMT) a BLEU score of 40.67. Language-specific morphological analysis, particularly of suffixes, yields further improvements, with SMT reaching a BLEU score of 25.03 and NMT a BLEU score of 44.
Michael et al., 2022 | Low resource machine translation of english–manipuri: A semi-supervised approach | This paper introduces a semi-supervised neural machine translation system for English-Manipuri that employs self-training and back-translation, yielding a +0.9 BLEU improvement with external noise introduction and outperforming supervised and mBART baselines by up to +4.5 and +1.2 BLEU respectively.
Michael et al., 2022 | An empirical study of low-resource neural machine translation of manipuri in multilingual settings | This paper presents a multilingual LSTM-based neural machine translation system for Manipuri and English that incorporates cross-lingual features and improves over vanilla multilingual and bilingual baselines, with better performance across Manipuri-English and other Indian language-English translation tasks, including zero-shot translation evaluations.
Huidrom et al., 2022 | Introducing EM-FT for Manipuri-English Neural Machine Translation | This paper employs pretrained fastText word embeddings for Manipuri in neural machine translation experiments. The Transformer architecture with the fastText word embedding model EM-FT consistently outperforms the alternatives, while additional training data from a different domain is found to hurt translation accuracy.
Devi et al., 2022 | An Analysis of Phrase based SMT for English to Manipuri Language | This paper presents a phrase-based Statistical Machine Translation (SMT) system from English to Manipuri, built with the Moses toolkit and Bengali script, and evaluates it using the BLEU metric on tourism, agriculture, and entertainment corpora.
Devi et al., 2023 | An Exploratory Study of SMT Versus NMT for the Resource Constraint English to Manipuri Translation | This study compares Statistical Machine Translation (SMT) and Neural Machine Translation (NMT) for English-to-Manipuri translation using BLEU, Meteor, TER, and F-measure scores as well as expert evaluation, to determine the most suitable approach for low-resource language pairs.
Singh et al., 2023 | Subwords to Word Back Composition for Morphologically Rich Languages in Neural Machine Translation | This paper proposes an approach for neural machine translation (NMT) in morphologically rich languages that segments words into morphemes and composes word representations from them. It shows improved translation accuracy over baseline subword models on Manipuri-English, Tamil-English, and Marathi-English tasks, highlighting the value of word boundary information and the relationships between a word's morphemes.
Lalrempuii et al., 2023 | Low-Resource Indic Languages Translation Using Multilingual Approaches | This study explores the effectiveness of the multilingual pre-trained transformers mBART and mT5 on low-resource Indic languages, including Hindi, Bengali, Assamese, Manipuri, and Mizo. It compares them with multiway multilingual translation trained from scratch using one-to-many and many-to-one approaches, highlighting the scalability of multilingual neural machine translation (MNMT) and its potential for improving translation quality in low-resource settings.
Pal et al., 2023 | Findings of the WMT 2023 Shared Task on Low-Resource Indic Language | This paper outlines the outcomes of the low-resource Indic language translation task held alongside the Eighth Conference on Machine Translation (WMT) 2023, where participants developed machine translation systems for the English-Assamese, English-Mizo, English-Khasi, and English-Manipuri language pairs. Evaluation includes both automatic metrics (BLEU, TER, RIBES, COMET, ChrF) and human assessment, using the IndicNE-Corp1.0 dataset of parallel and monolingual corpora for northeastern Indic languages like Assamese, Mizo, Khasi, and Manipuri.
Singh et al., 2023 | NITS-CNLP Low-Resource Neural Machine Translation Systems of English-Manipuri Language Pair | This paper presents a transformer-based Neural Machine Translation (NMT) system developed by NITS-CNLP for the English-Manipuri language pair, achieving BLEU scores of 22.75 for English to Manipuri and 26.92 for Manipuri to English, along with character-level n-gram F-score (chrF), RIBES, TER, and COMET evaluations.
Agrawal et al., 2023 | Neural Machine Translation for English - Manipuri and English - Assamese | In the WMT23 shared task on low-resource Indic language translation, the ATULYA-NITS team used an NMT transformer model for translation between English and Assamese and between English and Manipuri, achieving BLEU scores of 15.02 for English to Manipuri, 18.7 for Manipuri to English, 5.47 for English to Assamese, and 8.5 for Assamese to English.
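
Most machine translation entries above report BLEU. A simplified sketch of the metric for one sentence and one reference, using clipped n-gram precision up to 4-grams and a brevity penalty, with no smoothing. The papers most likely report corpus-level BLEU from standard tooling, so this is an illustration of the idea only; the function names and example sentences are assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed single-reference BLEU on token lists, in the range [0, 1]."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat".split()
print(sentence_bleu(reference, reference))                    # 1.0
print(sentence_bleu("a dog ran home fast".split(), reference))  # 0.0
```

Note that the entries above mix scales: a score such as 0.137 is on the 0 to 1 scale returned here, while scores such as 16.873 or 40.67 are the same quantity multiplied by 100.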

Go to top

Transliteration

Author & Date Paper Summary
Nongmeikapam et al.,2011 Manipuri Transliteration from Bengali Script to Meitei Mayek: A Rule Based Approach This paper presents a novel approach to transliterating Manipuri text from Bengali script to Meitei Mayek (Meitei script), utilizing a rule-based model and algorithm, achieving an impressive accuracy of 86.28%.
Doren,2012 Bidirectional Bengali Script and Meetei Mayek Transliteration of Web Based Manipuri News Corpus This study presents a rule-based transliteration approach between Bengali and Meetei Mayek scripts for Manipuri text, emphasizing the significance of linguistic rule integration leveraging Manipuri's monosyllabic nature, achieving higher precision and recall than statistical methods for Bengali to Meetei Mayek transliteration, while statistical approaches outperform rule-based methods for the reverse transliteration.
Nongmeikapam et al.,2012 A Transliteration of CRF based Manipuri POS Tagging This paper applies Part of Speech (POS) tagging to Manipuri text in Bengali script using a Conditional Random Field (CRF), then transliterates the tagged output to Meitei Mayek. Such a pipeline is relevant to languages written in more than one script, like Manipuri, which uses the borrowed Bengali script alongside its original Meitei Mayek script.
Laitonjam et al.,2022 A Hybrid Machine Transliteration Model Based on Multi-source Encoder–Decoder Framework: English to Manipuri This paper presents a neural hybrid machine transliteration model that integrates grapheme and phoneme representations, extending the traditional encoder-decoder model to a multi-source framework. Experiments on the English to Manipuri transliteration task show a significant performance improvement over its phoneme-only and grapheme-only counterparts.
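
To give a feel for what a table-driven, rule-based transliterator looks like, here is a minimal Python sketch. It is not the rule set from any of the papers above: the eight-entry mapping is a small illustrative subset of Bengali to Meetei Mayek Unicode correspondences, and the names `BENGALI_TO_MEETEI` and `transliterate` are hypothetical. Real systems add context-dependent rules (for example for syllable-final consonants) that are not reproduced here.

```python
# Illustrative subset only (an assumption, not the published rule tables).
BENGALI_TO_MEETEI = {
    "\u0995": "\uABC0",  # BENGALI LETTER KA  -> MEETEI MAYEK LETTER KOK
    "\u09B8": "\uABC1",  # BENGALI LETTER SA  -> MEETEI MAYEK LETTER SAM
    "\u09B2": "\uABC2",  # BENGALI LETTER LA  -> MEETEI MAYEK LETTER LAI
    "\u09AE": "\uABC3",  # BENGALI LETTER MA  -> MEETEI MAYEK LETTER MIT
    "\u09AA": "\uABC4",  # BENGALI LETTER PA  -> MEETEI MAYEK LETTER PA
    "\u09A8": "\uABC5",  # BENGALI LETTER NA  -> MEETEI MAYEK LETTER NA
    "\u09BE": "\uABE5",  # BENGALI VOWEL SIGN AA -> MEETEI MAYEK VOWEL SIGN ANAP
    "\u09BF": "\uABE4",  # BENGALI VOWEL SIGN I  -> MEETEI MAYEK VOWEL SIGN INAP
}


def transliterate(text: str, table: dict = BENGALI_TO_MEETEI) -> str:
    """Greedy longest-match replacement; unmapped characters pass through."""
    keys = sorted(table, key=len, reverse=True)  # try longer source strings first
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            out.append(text[i])  # no rule applies: keep the character as-is
            i += 1
    return "".join(out)
```

The longest-match loop means multi-character source sequences (conjuncts, for instance) can be added to the table later without changing the function.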

Go to top

Sentiment Analysis

Author & Date Paper Summary
Nongmeikapam et al.,2014 Verb Based Manipuri Sentiment Analysis This paper presents a sentiment analysis approach for Manipuri articles, utilizing Part of Speech (POS) tagging with Conditional Random Field (CRF) and a manually modified lexicon of verbs for sentiment polarity determination, achieving a recall of 72.10%, precision of 78.14%, and F-measure of 75.00%.
Kaur et al.,2014 A Study and Analysis of Opinion Mining Research in Indo-Aryan, Dravidian and Tibeto-Burman Language Families This paper conducts sentiment analysis across various Indian languages, including Hindi, Bengali, Punjabi, Oriya, Urdu, Marathi, Telugu, and Manipuri, with Oriya text demonstrating superior performance, while suggesting further exploration into Punjabi sentiment analysis and comparing Indian languages' performance with English.
Meetei et al.,2021 Low resource language specific pre-processing and features for sentiment analysis task This work presents sentiment analysis for Manipuri using various machine learning approaches, incorporating language-specific preprocessing tasks and reporting improved classification results in terms of precision, recall, and F-score, particularly with ensemble voting of the top three classifiers based on TF-IDF, along with findings from deep learning-based methods.
Doren et al.,2021 Review Comments of Manipuri Online Video: Good, Bad or Ugly This paper conducts a comparative analysis of sentiment analysis methodologies, including deep learning, traditional machine learning, and lexicon-based approaches, on a resource-constrained dataset of Manipuri comments from social media platforms, emphasizing the significance of pre-processing and feature engineering.
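
The precision, recall, and F-measure figures quoted in this table are related by the usual harmonic-mean formula. As a quick sanity check, the short Python sketch below recomputes the F-measure from the precision and recall listed for Nongmeikapam et al., 2014; the function name `f_measure` is just an illustrative choice, and the formula assumes the balanced (F1) variant.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures from the Nongmeikapam et al., 2014 row, in percent.
print(round(f_measure(78.14, 72.10), 2))  # prints 75.0
```

The result matches the 75.00% reported in the summary, so that row is internally consistent with an F1 definition.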

Go to top

Speech Technologies

Author & Date Paper Summary
Patel et al.,2018 An Automatic Speech Transcription System for Manipuri Language This paper presents various methods for language identification, speech-to-text, keyword search, and speaker diarization, integrated into a platform with a user interface for demonstration purposes.
Basu et al.,2018 Preliminary Acoustic Analysis of Manipuri Vowels This paper conducts a preliminary study of the acoustic characteristics of Manipuri vowels, analyzing a speech corpus of around 500 Phonetically Balanced Words (PBW) embedded in neutral carrier sentences spoken by 10 informants (5 male and 5 female) in the Imphal dialect, with the aim of developing speech technology for this low-resourced language.
Meetei et al.,2021 An Experiment on Speech-to-Text Translation Systems for Manipuri to English on Low Resource Setting This paper presents experimental findings on building Speech-to-Text translation systems for Manipuri-English, using a new dataset and benchmark evaluation. It compares pipeline models (ASR followed by machine translation) against an end-to-end approach, with Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) and Time Delay Neural Network (TDNN) acoustic models. The TDNN model outperforms GMM-HMM by 2.53% WER, though the two differ by only 0.1 BLEU in Speech-to-Text translation evaluation, and both pipeline models surpass the end-to-end approach by 2.6 BLEU.
Devi et al.,2021 Vowel-Based Acoustic and Prosodic Study of Three Manipuri Dialects This paper presents a comparative analysis of the acoustic and prosodic features of three major dialects of Manipuri—Imphal, Kakching, and Sekmai—revealing significant dialectal variation through measurements of formant frequency, segment duration, energy values, and pitch values, laying the groundwork for future detailed comparative studies.
Devi et al.,2022 Verbs in the Early Speeches of Two Manipuri-Speaking Children This study investigates the development of motion verbs in two Manipuri-speaking children aged 3-5 years, focusing on their emergence and setting the stage for future research, particularly in comparison to Uziel-Karl's (2001) presentation of Hebrew motion verbs.
Singh et al.,2024 MECOS: A bilingual Manipuri–English spontaneous code-switching speech corpus for automatic speech recognition This study introduces a code-switched speech database for Manipuri–English, comprising 57 hours of annotated spontaneous speech, aiming to construct an automatic speech recognition (ASR) system, with evaluations revealing the superior performance of the pure TDNN model.
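
The ASR comparisons above are stated in terms of word error rate (WER). For readers unfamiliar with the metric, here is a minimal Python sketch of the standard definition: word-level edit distance divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenisation are assumptions; the toolkits used in the papers may normalise text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words. Note that WER can exceed 1.0 when the hypothesis contains many insertions.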

Go to top
