
Nepali Stemmer for Natural Language Processing, Machine Learning , Deep Text Learning, Artificial Intelligence


sanjaalcorps/NepaliStemmer


NepaliStemmer

A Nepali stemmer based on a suffix-stripping algorithm, for Natural Language Processing, Machine Learning and more. It is part of the Nepali Natural Language project that I am currently developing privately (on Bitbucket). This portion is released for public use, review and improvement. I will release other useful components gradually, or share them with anyone willing to volunteer on the project.

@author Kushal Paudyal
www.sanjaal.com | www.inepal.org | www.icodejava.com

Disclaimer: Not a final solution. Use it at your own risk.

Developers - Feel free to fork it and send me pull requests with your improvements.

Found issues? Report them via the Issues tab.

Usage 1 - Getting root words

String root = NepaliStemmer.getNepaliRootWord(someCompoundWord);

This stemmer is based on suffix stripping. It strips the compound-forming ending from a word, giving a potential root word (which is not the same as the base word). I have categorized the suffixes into multiple files:
-> Word Endings (e.g. स्थानलगायत, where लगायत is the word ending)
-> Name Endings (e.g. रामकुमार, where कुमार is the name ending)
-> Place Endings
-> Actual Suffixes

Prefixes have not been integrated yet.
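
As a rough illustration of the suffix-stripping idea, here is a minimal sketch. It is not the library's actual implementation: the class name SuffixStripSketch, the method stripSuffix, the two-entry ending list (taken from the examples above) and the longest-match rule are all assumptions made for this example.

```java
import java.util.List;

class SuffixStripSketch {
    // Hypothetical sample endings copied from the README examples.
    // The real lists live in the compounds/*.java files.
    private static final List<String> ENDINGS = List.of("लगायत", "कुमार");

    // Strip the longest matching ending, but never strip the whole word.
    static String stripSuffix(String word) {
        String best = "";
        for (String ending : ENDINGS) {
            boolean leavesRoot = word.length() > ending.length();
            if (leavesRoot && word.endsWith(ending) && ending.length() > best.length()) {
                best = ending;
            }
        }
        return word.substring(0, word.length() - best.length());
    }

    public static void main(String[] args) {
        System.out.println(stripSuffix("स्थानलगायत")); // prints स्थान
        System.out.println(stripSuffix("रामकुमार"));   // prints राम
    }
}
```

Preferring the longest match is one possible design choice; it keeps a short ending from winning when a longer one also matches.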

Usage 2 - Getting Affirmative/Positive Verb Variations

String output = NepaliStemmer.getAffirmativeVerbVariations("अँगाल्नु").toString();

This will result in a list of variations of that word.

[अँगाल, अँगाल्नु, अँगाल्यो, अँगाल्यौ, अँगालेँ, अँगालेको, अँगालेछ, अँगाले, अँगालिन, अँगालिस, अँगाली, अँगालि, अँगालिछे, अँगालुन्जेल, अँगालुञ्जेल, अँगाल्नोस, अँगाल्नुस, अँगाल्नुहोस, अँगाल्नेछु, अँगाल्नुहुनेछ, अँगाल्नेछन, अँगाल्न्छन, अँगाल्न्छिन, अँगाल्न्छु, अँगाल्न्छे, अँगाल्न्छ, अँगाल्नेछौ, अँगाल्नेछिन, अँगाल्नेछ, अँगाल्नुभयो]
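
The sample output suggests that each variation is the verb stem plus a different ending. The sketch below shows that pattern under stated assumptions: the class VerbVariationSketch, the method affirmativeVariations and the five-entry ending list are hypothetical and only read off the output above; the library's own logic may differ.

```java
import java.util.ArrayList;
import java.util.List;

class VerbVariationSketch {
    // Infinitive ending: virama + NA + vowel sign U.
    private static final String INFINITIVE_ENDING = "\u094D\u0928\u0941";

    // Hypothetical subset of endings, read off the sample output.
    private static final List<String> ENDINGS = List.of(
            "",                     // bare stem
            "\u094D\u0928\u0941",   // infinitive ending
            "\u094D\u092F\u094B",   // virama + YA + vowel sign O
            "\u0947\u0915\u094B",   // vowel sign E + KA + vowel sign O
            "\u0947\u091B");        // vowel sign E + CHA

    static List<String> affirmativeVariations(String infinitive) {
        if (!infinitive.endsWith(INFINITIVE_ENDING)) {
            return List.of(infinitive); // not an infinitive; leave unchanged
        }
        String stem = infinitive.substring(0, infinitive.length() - INFINITIVE_ENDING.length());
        List<String> variations = new ArrayList<>();
        for (String ending : ENDINGS) {
            variations.add(stem + ending);
        }
        return variations;
    }

    public static void main(String[] args) {
        System.out.println(affirmativeVariations("अँगाल्नु"));
    }
}
```

The endings are written as Unicode escapes because several of them begin with a combining sign, which is hard to read as a standalone literal.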

Usage 3 - Getting Negative Verb Variations

This is a work in progress. The idea is to produce negative verb variations such as नअँगाल and नअँगाल्नु from the word "अँगाल्नु".

String output = NepaliStemmer.getNegativeVerbVariations("अँगाल्नु").toString();
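
The two examples given (नअँगाल, नअँगाल्नु) are formed by prefixing न to an affirmative form. A minimal sketch of just that prefix pattern follows; NegativeVariationSketch and negate are hypothetical names, and the sketch makes no claim about how getNegativeVerbVariations works or about negative forms built in other ways.

```java
import java.util.List;
import java.util.stream.Collectors;

class NegativeVariationSketch {
    private static final String NEGATIVE_PREFIX = "न";

    // Prefix each affirmative form with the negative marker.
    static List<String> negate(List<String> affirmativeForms) {
        return affirmativeForms.stream()
                .map(form -> NEGATIVE_PREFIX + form)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(negate(List.of("अँगाल", "अँगाल्नु"))); // prints [नअँगाल, नअँगाल्नु]
    }
}
```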

Need to add more to the list of suffixes?

You can do so by adding them to one of the following files.

src/main/java/org/inepal/products/nlp/compounds/CompoundWordEnding.java
src/main/java/org/inepal/products/nlp/compounds/CompoundWordEndingPeopleName.java
src/main/java/org/inepal/products/nlp/compounds/CompoundWordEndingPlaces.java
src/main/java/org/inepal/products/nlp/compounds/NepaliSuffixes.java
src/main/java/org/inepal/products/nlp/compounds/NepaliPrefixes.java (NOT INTEGRATED YET)

How to contact me?

If you have any questions or feedback, you can contact me via LinkedIn: https://www.linkedin.com/in/kushalp/
