dkajtoch/fast-lang

fast-lang

Language detection tool based on fastText pretrained model.

Text preprocessing

Numbers, punctuation, and repeated whitespace are removed from the text before it is passed to the language detector.
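The cleaning step described above can be sketched roughly as follows. This is an illustration only, not the library's actual code: the function name `preprocess` is hypothetical, and the choice to classify characters by Unicode category and to trim leading and trailing whitespace is an assumption.

```python
import re
import unicodedata


def preprocess(text: str) -> str:
    """Hypothetical helper: strip numbers and punctuation, collapse whitespace."""
    # Assumption: "numbers" and "punctuation" are taken to mean the Unicode
    # categories starting with N (Nd, Nl, No) and P (Pc, Pd, Po, ...).
    kept = "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith(("N", "P"))
    )
    # Collapse runs of whitespace into a single space and trim the ends.
    return re.sub(r"\s+", " ", kept).strip()


print(preprocess("Where  is my mother? 123!"))
# -> Where is my mother
```

Symbols such as `$` or `+` fall into Unicode symbol categories and are left untouched by this sketch; the real preprocessing may treat them differently.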

Examples

from fastlang import FastLangDetect

detector = FastLangDetect()

detector.detect('Where is my mother?') 
# {'en': 0.996435284614563}

detector.detect('Where is my mother?', k=3)
# {'en': 0.996435284614563, 'th': 0.0005820714286528528, 'bn': 0.0005180443404242396}

As the examples demonstrate, the k parameter sets how many labels are returned along with their probabilities. The output can also be controlled with the threshold parameter, which filters the results by probability so that low-probability labels are dropped.

detector.detect('Where is my mother?', k=3, threshold=0.5)
# {'en': 0.996435284614563}
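One way to picture how k and threshold interact is the minimal sketch below, which works on a plain dictionary of probabilities. The function `select_labels` is hypothetical and is not part of fastlang; in particular, treating the threshold as inclusive (`>=`) is an assumption, since the example above does not show what happens at the boundary.

```python
def select_labels(probs, k=1, threshold=0.0):
    """Hypothetical helper: keep the k most probable labels, then filter by threshold."""
    # Rank labels from most to least probable and keep the first k.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    # Assumption: a label passes when its probability is >= threshold.
    return {label: p for label, p in ranked if p >= threshold}


scores = {
    "en": 0.996435284614563,
    "th": 0.0005820714286528528,
    "bn": 0.0005180443404242396,
}
print(select_labels(scores, k=3, threshold=0.5))
# -> {'en': 0.996435284614563}
```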

Labels are ISO 639-1 codes. To look up the language that corresponds to a code, use iso_codes:

from fastlang import iso_codes

iso_codes['en']
# 'English'
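If readable names are preferred over codes in the detector output, a small helper along these lines could do the translation. It is a sketch under assumptions: `with_language_names` is a hypothetical function, and the `names` dictionary is a stand-in for `iso_codes`, which is assumed here to behave like a regular dictionary.

```python
def with_language_names(result, names):
    """Hypothetical helper: replace language codes with names where one is known."""
    # Codes missing from the mapping are kept unchanged rather than dropped.
    return {names[code] if code in names else code: p for code, p in result.items()}


# Stand-in for fastlang's iso_codes mapping (assumed to be dict-like).
names = {"en": "English", "th": "Thai"}

print(with_language_names({"en": 0.996435284614563}, names))
# -> {'English': 0.996435284614563}
```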

The language detector also accepts a list of strings and returns one result per string.

from fastlang import FastLangDetect

detector = FastLangDetect()

detector.detect(['Where is my mother?', 'pies i kot na drodze.'])
# [{'en': 0.996435284614563}, {'sl': 0.6256219148635864}] 

All 176 model labels can be listed with the get_labels() method.

detector.get_labels()

To get the associated frequencies as well, pass include_freq=True to get_labels().

Installation

pip install .

References

  • A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification
  • A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, T. Mikolov, FastText.zip: Compressing text classification models
