PyThaiNLP 2.0 #180

Merged (358 commits) on Mar 31, 2019

@wannaphong (Member) commented Mar 31, 2019

PyThaiNLP 2.0


PyThaiNLP is a Python library for natural language processing (NLP) of the Thai language.

PyThaiNLP includes Thai word tokenizers, transliterators, soundex converters, part-of-speech taggers, and spell checkers.

📖 For details on upgrading from PyThaiNLP 1.7 to PyThaiNLP 2.0, see "From PyThaiNLP 1.7 to PyThaiNLP 2.0".

📖 ThaiNER users upgrading from PyThaiNLP 1.7 to PyThaiNLP 2.0 should see "Upgrade ThaiNER from PyThaiNLP 1.7 to PyThaiNLP 2.0".

📫 Follow us on Facebook: Pythainlp

What's new in version 2.0?

  • New NorvigSpellChecker spell checker class, which can be initialized with a custom dictionary.
  • Drop Python 2 support and remove all Python 2 compatibility code.
  • Remove old, obsolete, deprecated, and experimental code.
  • Thai2fit (ULMFiT-related code upgraded to fastai 1.0)
  • ThaiNER 1.0
  • Remove sentiment analysis
  • Improved word_tokenize (newmm, mm) and dict_word_tokenize
  • Improved POS tagging
  • More and improved examples
  • See the PyThaiNLP 2.0 change log
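NorvigSpellChecker is named after Peter Norvig's spelling corrector. That approach generates every string within one or two edits of the input, keeps the ones found in the dictionary, and picks the most frequent. The sketch below is a hypothetical, self-contained illustration of that idea using a custom word-frequency dictionary. The class name `MiniNorvig`, its parameters, and the Thai alphabet range are assumptions for illustration. This is not PyThaiNLP's actual NorvigSpellChecker API.

```python
from collections import Counter

# Assumption: candidate letters are code points U+0E01..U+0E5B (unassigned ones are harmless).
THAI_ALPHABET = "".join(chr(cp) for cp in range(0x0E01, 0x0E5C))


class MiniNorvig:
    """Hypothetical Norvig-style corrector; not PyThaiNLP's NorvigSpellChecker."""

    def __init__(self, custom_dict, alphabet=THAI_ALPHABET):
        # custom_dict: a list of words or a {word: frequency} mapping
        self.freq = Counter(custom_dict)
        self.total = sum(self.freq.values())
        self.alphabet = alphabet

    def prob(self, word):
        # Relative frequency of the word in the custom dictionary
        return self.freq[word] / self.total if self.total else 0.0

    def edits1(self, word):
        # All strings one delete, transpose, replace, or insert away from word
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [a + b[1:] for a, b in splits if b]
        transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
        replaces = [a + c + b[1:] for a, b in splits if b for c in self.alphabet]
        inserts = [a + c + b for a, b in splits for c in self.alphabet]
        return set(deletes + transposes + replaces + inserts)

    def known(self, words):
        # Keep only strings that appear in the dictionary
        return {w for w in words if w in self.freq}

    def candidates(self, word):
        # Prefer the word itself, then dictionary words at distance 1, then distance 2
        return (self.known([word])
                or self.known(self.edits1(word))
                or self.known(e2 for e1 in self.edits1(word) for e2 in self.edits1(e1))
                or {word})

    def correct(self, word):
        # The most frequent candidate wins
        return max(self.candidates(word), key=self.prob)


checker = MiniNorvig({"สวัสดี": 10, "ภาษา": 5, "ไทย": 8})
print(checker.correct("สวัสดิ"))  # prints สวัสดี (one vowel-mark substitution away)
```

The custom dictionary is just a frequency table here. Swapping in a different word list changes both which corrections are reachable and how ties between candidates are ranked.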

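In PyThaiNLP, mm and newmm are dictionary-based maximal-matching tokenizers. The function below is a rough, hypothetical sketch of the general technique, not PyThaiNLP's newmm, which additionally constrains word boundaries with Thai character clusters. It uses dynamic programming to choose the segmentation with the fewest out-of-dictionary characters, and then the fewest tokens. The name `max_match` and the toy dictionary are assumptions.

```python
def max_match(text, dictionary):
    """Hypothetical dictionary-based maximal matching (not PyThaiNLP's newmm).

    Picks the segmentation with the fewest out-of-dictionary characters,
    breaking ties by the fewest tokens.
    """
    n = len(text)
    longest = max((len(w) for w in dictionary), default=1)
    # best[i] = (unknown_chars, token_count, tokens) for the prefix text[:i]
    best = [None] * (n + 1)
    best[0] = (0, 0, [])
    for i in range(n):
        unknown, count, tokens = best[i]
        # Try every dictionary word that starts at position i
        for j in range(i + 1, min(n, i + longest) + 1):
            piece = text[i:j]
            if piece in dictionary:
                candidate = (unknown, count + 1, tokens + [piece])
                if best[j] is None or candidate[:2] < best[j][:2]:
                    best[j] = candidate
        # Fallback: emit one unknown character so every position stays reachable
        candidate = (unknown + 1, count + 1, tokens + [text[i]])
        if best[i + 1] is None or candidate[:2] < best[i + 1][:2]:
            best[i + 1] = candidate
    return best[n][2]


words = {"ภาษา", "ไทย", "ภาษาไทย", "ง่าย"}
print(max_match("ภาษาไทยง่าย", words))  # ['ภาษาไทย', 'ง่าย']
```

Minimizing the token count is what makes this "maximal" matching. With this dictionary, the longer entry ภาษาไทย is preferred over splitting it into ภาษา + ไทย.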

bact and others added 30 commits November 2, 2018 14:30
…wercase), as suggested by @wannaphongcom

- move them from the pythainlp.corpus module to the pythainlp module, since they are not really a corpus but common variables shared by all modules
…/__init__.py

- reduce the number of convenience imports in pythainlp/__init__.py to lower the risk of namespace clashes and circular top-level imports
- Move isthai() function from pythainlp.tokenize to pythainlp.util
- Move wordtonum function from pythainlp.util to pythainlp.number
- Refactor code related to pythainlp.util
- More test cases, sort test cases by import order
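The isthai() function moved above checks whether text is written in Thai. Below is a minimal sketch of that kind of check. It assumes that "Thai" means code points in the Unicode Thai block (U+0E00 to U+0E7F) and that spaces and some punctuation are ignored. `is_thai` and `thai_ratio` are hypothetical helpers, not the pythainlp.util signatures.

```python
THAI_START, THAI_END = "\u0e00", "\u0e7f"  # bounds of the Unicode Thai block


def _relevant_chars(text, ignore_chars):
    # Drop characters the caller does not want counted (e.g. spaces, periods)
    return [c for c in text if c not in ignore_chars]


def thai_ratio(text, ignore_chars=" .,"):
    """Hypothetical helper: fraction of non-ignored characters in the Thai block."""
    chars = _relevant_chars(text, ignore_chars)
    if not chars:
        return 0.0
    return sum(THAI_START <= c <= THAI_END for c in chars) / len(chars)


def is_thai(text, ignore_chars=" .,"):
    """Hypothetical helper: True if every non-ignored character is Thai."""
    chars = _relevant_chars(text, ignore_chars)
    return bool(chars) and all(THAI_START <= c <= THAI_END for c in chars)


print(is_thai("ภาษาไทย."), thai_ratio("ไทยab"))  # True 0.6
```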
Consistent naming; consolidate similar code
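wordtonum, moved to pythainlp.number in the commit above, converts Thai number words into an integer. The following hypothetical sketch shows one way to do that; it is not the pythainlp.number implementation. It first splits the text greedily into known number words. It then accumulates digit × multiplier, starting a new group at each ล้าน (million). It covers only the listed vocabulary, including the special forms ยี่ (2, as in ยี่สิบ) and เอ็ด (1, as in สิบเอ็ด), and it assumes well-formed input. `thai_word_to_number` and the tables are illustrative names.

```python
DIGITS = {"ศูนย์": 0, "หนึ่ง": 1, "เอ็ด": 1, "สอง": 2, "ยี่": 2, "สาม": 3, "สี่": 4,
          "ห้า": 5, "หก": 6, "เจ็ด": 7, "แปด": 8, "เก้า": 9}
MULTIPLIERS = {"สิบ": 10, "ร้อย": 100, "พัน": 1_000, "หมื่น": 10_000, "แสน": 100_000}
MILLION = "ล้าน"
# Longest words first so greedy matching never stops at a shorter prefix
VOCAB = sorted([*DIGITS, *MULTIPLIERS, MILLION], key=len, reverse=True)


def tokenize_number_words(text):
    tokens, i = [], 0
    while i < len(text):
        for word in VOCAB:
            if text.startswith(word, i):
                tokens.append(word)
                i += len(word)
                break
        else:
            raise ValueError(f"unrecognized number word at {i}: {text[i:]!r}")
    return tokens


def thai_word_to_number(text):
    """Hypothetical converter; not PyThaiNLP's wordtonum."""
    total = 0       # value carried through completed million groups
    current = 0     # value of the group below the next ล้าน
    pending = None  # digit waiting for a multiplier
    for tok in tokenize_number_words(text):
        if tok in DIGITS:
            pending = DIGITS[tok]
        elif tok in MULTIPLIERS:
            # A bare multiplier such as สิบ means one of it
            current += (1 if pending is None else pending) * MULTIPLIERS[tok]
            pending = None
        else:  # ล้าน scales everything read so far by one million
            group = current + (pending or 0)
            if group == 0 and total == 0:
                group = 1  # a bare ล้าน means one million
            total = (total + group) * 1_000_000
            current, pending = 0, None
    return total + current + (pending or 0)


print(thai_word_to_number("หนึ่งร้อยยี่สิบเอ็ด"))  # 121
```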
Merge from PyThaiNLP/pythainlp
- TTC should read ttc_freq.txt (was tnc_freq.txt)
- test case for bahttext with a whole number (no satang)
- test case for pythainlp.corpus.remove
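bahttext spells out an amount of money in Thai words. By convention, a whole amount with no satang ends in ถ้วน ("exactly"), which is the case the test above covers. The sketch below is a hypothetical illustration of that convention. The names `baht_text` and `thai_number_words` are assumptions, not PyThaiNLP's API. It handles non-negative amounts only and rounds to two decimal places.

```python
from decimal import Decimal, ROUND_HALF_UP

DIGIT_WORDS = ["ศูนย์", "หนึ่ง", "สอง", "สาม", "สี่", "ห้า", "หก", "เจ็ด", "แปด", "เก้า"]
PLACE_WORDS = ["", "สิบ", "ร้อย", "พัน", "หมื่น", "แสน"]


def _below_million(n):
    # Read 1..999_999 digit by digit, most significant first
    digits = str(n)
    words = []
    for pos, ch in enumerate(digits):
        d, place = int(ch), len(digits) - pos - 1
        if d == 0:
            continue
        if place == 1 and d == 1:
            words.append("สิบ")              # 10, not หนึ่งสิบ
        elif place == 1 and d == 2:
            words.append("ยี่สิบ")            # 20 uses ยี่
        elif place == 0 and d == 1 and n > 9:
            words.append("เอ็ด")              # trailing 1 after tens or higher
        else:
            words.append(DIGIT_WORDS[d] + PLACE_WORDS[place])
    return "".join(words)


def thai_number_words(n):
    if n == 0:
        return DIGIT_WORDS[0]
    if n < 1_000_000:
        return _below_million(n)
    high, low = divmod(n, 1_000_000)
    return thai_number_words(high) + "ล้าน" + (_below_million(low) if low else "")


def baht_text(amount):
    """Hypothetical bahttext-style conversion for non-negative amounts."""
    value = Decimal(str(amount)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    baht, satang = divmod(int(value * 100), 100)
    if satang == 0:
        return thai_number_words(baht) + "บาทถ้วน"  # whole amount: ends in ถ้วน
    prefix = thai_number_words(baht) + "บาท" if baht else ""
    return prefix + thai_number_words(satang) + "สตางค์"


print(baht_text(100))   # หนึ่งร้อยบาทถ้วน
print(baht_text(21.5))  # ยี่สิบเอ็ดบาทห้าสิบสตางค์
```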
@wannaphong added this to the 2.0 milestone Mar 31, 2019
@coveralls commented Mar 31, 2019

Coverage increased (+28.7%) to 81.731% when pulling 4094632 on dev into ab79eab on master.

@wannaphong merged commit a6a7717 into master Mar 31, 2019

5 participants