Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
Updated Jul 11, 2024 - Python
Python API for Kiwi
ShanNLP: an experimental project inspired by PyThaiNLP
Thai Natural Language Processing in Python.
A Japanese tokenizer based on recurrent neural networks
A comparison tool of Japanese tokenizers
Cantonese Linguistics and NLP
A Khmer word segmentation tool built for NIPTICT (now CADT) Khmer Word Segmentation CRF model.
A PyTorch implementation of the BI-LSTM-CRF model.
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora: English Wikipedia and Twitter (330 million English tweets).
NLP tools: word segmentation, sentence segmentation, and new-word discovery.
Pytorch-NLU, a Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of long and short Chinese texts, as well as sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, word segmentation, and extractive text summarization.
AdaSeq: An All-in-One Library for Developing State-of-the-Art Sequence Understanding Models
🧩 A simple sentence tokenizer.
An abstraction layer around word splitters for Python
A mini version of KhmerNLP with LSTM only
Hashformers is a framework for hashtag segmentation with Transformers and Large Language Models (LLMs).
CKIP Transformers
CKIP CoreNLP Toolkits
Quantitative and qualitative evaluation of restorations of textual features using machine learning models