Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
Updated Apr 30, 2024 - Python
Unsupervised text tokenizer focused on computational efficiency
Fast and customizable text tokenization library with BPE and SentencePiece support
Explains NLP building blocks in a simple manner.
Machine Learning for Phishing Website Detection
Subword Encoding in Lattice LSTM for Chinese Word Segmentation
Subword-augmented Embedding for Cloze Reading Comprehension (COLING 2018)
BPE tokenizer used for Dart/Flutter applications when calling ChatGPT APIs
GPT-3 encoder and decoder tool written in Swift
Sentiment-based classification of stock article titles using PhoBERT
Byte Pair Encoding (BPE)
Simple-to-use scoring function for arbitrarily tokenized texts.
Natural Language EnCoder-Decoder: word, char, BPE, etc.
Fast bare-bones BPE for modern tokenizer training
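Most of the projects above build on byte pair encoding (BPE), so a minimal sketch of the core merge-learning loop may help readers new to the technique. It is a toy illustration under stated assumptions (whitespace-split words, a hypothetical `</w>` end-of-word marker, ties broken by first occurrence) and is not the implementation used by any listed project. The function names `learn_bpe`, `get_pair_counts`, and `merge_pair` are hypothetical.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for word, freq in vocab.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    merged = {}
    for word, freq in vocab.items():
        out = []
        i = 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-split corpus.

    Assumption: each word starts as its characters plus a `</w>` marker.
    """
    word_freqs = Counter(corpus.split())
    vocab = {tuple(w) + ("</w>",): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; max() keeps the first one seen on ties.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges, vocab
```

Calling `learn_bpe("low low low lower lowest", 3)` learns the merges `('l', 'o')`, `('lo', 'w')`, and `('low', '</w>')` in that order, so the frequent word "low" becomes a single symbol while "lower" and "lowest" still share the `low` prefix. Production tokenizers add byte-level fallback, pre-tokenization rules, and far faster pair-count updates than this full recount per merge.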