sentencepiece

Here are 14 public repositories matching this topic...

himkt / konoha

🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with small changes of code.

nlp natural-language-processing japanese text-processing mecab kytea sudachi sentencepiece janome

Updated May 15, 2024
Python

taishan1994 / sentencepiece_chinese_bpe

Star

使用sentencepiece中BPE训练中文词表，并在transformers中进行使用。

tokenization sentencepiece chinese-vocab

Updated Jun 24, 2023
Python

nguyenvulebinh / vietnamese-roberta

Star

A Robustly Optimized BERT Pretraining Approach for Vietnamese

natural-language-processing vietnamese pytorch transformer pretrained-models bert vietnamese-nlp fairseq roberta bert-embeddings sentencepiece

Updated Jul 25, 2024
Python

Andras7 / gpt2-pytorch

Star

Extremely simple and understandable GPT2 implementation with minor tweaks

transformers pytorch mixed-precision gpt2 sentencepiece

Updated Dec 6, 2019
Python

stephantul / piecelearn

Star

Learning BPE embeddings by first learning a segmentation model and then training word2vec

word2vec embeddings bpe wordpiece sentencepiece

Updated Dec 18, 2022
Python

to-aoki / my-pytorch-bert

Star

BERT implementation of PyTorch

nlp pytorch japanese-language albert bert sentencepiece pytorch-bert

Updated Mar 16, 2020
Python

smafjal / bengali_tokenizer

Star

Bengali language Tokenizer (SentencePiece)

tokenizer bengali unsupervised-learning sentencepiece bengali-natural-language-processing bengali-tokenizer

Updated Oct 20, 2019
Python

leliuga / datrin

Star

dataset, train, inference

inference dataset flax train jax sentencepiece safetensors

Updated May 19, 2024
Python

kgarg8 / NMT-RNN

Star

NMT with RNN Models: (1) in Vanilla style, (2) with Sentencepiece, (3) using Pre-trained models from FairSeq

machine-translation pytorch rnn fairseq sentencepiece

Updated Sep 19, 2021
Python

ZJaume / escape-unk

Star

Escape unknown symbols in SentecePiece vocabularies

natural-language-processing neural-machine-translation escaping sentencepiece

Updated Jun 25, 2024
Python

FloweryK / Sentencepiece-Pretrained-Models

Star

pretrained models and a training code for sentencepiece

pretrained sentencepiece

Updated Jul 27, 2023
Python

sunsikim / tf-spm-tokenizer-pattern

Star

Tensorflow Model Incorporable Sentencepiece Tokenizer Training Code

nlp imdb-dataset tensorflow2 sentencepiece

Updated May 21, 2023
Python

kmaurinjones / WikiGameBot

Star

Automated WikiGame-playing 'bot'. Achieved via SentenceTransformer Word Embeddings.

nlp api wikipedia transformer wordembeddings sentencepiece wikigame sentencetransformer

Updated Jan 18, 2024
Python

ReshiAdavan / Thoth

Star

An Industry Standard Tokenizer, purposed for large-scale language models like OpenAI's GPT Series.

python rust natural-language-processing tokenizer gpt-2 sentencepiece bytepairencoding gpt-4 tiktoken llama2

Updated Jun 29, 2024
Python

Improve this page

Add a description, image, and links to the sentencepiece topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the sentencepiece topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sentencepiece

Here are 14 public repositories matching this topic...

himkt / konoha

taishan1994 / sentencepiece_chinese_bpe

nguyenvulebinh / vietnamese-roberta

Andras7 / gpt2-pytorch

stephantul / piecelearn

to-aoki / my-pytorch-bert

smafjal / bengali_tokenizer

leliuga / datrin

kgarg8 / NMT-RNN

ZJaume / escape-unk

FloweryK / Sentencepiece-Pretrained-Models

sunsikim / tf-spm-tokenizer-pattern

kmaurinjones / WikiGameBot

ReshiAdavan / Thoth

Improve this page

Add this topic to your repo