Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
Simple-to-use scoring function for arbitrarily tokenized texts.
Byte-Pair Encoding tokenizer for training large language models on huge datasets
A modified, secure version of BPE algorithm
Fast bare-bones BPE for modern tokenizer training
An extremely simple and restricted tool/library for converting binary data into text that can be processed with unsupervised character-level natural language processing tools and libraries
Natural Language EnCoder-Decoder: word, char, BPE, etc.
Byte-Pair Encoding (BPE) subword tokenization algorithm implementations from scratch in Python
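As a rough illustration of what such a from-scratch implementation involves (the function names and corpus here are illustrative, not taken from any listed repository), the core BPE merge-learning loop can be sketched as:

```python
from collections import Counter

def get_pair_counts(words):
    """Count adjacent symbol pairs across a {space-separated word: frequency} corpus."""
    pairs = Counter()
    for word, freq in words.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, words):
    """Rewrite every word, fusing the chosen pair into a single symbol.

    Note: plain string replacement is a simplification; production
    implementations match symbol boundaries (e.g. with a regex) to avoid
    merging inside a longer symbol.
    """
    merged = " ".join(pair)
    joined = "".join(pair)
    return {word.replace(merged, joined): freq for word, freq in words.items()}

def learn_bpe(words, num_merges):
    """Greedily learn an ordered list of merge operations."""
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        words = merge_pair(best, words)
        merges.append(best)
    return merges, words
```

On the classic toy corpus `{"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6, "w i d e s t </w>": 3}`, the first learned merges fuse `e s`, then `es t`, then `est </w>`, so frequent suffixes like `est</w>` become single vocabulary symbols. At tokenization time, the learned merges are replayed in order on each new word.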
Learning BPE embeddings by first learning a segmentation model and then training word2vec
ASR pytorch project
Central repository with pretrained models for transfer learning, BPE subword-tokenization, mono/multilingual embeddings, and everything in between.
This project aims to implement word-based, character-based and subword-based tokenization techniques.
An educational project dedicated to text-to-image generation with neural networks. A BPE tokenizer and a VQ-VAE are used to learn the embeddings of text and images respectively. A transformer-based model is then trained to predict the next token in the concatenated sequence of text and image tokens and is used for generation.