PyTorch original implementation of Cross-lingual Language Model Pretraining. (Python, updated Jul 28, 2020)
ASR PyTorch project
Central repository with pretrained models for transfer learning, BPE subword-tokenization, mono/multilingual embeddings, and everything in between.
This project aims to implement word-based, character-based and subword-based tokenization techniques.
An extremely simple and restricted tool/lib for converting binary data into text that can be processed with unsupervised character-level natural language processing tools/libs
A modified, secure version of BPE algorithm
Low-resource language machine translation (az, be, tr -> en).
An educational project dedicated to text-to-image generation with neural networks. VQVAE and BPE autoencoders are used to learn embeddings of images and text, respectively. A transformer-based model is then trained to predict the next token in the concatenated sequence of image and text tokens and is used for generation.
A Python package to build a corpus vocabulary using the byte-pair encoding methodology, plus a tokenizer to tokenize input texts based on the built vocabulary.
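Several of the tokenizers listed here apply a learned merge table to new text at inference time. As a rough illustration (not taken from any particular repository above), segmenting a word with an already-learned merge list can be sketched like this, where the toy `merges` list is a hypothetical example:

```python
def bpe_tokenize(word, merges):
    """Greedily apply learned BPE merges to a single word."""
    # earlier-learned merges have lower rank and are applied first
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word) + ["</w>"]  # end-of-word marker, a common convention
    while len(symbols) > 1:
        pairs = list(zip(symbols, symbols[1:]))
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no applicable merge left
        merged, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols

# illustrative merge table, in the order the merges were learned
merges = [("e", "s"), ("es", "t"), ("est", "</w>"), ("l", "o"), ("lo", "w")]
print(bpe_tokenize("lowest", merges))  # ['low', 'est</w>']
```

Unknown words simply fall back to smaller subword pieces, or ultimately single characters, which is the main appeal of subword tokenization over fixed word vocabularies.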
Byte-Pair Encoding tokenizer for training large language models on huge datasets
Natural Language EnCoder-Decoder: word, char, BPE, etc.
Byte Pair Encoding (BPE)
Byte-Pair Encoding (BPE) (subword-based tokenization) algorithm implementations from scratch in Python
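For context on what these from-scratch implementations do: learning BPE merges amounts to repeatedly counting adjacent symbol pairs over a frequency-weighted word list and merging the most frequent pair. A minimal sketch in the spirit of Sennrich et al.'s classic formulation (not the code of any repository listed here):

```python
import re
from collections import Counter

def get_pair_counts(vocab):
    # count adjacent symbol pairs across all words, weighted by word frequency
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    # replace each whitespace-delimited occurrence of the pair with its merge
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(merged, word): freq for word, freq in vocab.items()}

def learn_bpe(vocab, num_merges):
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab

# toy corpus: words as space-separated characters with an end-of-word marker
corpus = {"l o w </w>": 5, "l o w e r </w>": 2,
          "n e w e s t </w>": 6, "w i d e s t </w>": 3}
merges, final_vocab = learn_bpe(corpus, 10)
print(merges[:3])  # [('e', 's'), ('es', 't'), ('est', '</w>')]
```

The learned merge list doubles as the tokenizer's rule table: applying the merges in order to an unseen word reproduces the same segmentation.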
Subword-augmented Embedding for Cloze Reading Comprehension (COLING 2018)
Learning BPE embeddings by first learning a segmentation model and then training word2vec