PyTorch original implementation of Cross-lingual Language Model Pretraining.
An extremely simple and restricted tool/library for converting binary data into text that can be processed with unsupervised character-level natural language processing tools and libraries.
A modified, secure version of the BPE algorithm.
An ASR (automatic speech recognition) PyTorch project.
A Python package to build a corpus vocabulary using the byte-pair methodology, plus a tokenizer to tokenize input texts based on the built vocabulary.
Byte-Pair Encoding (BPE) subword tokenization algorithm implementations from scratch in Python.
Low-resource language machine translation (az, be, tr -> en).
An educational project dedicated to text-to-image generation with neural networks. VQ-VAE and BPE autoencoders learn the embeddings of images and text, respectively. A transformer-based model is then trained to predict the next token in the concatenated sequence of image and text tokens, and is used for generation.
This project implements word-based, character-based, and subword-based tokenization techniques.
Byte-Pair Encoding tokenizer for training large language models on huge datasets
Learning BPE embeddings by first learning a segmentation model and then training word2vec
Central repository with pretrained models for transfer learning, BPE subword-tokenization, mono/multilingual embeddings, and everything in between.
Natural Language EnCoder-Decoder: word, char, BPE, etc.
Fast bare-bones BPE for modern tokenizer training
Simple-to-use scoring function for arbitrarily tokenized texts.
Byte Pair Encoding (BPE)
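Several of the repositories above implement BPE training from scratch. A minimal sketch of the core training loop (frequency-weighted pair counting followed by greedy merges) might look like the following; it omits the end-of-word marker used in Sennrich et al.'s formulation and makes no performance claims:

```python
from collections import Counter

def learn_bpe(word_counts, num_merges):
    """Learn BPE merge rules from a {word: frequency} dict.

    Returns the list of merged symbol pairs, most frequent first.
    """
    # Start with each word split into individual characters.
    vocab = {tuple(word): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break  # nothing left to merge
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the best pair with its merged symbol.
        merged = best[0] + best[1]
        new_vocab = {}
        for symbols, count in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = count
        vocab = new_vocab
    return merges

merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 10)
print(merges)
```

On this toy corpus the first merges pick up the frequent suffix pairs such as ("e", "s") and ("s", "t"), which is the behavior the tokenizer libraries listed here build on.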