PyTorch original implementation of Cross-lingual Language Model Pretraining.
An extremely simple and restricted tool/library for converting binary data into text that can be processed with unsupervised character-level natural language processing tools and libraries.
A modified, secure version of the BPE algorithm.
An ASR (automatic speech recognition) PyTorch project.
A Python package to build a corpus vocabulary using the byte-pair methodology, plus a tokenizer to tokenize input texts based on the built vocabulary.
Byte-Pair Encoding (BPE) subword tokenization algorithm implementations from scratch in Python.
Low-resource language machine translation (az, be, tr -> en).
An educational project dedicated to text-to-image generation with neural networks. VQ-VAE and BPE autoencoders learn the embeddings of images and text, respectively. A transformer-based model is then trained to predict the next token in the concatenated sequence of image and text tokens, and is used for generation.
This project implements word-based, character-based, and subword-based tokenization techniques.
Byte-Pair Encoding tokenizer for training large language models on huge datasets
Learning BPE embeddings by first learning a segmentation model and then training word2vec
Central repository with pretrained models for transfer learning, BPE subword-tokenization, mono/multilingual embeddings, and everything in between.
Natural Language EnCoder-Decoder: word, char, BPE, etc.
Fast bare-bones BPE for modern tokenizer training
Simple-to-use scoring function for arbitrarily tokenized texts.
Byte Pair Encoding (BPE)
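Several of the repositories above implement BPE training from scratch. A minimal sketch of the core training loop (frequency-weighted pair counting followed by greedy merges) might look like the following; it omits the end-of-word marker used in Sennrich et al.'s formulation and makes no performance claims:

```python
from collections import Counter

def learn_bpe(word_counts, num_merges):
    """Learn BPE merge rules from a {word: frequency} dict.

    Returns the list of merged symbol pairs, most frequent first.
    """
    # Start with each word split into individual characters.
    vocab = {tuple(word): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break  # nothing left to merge
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the best pair with its merged symbol.
        merged = best[0] + best[1]
        new_vocab = {}
        for symbols, count in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = count
        vocab = new_vocab
    return merges

merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 10)
print(merges)
```

On this toy corpus the first merges pick up the frequent suffix pairs such as ("e", "s") and ("s", "t"), which is the behavior the tokenizer libraries listed here build on.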