Subword-augmented Embedding for Cloze Reading Comprehension (COLING 2018)
Updated Nov 6, 2018 - Python
Low-resource machine translation (az, be, tr -> en).
Byte Pair Encoding (BPE)
Subword Encoding in Lattice LSTM for Chinese Word Segmentation
Explains NLP building blocks in a simple manner.
Machine Learning for Phishing Website Detection
A Python package that builds a corpus vocabulary using byte pair encoding and provides a tokenizer that tokenizes input text against the built vocabulary.
Generating new titles for movie posters using a combination of image features and pre-trained subword embeddings
Original PyTorch implementation of Cross-lingual Language Model Pretraining.
Automatic summarization based on BPE tokenization
Go implementation of the BPE (Byte Pair Encoding) algorithm.
An educational project dedicated to text-to-image generation with neural networks. A VQVAE and BPE are used to learn the embeddings of images and text, respectively. A transformer-based model is then trained to predict the next token in the concatenated sequence of image and text tokens and is used for generation.
A light stemmer for MDA (Moroccan Dialect Arabic) based on the BPE (Byte Pair Encoding) algorithm, implemented in TypeScript
Sentiment classification of stock article titles using PhoBERT
Repository for the experiments in my paper: "A Systematic Analysis of Vocabulary and BPE Settings for Optimal Fine-tuning of NMT: A Case Study of In-domain Translation"
This project aims to implement word-based, character-based and subword-based tokenization techniques.
Source crypt Gradle plugin