byte-pair-encoding

Here are 29 public repositories matching this topic...

tizianocitro / tiztoken

A byte-level Byte Pair Encoding (BPE) algorithm for tokenization in large language models (LLMs), similar to those used in GPT, Llama, and Mistral.

python tokenization byte-pair-encoding llms

Updated Aug 7, 2024

This is project for sequence to sequence NLP task. We developed a custom model to understand the process of task using PyTorch. We also fine tuned pre-trained transformer models to improve the performance of translation task.

transformer tokenization fine-tuning byte-pair-encoding streamlit natual-language-processing

Updated Aug 1, 2024
Jupyter Notebook

psycoplankton / GPT-Decoded

Star

An implementation of the GPT(generative pretrained transformer) model, from scratch, which produces Shakespearean text by training on the dialogues written by Shakespeare along with the GPT Encoder.

natural-language-processing generative-model gpt implementation encodings language-model attention-mechanism byte-pair-encoding transformers-models

Updated Jul 30, 2024
Jupyter Notebook

jonasknobloch / mbpe

Star

Morphologically biased byte-pair encoding

nlp morphology tokenizer segmentation morphological-analysis byte-pair-encoding

Updated Jul 24, 2024
Rust

mdabir1203 / BPE_Tokenizer_Visualizer

Star

A Visualizer to check how BPE Tokenizer in an LLM Works

react javascript ai byte-pair-encoding tokenizer-nlp 42wolfsburg llm ai-club

Updated Jul 5, 2024
JavaScript

Ascend-Research / AutoGO

Star

Code repo for the paper "AutoGO: Automated Computation Graph Optimization for Neural Network Evolution", accepted to NeurIPS 2023.

deep-neural-networks deep-learning tensorflow tokenizer pytorch classification human-pose-estimation super-resolution semantic-segmentation neural-architecture-search pytorch-geometric byte-pair-encoding neurips-2023

Updated Jun 7, 2024
Python

aallam / ktoken

Star

Kotlin multiplatform BPE tokenizer library for OpenAI models

kotlin tokenizer openai gpt bpe byte-pair-encoding tiktoken binary-p

Updated Jun 7, 2024
Kotlin

cosmaadrian / acumen-compressor

Star

Order-agnostic lossless compressor using BPE and Huffman Coding.

compression huffman-coding lossless byte-pair-encoding

Updated Jun 6, 2024
Python

jmaczan / bpe-tokenizer

Star

Byte-Pair Encoding tokenizer for training large language models on huge datasets

python machine-learning deep-learning tokenizer chunking from-scratch bpe byte-pair-encoding large-language-models llm bpe-tokenizer

Updated Jun 4, 2024
Python

BobMcDear / minbpe-hs

Star

Byte-level byte pair encoding (BPE) in Haskell

haskell tokenizer bpe byte-pair-encoding llm

Updated May 27, 2024
Haskell

rashmishreev / Deep-Learning

Star

This repository houses my assignments completed during the Deep Learning course as part of my Master's in Data Analytics program. Explore diverse projects showcasing hands-on applications of advanced neural networks and machine learning techniques.

machine-learning computer-vision deep-learning sentiment-analysis word2vec text-generation cnn mnist neural-networks resnet transfer-learning ann vgg19 glove-embeddings byte-pair-encoding keras-embeddings

Updated Mar 16, 2024
Jupyter Notebook

clabrugere / byte-pair-encoding

Star

Byte pair encoding tokenizer as used in some large language models.

tokenizer byte-pair-encoding

Updated Feb 25, 2024
Python

capjamesg / bpe

Star

Byte-pair encoding implementation in Python.

text-encoding byte-pair-encoding

Updated Dec 27, 2023
Python

ankane / youtokentome-ruby

Star

High performance unsupervised text tokenization for Ruby

unsupervised-learning word-segmentation tokenization npl bpe byte-pair-encoding

Updated Dec 27, 2023
Ruby

Panjete / iidsearch

Star

an efficient ranked retrieval system for English corpora, optimised with VBE and BPE.

inverted-index trec variable-byte-encoding byte-pair-encoding

Updated Nov 10, 2023
Python

parsa-abbasi / intro-to-nlp

Star

An Introduction to Natural Language Processing (NLP)

nlp text-classification word2vec tokenizer naive-bayes bag-of-words tf-idf text-processing tokenization stemming lemmatization byte-pair-encoding one-hot-encoding

Updated Nov 4, 2023
Jupyter Notebook

samber / go-gpt-3-encoder

Sponsor

Star

Go BPE tokenizer (Encoder+Decoder) for GPT2 and GPT3

go encoder decoder tokenizer transformer openai token codex bpe byte-pair-encoding gpt-2 gpt-3

Updated Nov 1, 2023
Go

theskyinflames / word2png

Star

This is a tool that encrypts a sequence of words (or pieces of texts) using the AES-256 algorithm and encodes the encrypted result into a PNG image by linking each byte value to a specific color. It also decodes the before image to get back the original sequence of words

Updated Sep 23, 2023
Go

bnosac / tokenizers.bpe

Star

R package for Byte Pair Encoding based on YouTokenToMe

text-mining tokenization bpe byte-pair-encoding

Updated Sep 16, 2023
C++

AndreiMoraru123 / Neural-Machine-Translation

Star

Modern Eager TensorFlow implementation of Attention Is All You Need

Updated Aug 6, 2023
Python

Improve this page

Add a description, image, and links to the byte-pair-encoding topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the byte-pair-encoding topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

byte-pair-encoding

Here are 29 public repositories matching this topic...

tizianocitro / tiztoken

Maria-Antony / Seq2Seq-NMT

psycoplankton / GPT-Decoded

jonasknobloch / mbpe

mdabir1203 / BPE_Tokenizer_Visualizer

Ascend-Research / AutoGO

aallam / ktoken

cosmaadrian / acumen-compressor

jmaczan / bpe-tokenizer

BobMcDear / minbpe-hs

rashmishreev / Deep-Learning

clabrugere / byte-pair-encoding

capjamesg / bpe

ankane / youtokentome-ruby

Panjete / iidsearch

parsa-abbasi / intro-to-nlp

samber / go-gpt-3-encoder

theskyinflames / word2png

bnosac / tokenizers.bpe

AndreiMoraru123 / Neural-Machine-Translation

Improve this page

Add this topic to your repo