FEAT: added a ChatMessageNormalizer that formats messages in the template specified by a Hugging Face tokenizer #128

blakebullwinkel · 2024-03-28T23:38:46Z

Description

I added a chat message normalizer called ChatMessageNormalizerTokenizerTemplate, which takes a Hugging Face tokenizer as input and allows you to convert a list of chat messages into the format specified by that tokenizer. This might be the ChatML format (which @dlmgary created a custom normalizer ChatMessageNormalizerChatML for) or some other custom format that was used to train the model.

Tests and Documentation

I added tests to demonstrate that the normalizer works on three different Hugging Face tokenizers and some example usage to the chat_message.ipynb demo notebook.

…pecified by a Hugging Face tokenizer

…er init file

…ferent Hugging Face tokenizers

… hook

romanlutz

Very cool! Do you see this being useful in any existing demo notebooks? No worries if not

pyrit/chat_message_normalizer/chat_message_normalizer_tokenizer.py

…_message notebook

rlundeen2

Great work Blake! This is very poweful

blakebullwinkel added 5 commits March 28, 2024 11:37

added a ChatMessageNormalizer that formats messages in the template s…

020fea6

…pecified by a Hugging Face tokenizer

added ChatMessageNormalizerTokenizerTemplate to chat message normaliz…

3d5c583

…er init file

added tests for ChatMessageNormalizerTokenizerTemplate with three dif…

b9bf665

…ferent Hugging Face tokenizers

reran pre-commit hooks on normalizer tests

d438b2c

split long strings across multiple lines to satisfy flake8 pre-commit…

6b5e486

… hook

romanlutz approved these changes Mar 29, 2024

View reviewed changes

pyrit/chat_message_normalizer/chat_message_normalizer_tokenizer.py Show resolved Hide resolved

pyrit/chat_message_normalizer/chat_message_normalizer_tokenizer.py Show resolved Hide resolved

added example usage of ChatMessageNormalizerTokenizerTemplate to chat…

30c690b

…_message notebook

rlundeen2 approved these changes Mar 29, 2024

View reviewed changes

blakebullwinkel added 4 commits March 29, 2024 09:13

updated chat_message doc python file and ran jupytext

4ee6ffd

added docstrings to ChatMessageNormalizerTokenizerTemplate class

4b35fd0

ran pre-commit hooks on chat_message file

c740840

resolved mypy pre-commit hook and reran jupytext

1520ad9

blakebullwinkel merged commit e252f4a into Azure:main Mar 29, 2024
4 checks passed

blakebullwinkel deleted the chat-message-normalizer-tokenizer branch March 29, 2024 22:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FEAT: added a ChatMessageNormalizer that formats messages in the template specified by a Hugging Face tokenizer #128

FEAT: added a ChatMessageNormalizer that formats messages in the template specified by a Hugging Face tokenizer #128

blakebullwinkel commented Mar 28, 2024 •

edited

Loading

romanlutz left a comment

rlundeen2 left a comment

FEAT: added a ChatMessageNormalizer that formats messages in the template specified by a Hugging Face tokenizer #128

FEAT: added a ChatMessageNormalizer that formats messages in the template specified by a Hugging Face tokenizer #128

Conversation

blakebullwinkel commented Mar 28, 2024 • edited Loading

Description

Tests and Documentation

romanlutz left a comment

Choose a reason for hiding this comment

rlundeen2 left a comment

Choose a reason for hiding this comment

blakebullwinkel commented Mar 28, 2024 •

edited

Loading