Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEAT: added a ChatMessageNormalizer that formats messages in the template specified by a Hugging Face tokenizer #128

Merged

Conversation

blakebullwinkel
Copy link
Contributor

@blakebullwinkel blakebullwinkel commented Mar 28, 2024

Description

I added a chat message normalizer called ChatMessageNormalizerTokenizerTemplate, which takes a Hugging Face tokenizer as input and allows you to convert a list of chat messages into the format specified by that tokenizer. This might be the ChatML format (which @dlmgary created a custom normalizer ChatMessageNormalizerChatML for) or some other custom format that was used to train the model.

Tests and Documentation

I added tests to demonstrate that the normalizer works on three different Hugging Face tokenizers and some example usage to the chat_message.ipynb demo notebook.

Copy link
Contributor

@romanlutz romanlutz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool! Do you see this being useful in any existing demo notebooks? No worries if not

Copy link
Contributor

@rlundeen2 rlundeen2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work Blake! This is very poweful

@blakebullwinkel blakebullwinkel merged commit e252f4a into Azure:main Mar 29, 2024
4 checks passed
@blakebullwinkel blakebullwinkel deleted the chat-message-normalizer-tokenizer branch March 29, 2024 22:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants