Add a chat_template prompt strategy for DPO #1725

Open · wants to merge 4 commits into base: main

Conversation

fozziethebeat (Contributor)

Description

Replicates the chat_template support from SFT datasets, but for DPO training. Users can now specify a dataset with a list of conversation messages, along with chosen and rejected columns that each hold a single conversation message. Further, all field names can be customized.
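For example, a row in such a dataset might look like the following (a hypothetical sample; the field names shown are assumed defaults, and each is remappable):

```python
# Hypothetical example row; "messages", "chosen", "rejected", "role", and
# "content" are assumed default field names, all configurable.
sample = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    # chosen/rejected each hold a single conversation message
    "chosen": {"role": "assistant", "content": "The capital of France is Paris."},
    "rejected": {"role": "assistant", "content": "Paris is in Italy."},
}
```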

Motivation and Context

This change provides a more configurable set of datasets for DPO training.
Fixes #1708

How has this been tested?

  • Unit test added for the new strategy
  • Manual preprocessing run over a sample dataset
  • Full training completed on a real dataset

Types of changes

  • Code changes to prompt strategies
  • Unit tests

Social Handles (Optional)

@fozziethebeat

This mimics the SFT chat_template strategy so that users can (a rough sketch follows this list):
* Specify the messages field
* Specify the per-message role and content fields
* Specify the chosen and rejected fields
* Let the tokenizer construct the raw prompt
* Ensure the chosen and rejected fields don't have any prefix tokens
@@ -62,7 +62,7 @@ def process_tokens_for_rl_debug(tokens, color, tokenizer, text_only):
     """Helper function to process and color tokens."""
     colored_tokens = [
         color_token_for_rl_debug(tokenizer.decode(token), token, color, text_only)
-        for token in tokenizer.encode(tokens)
+        for token in tokenizer.encode(tokens, add_special_tokens=False)
fozziethebeat (Contributor, Author):

Note: I added this because, by default, this step was including the BOS token every time. Since the chat template already includes it, it seemed reasonable not to add it a second time.
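To illustrate (a minimal sketch, assuming a Llama-3-style template that already bakes BOS into the rendered prompt string):

```python
# Sketch of the duplicated-BOS issue; any chat model whose template
# emits BOS in the rendered string would show the same behavior.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "hi"}], tokenize=False
)

# Default encoding prepends BOS again, duplicating the one in the template.
print(tokenizer.encode(prompt)[:2])
# Skipping special tokens keeps only the BOS already in the prompt string.
print(tokenizer.encode(prompt, add_special_tokens=False)[:1])
```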

return tokenizer


class TestAssistantChatTemplateLlama3:
Collaborator:

Suggested change:
- class TestAssistantChatTemplateLlama3:
+ class TestAssistantDPOChatTemplateLlama3:

fozziethebeat (Contributor, Author):

Done

winglian (Collaborator) commented on Jul 5, 2024:

@fozziethebeat but for DPO training, since trl handles the tokenization, do we need this piece?

fozziethebeat (Contributor, Author) replied:

> @fozziethebeat but for DPO training, since trl handles the tokenization, do we need this piece?

Was this in reference to the change in the debugging output? If so, it's not required, but I think anyone manually inspecting tokenization output (like I did) would be very surprised to see the BOS token duplicated in numerous scenarios. So it's more to give confidence that we constructed the strings correctly.

fozziethebeat (Contributor, Author):

Any other changes to add before updating the branch and approving for merging?

Successfully merging this pull request may close: Add a chat_template strategy for DPO datasets (#1708)