Add a `chat_template` strategy for DPO datasets #1708

fozziethebeat · 2024-06-14T02:16:09Z

⚠️ Please check that this feature request hasn't been suggested before.

I searched previous Ideas in Discussions didn't find any similar feature requests.
I searched previous Issues didn't find any similar feature requests.

🔖 Feature description

This is basically #1660 but for DPO datasets:

Let users specify a convesations field
Let users specify role and content fields for each message
Let the tokenizer chat template turn the conversation messages into the input prompt
Let the tokenizer chat template turn the rejected and chosen field content into the right completion prompt
Then boom, easy training on long conversation sequences.

✔️ Solution

Replicate the changes in #1660 with minor tweaks for DPO.

❓ Alternatives

This can all be done outside of axolotol with the user defined fields but it's a bit messy and risks getting things wrong with the tokenizer chat template.

📝 Additional Context

I can do the implementation

Acknowledgements

My issue title is concise, descriptive, and in title casing.
I have searched the existing issues to make sure this feature has not been requested yet.
I have provided enough information for the maintainers to understand and evaluate this request.

The text was updated successfully, but these errors were encountered:

SicariusSicariiStuff · 2024-07-01T12:10:42Z

Can we also have ORPO while we're at it? :)

ehartford · 2024-07-01T13:04:39Z

this will require an upstream change to trl
or, it will require automatically splitting the dataset from this:

[{prompt1, response1},{prompt2, response2}, {prompt3, response3}]

into this

{prompt1, accepted1, rejected1}
{prompt1+accepted1+prompt2, accepted2, rejected2}
{prompt1+accepted1+prompt2+accepted2+prompt3, accepted3, rejected3}
{prompt1+accepted1+prompt2+accepted2+prompt3+accepted3+prompt4, accepted4, rejected4}

fozziethebeat · 2024-07-01T15:16:05Z

Oof yeah I think ORPO is a separate feature request. From what I can tell data format support is quite unique for each RL method.

fozziethebeat · 2024-07-01T15:16:51Z

Also, I have a have a fork of this working I just haven't gotten around to unit testing it. I have manually inspected the tokenization and am satisfied.

fozziethebeat added the enhancement New feature or request label Jun 14, 2024

fozziethebeat linked a pull request Jul 2, 2024 that will close this issue

Add a chat_template prompt strategy for DPO #1725

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a `chat_template` strategy for DPO datasets #1708

Add a `chat_template` strategy for DPO datasets #1708

fozziethebeat commented Jun 14, 2024

SicariusSicariiStuff commented Jul 1, 2024

ehartford commented Jul 1, 2024

fozziethebeat commented Jul 1, 2024

fozziethebeat commented Jul 1, 2024

Add a chat_template strategy for DPO datasets #1708

Add a chat_template strategy for DPO datasets #1708

Comments

fozziethebeat commented Jun 14, 2024

⚠️ Please check that this feature request hasn't been suggested before.

🔖 Feature description

✔️ Solution

❓ Alternatives

📝 Additional Context

Acknowledgements

SicariusSicariiStuff commented Jul 1, 2024

ehartford commented Jul 1, 2024

fozziethebeat commented Jul 1, 2024

fozziethebeat commented Jul 1, 2024

Add a `chat_template` strategy for DPO datasets #1708

Add a `chat_template` strategy for DPO datasets #1708