Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a chat_template strategy for DPO datasets #1708

Open
5 tasks done
fozziethebeat opened this issue Jun 14, 2024 · 4 comments · May be fixed by #1725
Open
5 tasks done

Add a chat_template strategy for DPO datasets #1708

fozziethebeat opened this issue Jun 14, 2024 · 4 comments · May be fixed by #1725
Labels
enhancement New feature or request

Comments

@fozziethebeat
Copy link
Contributor

⚠️ Please check that this feature request hasn't been suggested before.

  • I searched previous Ideas in Discussions didn't find any similar feature requests.
  • I searched previous Issues didn't find any similar feature requests.

🔖 Feature description

This is basically #1660 but for DPO datasets:

  • Let users specify a convesations field
  • Let users specify role and content fields for each message
  • Let the tokenizer chat template turn the conversation messages into the input prompt
  • Let the tokenizer chat template turn the rejected and chosen field content into the right completion prompt
    Then boom, easy training on long conversation sequences.

✔️ Solution

Replicate the changes in #1660 with minor tweaks for DPO.

❓ Alternatives

This can all be done outside of axolotol with the user defined fields but it's a bit messy and risks getting things wrong with the tokenizer chat template.

📝 Additional Context

I can do the implementation

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this feature has not been requested yet.
  • I have provided enough information for the maintainers to understand and evaluate this request.
@fozziethebeat fozziethebeat added the enhancement New feature or request label Jun 14, 2024
@SicariusSicariiStuff
Copy link

Can we also have ORPO while we're at it? :)

@ehartford
Copy link
Collaborator

this will require an upstream change to trl
or, it will require automatically splitting the dataset from this:

[{prompt1, response1},{prompt2, response2}, {prompt3, response3}]

into this

{prompt1, accepted1, rejected1}
{prompt1+accepted1+prompt2, accepted2, rejected2}
{prompt1+accepted1+prompt2+accepted2+prompt3, accepted3, rejected3}
{prompt1+accepted1+prompt2+accepted2+prompt3+accepted3+prompt4, accepted4, rejected4}

@fozziethebeat
Copy link
Contributor Author

Oof yeah I think ORPO is a separate feature request. From what I can tell data format support is quite unique for each RL method.

@fozziethebeat
Copy link
Contributor Author

Also, I have a have a fork of this working I just haven't gotten around to unit testing it. I have manually inspected the tokenization and am satisfied.

@fozziethebeat fozziethebeat linked a pull request Jul 2, 2024 that will close this issue
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants