DPO Prompt Strategies only support single-turn and will fail silently on multi-turn datasets #1645
Open
6 of 8 tasks
Labels
bug
Something isn't working
Please check that this issue hasn't been reported before.
Expected Behavior
I would expect the DPO prompting strategies to support multi-turn conversations for datasets such as https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized. If it is not supported, it should at least warn the user.
Current behaviour
Currently the llama3 prompt strategy explicitly takes only a single-turn conversation. And due to the indexing, it wouldn't error out or fail if the conversation were longer:
Steps to reproduce
See that the chosen and rejected are now equal because the first turn of a longer conversation is used.
Config yaml
No response
Possible solution
A possible solution would be to explicitly handle multiturn conversations:
Which Operating Systems are you using?
Python Version
3.10
axolotl branch-commit
main/22ae21a
Acknowledgements
The text was updated successfully, but these errors were encountered: