paired kto support #1069

winglian · 2024-01-09T15:08:25Z

No description provided.

src/axolotl/core/trainer_builder.py

hamelsmu

Want to add something on the README about this?

kashif · 2024-01-09T16:08:35Z

As mentioned, this unlocks the "paired" KTO loss on a DPO-style dataset: each prompt appears twice in a batch, once with the preferred generation and once with the unpreferred one. As a result, the KL term in KTO is only a biased approximation, so please keep that in mind.
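To make the paired setup concrete, here is a minimal pure-Python sketch of how a paired KTO loss could be computed from per-sequence log-probabilities. It is an illustration under assumptions, not the code in TRL or in this PR: the function name `paired_kto_loss`, its arguments, and the default `beta` are hypothetical, and the KL terms are simply estimated as the clamped batch mean of the policy/reference log-ratios of the paired completions.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def paired_kto_loss(
    policy_chosen: List[float],
    ref_chosen: List[float],
    policy_rejected: List[float],
    ref_rejected: List[float],
    beta: float = 0.1,  # hypothetical default, chosen for illustration only
) -> List[float]:
    """Return per-example losses: chosen examples first, then rejected ones."""
    # Log-ratio of policy to reference for each completion.
    chosen_logratios = [p - r for p, r in zip(policy_chosen, ref_chosen)]
    rejected_logratios = [p - r for p, r in zip(policy_rejected, ref_rejected)]

    # KL estimates taken from the paired completions in the same batch,
    # clamped at zero (assumption: batch mean stands in for the true KL).
    chosen_kl = max(0.0, sum(chosen_logratios) / len(chosen_logratios))
    rejected_kl = max(0.0, sum(rejected_logratios) / len(rejected_logratios))

    # Preferred completions are rewarded for exceeding the KL baseline,
    # unpreferred completions for falling below it.
    chosen_losses = [1.0 - sigmoid(beta * (lr - rejected_kl)) for lr in chosen_logratios]
    rejected_losses = [1.0 - sigmoid(beta * (chosen_kl - lr)) for lr in rejected_logratios]
    return chosen_losses + rejected_losses
```

Because each baseline is computed from the other half of the same pairs rather than from independent samples, the sketch shows where an approximation of the KL term enters the paired loss.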

kashif · 2024-01-09T16:14:52Z

You may also need to pin the TRL version to a release that includes this loss, i.e. >=0.7.5.

winglian · 2024-01-09T16:17:15Z

> you will also potentially need to pin the TRL version to after this was added

It looks like 0.7.9 was released 4 hours ago, so pinning to that should be sufficient?

kashif · 2024-01-09T16:17:51Z

yes

requirements.txt

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

teknium1 · 2024-01-09T19:39:38Z

What does a dataset for KTO look like?

kashif · 2024-01-09T19:50:49Z

@teknium1 in the DPOTrainer it's the same as the DPO one. However, for the proper KTO trainer in TRL we went with this convention:

{
    'prompt': List[str],
    'completion': List[str],
    'label': List[bool],
}
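For illustration, a paired DPO-style dataset could be flattened into this convention roughly as follows. This is a sketch, not part of the PR: the helper name `dpo_to_kto` is hypothetical, and it assumes each DPO record carries `prompt`, `chosen`, and `rejected` string fields.

```python
from typing import Dict, List, Union


def dpo_to_kto(dpo_rows: List[Dict[str, str]]) -> Dict[str, List[Union[str, bool]]]:
    """Flatten paired preference rows into prompt/completion/label columns."""
    out: Dict[str, List[Union[str, bool]]] = {
        "prompt": [],
        "completion": [],
        "label": [],
    }
    for row in dpo_rows:
        # Each prompt appears twice: once with the preferred completion
        # (label True) and once with the unpreferred one (label False).
        out["prompt"] += [row["prompt"], row["prompt"]]
        out["completion"] += [row["chosen"], row["rejected"]]
        out["label"] += [True, False]
    return out


example = dpo_to_kto(
    [{"prompt": "2+2=", "chosen": "4", "rejected": "5"}]
)
```

Here `example["label"]` marks the first completion as desirable and the second as undesirable for the same prompt.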

kto support

300a6fb

winglian requested review from hamelsmu, NanoCode012 and casper-hansen January 9, 2024 15:08

kashif reviewed Jan 9, 2024

src/axolotl/core/trainer_builder.py (outdated, resolved)

use updated loss type

5ce9fc8

hamelsmu reviewed Jan 9, 2024

[skip ci] update README for rl options

d109406

winglian changed the title ~~kto support~~ paired kto support Jan 9, 2024

pin trl to latest release

fc0e8b8

kashif reviewed Jan 9, 2024

requirements.txt (outdated, resolved)

Update requirements.txt

3fc1b1b

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

kashif approved these changes Jan 9, 2024

winglian merged commit d7057cc into main Jan 9, 2024
6 checks passed

winglian deleted the dpo-kto branch January 23, 2024 12:33
