Commit

chore: finalize dpo
lxuechen committed Dec 2, 2023
1 parent 43333c7 commit f2484c3
Showing 1 changed file with 11 additions and 0 deletions.
11 changes: 11 additions & 0 deletions README.md
@@ -294,6 +294,17 @@ bash examples/scripts/rlhf_quark.sh \
<kl_coef>
```

### [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290)

To replicate our DPO results for the AlpacaFarm evaluation suite, run

```bash
bash examples/scripts/rlhf_quark.sh \
<your_output_dir_for_dpo> \
<your_wandb_run_name> \
<your_output_dir_for_sft10k>
```
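
For orientation, the objective described in the linked DPO paper can be sketched for a single preference pair as below. This is an illustrative sketch of the published loss, not the code the script above runs: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for this example.

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen_margin - rejected_margin))).

    Inputs are summed sequence log-probabilities under the trained policy
    and the frozen reference model. `beta` is a hypothetical default.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    # Numerically stable -log(sigmoid(x)), i.e. softplus(-x).
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy and the reference assign identical log-probabilities, the loss equals `log(2)`; it falls as the policy raises the chosen response relative to the rejected one.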

### OpenAI models

To run the OpenAI reference models with our prompts and decoding hyperparameters, run