diff --git a/README.md b/README.md
index a711e6a..8b23b90 100644
--- a/README.md
+++ b/README.md
@@ -294,6 +294,17 @@ bash examples/scripts/rlhf_quark.sh \
 ```
 
+### [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290)
+
+To replicate our DPO results for the AlpacaFarm evaluation suite, run
+
+```bash
+bash examples/scripts/rlhf_quark.sh \
+ \
+ \
+
+```
+
 ### OpenAI models
 
 To run the OpenAI reference models with our prompts and decoding hyperparameters, run