Synthetic Preference Data Generation Using Nemotron-4 340B

The provided notebook demonstrates how to leverage Llama 3.1 405B Instruct and Nemotron-4 340B Reward through build.nvidia.com.
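As a quick orientation, the sketch below shows one way to call these models through build.nvidia.com's OpenAI-compatible API. The endpoint URL and model ID are assumptions for illustration; check build.nvidia.com for the exact values for your account and export your API key as `NVIDIA_API_KEY`.

```python
# Minimal sketch of querying build.nvidia.com via its OpenAI-compatible API.
# The base_url and model ID below are assumptions; verify them on build.nvidia.com.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed endpoint
    api_key=os.environ["NVIDIA_API_KEY"],
)

# Generate a candidate response with Llama 3.1 405B Instruct.
completion = client.chat.completions.create(
    model="meta/llama-3.1-405b-instruct",  # assumed model ID
    messages=[{"role": "user", "content": "Explain synthetic preference data."}],
    temperature=0.7,
    max_tokens=512,
)
print(completion.choices[0].message.content)
```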

The notebook walks through the following pipeline:

[Pipeline diagram]

The pipeline is designed to create a preference dataset suitable for training a custom reward model using the SteerLM method. Because consecutive responses (e.g. sample 1 with 2, 3 with 4, etc.) share the same prompt, the dataset can also be used to build preference pairs for training an RLHF reward model or for DPO, using the helpfulness score. A small sketch of this pairing step follows below.
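The following sketch (not taken from the notebook) illustrates how consecutive scored samples could be turned into chosen/rejected pairs. It assumes each record is a dict with `prompt`, `response`, and a `helpfulness` score, and that samples 1 and 2, 3 and 4, etc. share the same prompt, as described above.

```python
# Pair consecutive samples that share a prompt into chosen/rejected records
# for DPO or RLHF reward-model training, ranked by the helpfulness score.
# Field names ("prompt", "response", "helpfulness") are assumptions for this sketch.
def build_preference_pairs(samples):
    pairs = []
    for a, b in zip(samples[::2], samples[1::2]):
        assert a["prompt"] == b["prompt"], "consecutive samples must share a prompt"
        chosen, rejected = (a, b) if a["helpfulness"] >= b["helpfulness"] else (b, a)
        pairs.append({
            "prompt": a["prompt"],
            "chosen": chosen["response"],
            "rejected": rejected["response"],
        })
    return pairs
```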