
Rejection sampling clean #218

Open

abukharin3 wants to merge 34 commits into main from rejection_sampling_clean
Conversation

@abukharin3 commented:

What does this PR do?

Adds the rejection sampling algorithm.
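For intuition, rejection sampling here amounts to best-of-n selection: generate several responses per prompt, score them with the reward model, and train only on the top-scoring ones. Below is a minimal sketch of that selection step (illustrative only; `select_topk_rollouts` is a hypothetical helper rather than this PR's actual code, and its two parameters mirror `model.rs.num_rollout_per_prompt` and `model.rs.num_select` from the usage command further down):

```python
import torch

def select_topk_rollouts(rewards: torch.Tensor, num_rollout_per_prompt: int, num_select: int) -> torch.Tensor:
    """Return flat indices of the `num_select` highest-reward rollouts per prompt.

    `rewards` holds one reward-model score per rollout, laid out prompt-major:
    [p0_sample0, ..., p0_sample{n-1}, p1_sample0, ...].
    """
    grouped = rewards.view(-1, num_rollout_per_prompt)    # (num_prompts, n)
    top_idx = grouped.topk(num_select, dim=-1).indices    # best samples per prompt
    offsets = torch.arange(grouped.size(0)).unsqueeze(-1) * num_rollout_per_prompt
    return (top_idx + offsets).flatten()                  # indices into the flat rollout batch

# With num_rollout_per_prompt=4 and num_select=1 (as in the usage command below),
# two prompts produce eight rollouts and only the best one per prompt is kept:
rewards = torch.tensor([0.1, 0.9, 0.3, 0.2, 0.5, 0.4, 0.8, 0.7])
print(select_topk_rollouts(rewards, num_rollout_per_prompt=4, num_select=1))  # tensor([1, 6])
```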

Changelog

  • Please update CHANGELOG.md under the next version with the high-level changes in this PR.

Usage

read -r -d '' cmd_ppo <<EOF
wandb login ${WANDB_API_KEY} \
&& cd ${NEMO_RLHF_DIR} \
&& export PYTHONPATH="${NEMO_RLHF_DIR}:${PYTHONPATH}" \
&& export HYDRA_FULL_ERROR=1 \
&& export CUDA_LAUNCH_BLOCKING=1 \
&& export PYTRITON_HOME=/pytriton_cache \
&& export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 \
&& python -u examples/nlp/gpt/train_gpt_rs_actor.py \
--config-path=${CONF_DIR} \
--config-name=${CONFIG_NAME} \
"model.data.data_prefix={train: [${TRAIN_DATA_PATH}], validation: [${VALID_DATA_PATH}], test: [${VALID_DATA_PATH}]}" \
pretrained_checkpoint.restore_from_path="${ACTOR_NEMO_FILE}" \
exp_manager.checkpoint_callback_params.save_top_k=1 \
exp_manager.explicit_log_dir="${ACTOR_LOG_DIR}" \
exp_manager.create_wandb_logger=True \
exp_manager.wandb_logger_kwargs.name="${ACTOR_NAME}" \
exp_manager.wandb_logger_kwargs.project=${WANDB_PROJECT} \
++exp_manager.max_time_per_run="00:03:30:00" \
trainer.rs.max_epochs=1 \
trainer.rs.max_steps=313 \
trainer.rs.val_check_interval=4 \
trainer.num_nodes=8 \
trainer.devices=8 \
++model.tensor_model_parallel_size=4 \
model.global_batch_size=${ACTOR_GBS} \
model.micro_batch_size=1 \
model.optim.lr="\${multiply:${ACTOR_LR},1.001}" \
model.optim.sched.warmup_steps=0 \
model.optim.sched.constant_steps=312 \
model.optim.sched.min_lr=${ACTOR_LR} \
model.optim.weight_decay=0.01 \
model.rs.num_rollout_samples=${NUM_ROLLOUTS} \
model.rs.rollout_micro_batch_size=8 \
model.rs.forward_micro_batch_size=8 \
model.rs.val_rollout_micro_batch_size=8 \
model.data.data_impl=jsonl \
remote_critic_rm.reward_model.ip=${host_critic} \
remote_critic_rm.reward_model.port=${CRITIC_PORT} \
model.rs.num_rollout_per_prompt=4 \
model.rs.num_select=1
EOF
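Note that the heredoc only assembles the command into `cmd_ppo`; it is typically launched afterwards with something like `srun ... bash -c "${cmd_ppo}"` (the exact launcher depends on your cluster setup).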

Before your PR is "Ready for review"

Pre checks:

  • [Y] Make sure you read and followed the Contributor guidelines
  • [N] Did you write any new necessary tests?
  • [N] Did you add or update any necessary documentation? Make sure to also update the NeMo Framework User Guide, which contains the tutorials.

Checklist when contributing a new algorithm

  • [Y] Does the trainer resume and restore all model states?
  • [Y] Does the trainer support all parallelism techniques (PP, TP, DP)?
  • [Y] Does the trainer support max_steps=-1 and validation?
  • [Y] Does the trainer only call APIs defined in alignable_interface.py?
  • [Y] Does the trainer have proper logging?

Additional Information

  • Related to # (issue)

@github-actions bot added labels: documentation (Improvements or additions to documentation), Utils, Algorithms (Jun 24, 2024)
Resolved review thread: CHANGELOG.md (outdated)
@abukharin3 (Author) left a comment:

Olivier's Review

Resolved review threads: CHANGELOG.md (outdated); docs/user-guide/index.rst (outdated); docs/user-guide/rs.rst (3, outdated); nemo_aligner/models/nlp/gpt/reward_critic_clients.py (outdated); nemo_aligner/utils/ppo_utils.py (3, 2 outdated); nemo_aligner/utils/train_script_utils.py (outdated)
RESULTS_DIR="critic_results_dir"
export PYTHONPATH="${NEMO_RLHF_DIR}:${PYTHONPATH}" \
Collaborator: Should this say

Suggested change:
- export PYTHONPATH="${NEMO_RLHF_DIR}:${PYTHONPATH}" \
+ export PYTHONPATH="${GPFS}:${PYTHONPATH}" \

?

Author: Yes, it should.
Collaborator: Once all comments are addressed, please make sure to run the tutorial scripts to check that they work.

Resolved review threads: docs/user-guide/rs.rst (3, outdated)
Comment on lines 71 to 72
--config-path=${CONF_DIR} \
--config-name=${CONFIG_NAME} \
Collaborator: We should drop these so the example works OOTB.

Author: Done.

Resolved review threads: docs/user-guide/rs.rst (3, 2 outdated); examples/nlp/gpt/train_gpt_rs_actor.py (outdated); nemo_aligner/utils/train_script_utils.py (outdated)
'''

if num_rollout_samples % rollout_micro_batch_size != 0:
Author: Changed this function to use divide.

Author: I believe this function is still needed, as there are two cases to consider when calculating the mbs.

Collaborator: Hmm, sorry, maybe I misunderstand: why not just stack everything like you're doing in RSTrainer and then try our best to cut the batch down to the rollout mbs? If it's not divisible cleanly we can do a min.

Author: I removed compute_mbs and stack everything in RSTrainer as mentioned.
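A sketch of that stack-then-cut approach (a standalone illustration with hypothetical names, not the trainer's actual code): stack all rollouts into one batch, then slice off micro-batches of at most `rollout_micro_batch_size`, taking a min for the ragged tail instead of requiring clean divisibility.

```python
import torch

def cut_into_micro_batches(stacked: torch.Tensor, rollout_micro_batch_size: int):
    """Slice a stacked rollout batch into micro-batches of at most
    `rollout_micro_batch_size` rows; the tail micro-batch is simply smaller
    (a min) when the total is not cleanly divisible."""
    num_samples = stacked.size(0)
    for start in range(0, num_samples, rollout_micro_batch_size):
        end = min(start + rollout_micro_batch_size, num_samples)
        yield stacked[start:end]

# 34 stacked rollouts with mbs=8 -> micro-batches of sizes 8, 8, 8, 8, 2
stacked = torch.randn(34, 16)
print([mb.size(0) for mb in cut_into_micro_batches(stacked, 8)])
```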

Resolved review thread: nemo_aligner/utils/train_script_utils.py (outdated)
@abukharin3 force-pushed the rejection_sampling_clean branch 2 times, most recently from ef0a70a to 04b421d (September 18, 2024 00:54)
Alexander Bukharin added 3 commits September 17, 2024 20:55
@odelalleau (Collaborator) left a comment:

Did a pass through previous comments + some new ones

Resolved review threads: docs/user-guide/index.rst (outdated); docs/README.md (outdated); docs/user-guide/rs.rst (2, outdated); examples/nlp/gpt/conf/gpt_rs_actor.yaml (outdated)
def load_state_dict(self, state_dict):
    self.step = state_dict["step"]
    self.consumed_samples = state_dict["consumed_samples"]
    self.rs_optimization_step = state_dict["ppo_optimization_step"]  # legacy key, due to the way the checkpoint is saved
Collaborator: Any update on this?
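One option that would keep already-saved checkpoints loadable is a fallback lookup; a sketch under that assumption (mirroring the method above, not the PR's actual code):

```python
def load_state_dict(self, state_dict):
    self.step = state_dict["step"]
    self.consumed_samples = state_dict["consumed_samples"]
    if "rs_optimization_step" in state_dict:
        self.rs_optimization_step = state_dict["rs_optimization_step"]
    else:
        # Legacy checkpoints saved the counter under the PPO name.
        self.rs_optimization_step = state_dict["ppo_optimization_step"]
```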

# Need to pad response tokens before concatenating: response tokens contain prompts concatenated with responses.
current_batch["response_tokens"], rollout_batch["response_tokens"] = pad_batch(
    current_batch["response_tokens"], rollout_batch["response_tokens"], self.model.tokenizer.eos_id
)

current_batch["response_tokens"] = torch.concatenate(
    [current_batch["response_tokens"], rollout_batch["response_tokens"]], dim=0
)
Collaborator: Any thoughts on this? (Not a huge deal, but it seems a bit cleaner unless I'm missing something.)
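For context, `pad_batch` here has to right-pad the shorter batch's token sequences (with the tokenizer's `eos_id`) so both tensors share a sequence length before concatenation along the batch dimension. A minimal sketch with an assumed signature:

```python
import torch
import torch.nn.functional as F

def pad_batch(a: torch.Tensor, b: torch.Tensor, pad_id: int):
    """Right-pad two (batch, seq_len) token tensors with `pad_id` so they share
    the same sequence length and can be concatenated along the batch dim."""
    max_len = max(a.size(1), b.size(1))
    a = F.pad(a, (0, max_len - a.size(1)), value=pad_id)
    b = F.pad(b, (0, max_len - b.size(1)), value=pad_id)
    return a, b

a = torch.ones(2, 5, dtype=torch.long)
b = torch.ones(3, 7, dtype=torch.long)
a, b = pad_batch(a, b, pad_id=0)
print(torch.cat([a, b], dim=0).shape)  # torch.Size([5, 7])
```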

Resolved review threads: nemo_aligner/algorithms/rs.py (outdated); nemo_aligner/utils/train_script_utils.py (outdated)
RESULTS_DIR="critic_results_dir"
export PYTHONPATH="${NEMO_RLHF_DIR}:${PYTHONPATH}" \
Collaborator: Once all comments are addressed, please make sure to run the tutorial scripts to check that they work.
Resolved review thread: README.md
abukharin3 and others added 6 commits September 19, 2024 15:36
Co-authored-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
Resolved review thread: docs/user-guide/rs.rst (outdated)
logger=logger,
ckpt_callback=ckpt_callback,
run_timer=timer,
num_rollout_per_prompt=cfg.model.rs.num_rollout_per_prompt,
Collaborator: Might just be because it's still WIP on your side, but just to be sure it's not overlooked: you renamed the config options (num_rollout_per_prompt and top_n_rollouts) as per my suggestion, but haven't updated the code + doc accordingly yet.

Author (@abukharin3, Sep 21, 2024): Yes, it was WIP; the code and doc are updated now.

Labels: Algorithms, documentation, Utils
5 participants