Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrating custom metrics and inference parameters #82

Merged
merged 27 commits into from
Jan 26, 2024

Conversation

Kipok
Copy link
Contributor

@Kipok Kipok commented Jan 16, 2024

What does this PR do ?

Besides adding support for metrics/inference params, this PR has the following 2 small changes:

  1. Making scheduler optional in SFT
  2. Copy-pasting a fix for the scheduler's number of steps into the train_gpt_sft.py
    (let me know if you'd like me to split the PR into multiple independent pieces)

There are a few things that I didn't finish yet, but want to push this out in case there is additional refactoring that needs to be done based on the feedback. Here is what I'm going to do next

  1. Add one of the metrics from NeMo (probably exact string match) to give an example of usage
  2. Add tests
  3. Fix an issue with SFT config not being updated by the time model is created (there is a TODO in the code)
  4. Update the changelog / docs

Changelog

  • Please update the CHANGELOG.md under next version with high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre checks:

Checklist when contributing a new algorithm

  • Does the trainer resume and restore model state all states?
  • Does the trainer support all parallelism techniques(PP, TP, DP)?
  • Does the trainer support max_steps=-1 and validation?
  • Does the trainer only call APIs defined in alignable_interface.py?
  • Does the trainer have proper logging?

Additional Information

  • Related to # (issue)

@Kipok
Copy link
Contributor Author

Kipok commented Jan 18, 2024

The new "include_text" parameter will only work after NVIDIA/NeMo#8198 is merged on the nemo side (and included in the aligner Dockerfile)

Copy link
Collaborator

@odelalleau odelalleau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a first quick review -- I'm still missing the bigger picture and how everything fits together, but I'll come back to it once it's closer to completion.

examples/nlp/gpt/train_gpt_sft.py Outdated Show resolved Hide resolved
examples/nlp/gpt/train_gpt_sft.py Outdated Show resolved Hide resolved
nemo_aligner/data/nlp/builders.py Outdated Show resolved Hide resolved
nemo_aligner/data/nlp/builders.py Outdated Show resolved Hide resolved
nemo_aligner/metrics/common.py Outdated Show resolved Hide resolved
nemo_aligner/models/nlp/gpt/gpt_sft_model.py Outdated Show resolved Hide resolved
nemo_aligner/models/nlp/gpt/gpt_sft_model.py Outdated Show resolved Hide resolved
nemo_aligner/models/nlp/gpt/gpt_sft_model.py Outdated Show resolved Hide resolved
nemo_aligner/models/nlp/gpt/megatron_gpt_reward_model.py Outdated Show resolved Hide resolved
nemo_aligner/utils/train_utils.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@odelalleau odelalleau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor cleanup in latest commit

nemo_aligner/models/nlp/gpt/gpt_sft_model.py Outdated Show resolved Hide resolved
@Kipok Kipok force-pushed the igitman/metrics-integration branch from 2ad0c96 to e2bc490 Compare January 24, 2024 23:14
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jan 24, 2024
@Kipok
Copy link
Contributor Author

Kipok commented Jan 24, 2024

Added a new cherry-picked commit to Dockerfile for this PR which is required for metrics to work: NVIDIA/NeMo@7d3d9ac

@Kipok Kipok force-pushed the igitman/metrics-integration branch from d98591c to 77e457a Compare January 25, 2024 00:25
@github-actions github-actions bot removed the documentation Improvements or additions to documentation label Jan 25, 2024
Dockerfile Show resolved Hide resolved
@gshennvm
Copy link
Collaborator

your MR LGTM, I'm running the SFT nemo-ci to make sure this didn't break anything

nemo_aligner/metrics/__init__.py Outdated Show resolved Hide resolved
@Kipok Kipok force-pushed the igitman/metrics-integration branch from c70227a to 1f9107d Compare January 25, 2024 23:51
Copy link
Collaborator

@odelalleau odelalleau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, thanks for cleaning things up and adding this new feature.

I believe it's just missing an entry in CHANGELOG to mention this addition.

(side note, regarding the question "should Aligner hold metrics?" => I think so, yes, as long as we believe they may be useful to users in general -- fancy custom metrics specific to a single use case should stay in users' own codebases)

@Kipok Kipok force-pushed the igitman/metrics-integration branch from 398ad45 to f1aec3e Compare January 26, 2024 19:49
@Kipok
Copy link
Contributor Author

Kipok commented Jan 26, 2024

@odelalleau thanks for the reminder! I updated the changelog, but please let me know if any more details are needed there

gshennvm
gshennvm previously approved these changes Jan 26, 2024
@gshennvm
Copy link
Collaborator

CI passed on this branch, you can merge once @odelalleau resolves the conversation

Copy link
Collaborator

@odelalleau odelalleau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would just like to suggest a clarification to the changelog

CHANGELOG.md Outdated Show resolved Hide resolved
gshennvm and others added 26 commits January 26, 2024 13:43
Signed-off-by: Gerald Shen <geshen@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Gerald Shen <geshen@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Gerald Shen <geshen@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
for more information, see https://pre-commit.ci

Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Co-authored-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>

Co-authored-by: Gerald Shen <119401249+gshennvm@users.noreply.github.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
@Kipok Kipok force-pushed the igitman/metrics-integration branch from 37bbd18 to 27ca524 Compare January 26, 2024 21:44
@Kipok
Copy link
Contributor Author

Kipok commented Jan 26, 2024

I don't have write access, @odelalleau or @gshennvm can one of you please help to merge?

@odelalleau odelalleau merged commit d06b23f into NVIDIA:main Jan 26, 2024
3 checks passed
rohitrango pushed a commit to rohitrango/NeMo-Aligner that referenced this pull request Jun 25, 2024
Signed-off-by: Gerald Shen <geshen@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Co-authored-by: Gerald Shen <geshen@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
Co-authored-by: Gerald Shen <119401249+gshennvm@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants