Fix overwriting FP8 act ckpt flag in the train script #1107

Merged 2 commits on Apr 10, 2024

@cli99 cli99 commented Apr 10, 2024

Activation checkpointing with FP8 Transformer Engine (TE) currently requires the non-reentrant checkpoint implementation from TE. The fsdp_config for activation checkpointing with TE FP8 is:

  fsdp_config:
    activation_checkpointing: true
    activation_checkpointing_reentrant: false
    activation_cpu_offload: false
    te_checkpoint_wrapper: true

The activation_checkpointing_reentrant param set in the YAML gets overwritten in the train.py script for legacy reasons. This PR rewrites the check so the user-provided value is kept.
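
One way to avoid this kind of overwrite is to apply a default only when the key is absent from the YAML. The sketch below is illustrative and is not the actual change made in train.py: the helper name `apply_act_ckpt_defaults` and the `legacy_default` parameter are hypothetical, the legacy default value is an assumption, and the config is modeled as a plain dict.

```python
from typing import Any, Dict, Optional


def apply_act_ckpt_defaults(
    fsdp_config: Optional[Dict[str, Any]],
    legacy_default: bool = True,  # assumed value, for illustration only
) -> Optional[Dict[str, Any]]:
    """Return a copy of fsdp_config with a default filled in only if missing.

    Hypothetical helper: it never overwrites a value the user set explicitly.
    """
    if fsdp_config is None:
        return None

    resolved = dict(fsdp_config)  # shallow copy so the caller's dict is untouched

    # setdefault writes only when the key is absent, so an explicit
    # `activation_checkpointing_reentrant: false` in the YAML survives.
    resolved.setdefault('activation_checkpointing_reentrant', legacy_default)

    # Per the description above, the TE FP8 path needs the non-reentrant
    # implementation, so reject the conflicting combination early.
    if resolved.get('te_checkpoint_wrapper', False) and resolved[
        'activation_checkpointing_reentrant'
    ]:
        raise ValueError(
            'te_checkpoint_wrapper requires '
            'activation_checkpointing_reentrant: false',
        )
    return resolved


# The config from this PR description, as a plain dict.
fsdp_config = {
    'activation_checkpointing': True,
    'activation_checkpointing_reentrant': False,
    'activation_cpu_offload': False,
    'te_checkpoint_wrapper': True,
}
resolved = apply_act_ckpt_defaults(fsdp_config)
print(resolved['activation_checkpointing_reentrant'])  # False
```

With this shape, a config that omits the key still gets the fallback value, while a config that sets it explicitly is passed through unchanged.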

@cli99 cli99 requested a review from mvpatel2000 April 10, 2024 16:02
@cli99 cli99 enabled auto-merge (squash) April 10, 2024 16:15

@mvpatel2000 mvpatel2000 left a comment


Thanks! Sorry for the bad merge

@cli99 cli99 merged commit b5fc0fa into main Apr 10, 2024
9 checks passed
KuuCi pushed a commit that referenced this pull request Apr 18, 2024