
Fix Deepspeed loading #950

Merged: winglian merged 5 commits into main from correct-zero3-impl on Dec 13, 2023
Conversation

winglian (Collaborator)

DeepSpeed wasn't being initialized properly, so the model weights weren't getting sharded when they were loaded.

This PR:

  • prints the ASCII art banner only after loading the configuration and setting the environment variables, so that DeepSpeed gets picked up properly by Accelerate
  • adds an option to freeze/unfreeze specific parameters, which can lower memory requirements when fine-tuning Mixtral (see the sketch after the example below)

Example that freezes everything except the LM head, the embedding layer, and layers 20-31:

unfrozen_parameters:
  - lm_head.*
  - model.embed_tokens.*
  - model.layers.2[0-9].*
  - model.layers.29.*
  - model.layers.30.*
  - model.layers.31.*
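
A minimal sketch of how such patterns could be applied (an assumption for illustration, not axolotl's actual implementation; the model name is just an example): freeze everything, then leave trainable only the parameters whose names match one of the regexes.

```python
import re

from transformers import AutoModelForCausalLM

# Patterns from the example config above
unfrozen_parameters = [
    r"lm_head.*",
    r"model.embed_tokens.*",
    r"model.layers.2[0-9].*",
    r"model.layers.29.*",
    r"model.layers.30.*",
    r"model.layers.31.*",
]

# Any causal LM works for illustration; the PR's use case is Mixtral.
model = AutoModelForCausalLM.from_pretrained("mistralai/Mixtral-8x7B-v0.1")

for name, param in model.named_parameters():
    # A parameter stays trainable only if its name matches one of the patterns.
    param.requires_grad = any(re.match(pattern, name) for pattern in unfrozen_parameters)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable:,} / {total:,}")
```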

@@ -285,6 +286,9 @@ def load_model(
model_kwargs["max_memory"] = cfg.max_memory
model_kwargs["torch_dtype"] = cfg.torch_dtype

if is_deepspeed_zero3_enabled():
    del model_kwargs["device_map"]
Collaborator

Why do you have to delete this key?

winglian (Collaborator, Author)

deepspeed maps the weights on its own and doesn't want device_map set

https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L2858-L2861
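
For context, a rough sketch of the pattern (an assumption for illustration, not the exact axolotl code; the import path for is_deepspeed_zero3_enabled has moved between transformers releases):

```python
from transformers import AutoModelForCausalLM
from transformers.integrations import is_deepspeed_zero3_enabled  # transformers.deepspeed in older releases

model_kwargs = {"torch_dtype": "auto", "device_map": "auto"}

if is_deepspeed_zero3_enabled():
    # Under ZeRO-3, DeepSpeed partitions and places the weights itself,
    # so a device_map would conflict with its sharding.
    model_kwargs.pop("device_map", None)

model = AutoModelForCausalLM.from_pretrained("mistralai/Mixtral-8x7B-v0.1", **model_kwargs)
```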

Comment on lines +18 to +24
unfrozen_parameters:
# - lm_head.*
# - model.embed_tokens.*
# - model.layers.2[0-9]+.block_sparse_moe.gate.*
# - model.layers.2[0-9]+.block_sparse_moe.experts.*
# - model.layers.3[0-9]+.block_sparse_moe.gate.*
# - model.layers.3[0-9]+.block_sparse_moe.experts.*
Collaborator

@winglian this example is good, but perhaps you can add this new unfrozen_parameters option to the top-level README?

hamelsmu (Collaborator), Dec 13, 2023

I would also add a comment in the docs about how the regex works, so people know how to write these patterns?
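
For instance, a quick illustration of what a pattern like the ones above would cover (assuming the patterns are checked with Python's re.match against parameter names, as in the earlier sketch):

```python
import re

pattern = r"model.layers.2[0-9]+.block_sparse_moe.gate.*"

names = [
    "model.layers.21.block_sparse_moe.gate.weight",  # layer 21 gate -> matches
    "model.layers.2.block_sparse_moe.gate.weight",   # layer 2 -> no match (pattern needs "2" plus another digit)
    "model.layers.21.self_attn.q_proj.weight",       # attention weight -> no match
]

for name in names:
    print(name, "->", "unfrozen" if re.match(pattern, name) else "frozen")
```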

hamelsmu (Collaborator) commented Dec 13, 2023

@winglian Really excited that you got this in so fast! Just have a suggestion about docs. I'm happy to add it too for you in a follow on PR

winglian (Collaborator, Author)

> @winglian Really excited that you got this in so fast! Just have a suggestion about docs. I'm happy to add it too for you in a follow on PR

thanks for the review. I'll get this added to the docs later tonight or tomorrow.

winglian merged commit 5ea3aa3 into main on Dec 13, 2023
4 checks passed
winglian deleted the correct-zero3-impl branch on December 13, 2023 at 21:03