Fix Deepspeed loading #950

winglian · 2023-12-13T16:47:45Z

DeepSpeed wasn't being set up properly, so it didn't shard the model weights while they were being loaded.

This PR:

- prints the ASCII art only after loading the configuration and setting the env vars, so that accelerate loads DeepSpeed properly
- adds an option to freeze/unfreeze specific parameters, which can lower memory requirements when fine-tuning Mixtral

example that freezes everything except the LM head, the token embeddings, and layers 20-31:

unfrozen_parameters:
  - lm_head.*
  - model.embed_tokens.*
  - model.layers.2[0-9].*
  - model.layers.29.*
  - model.layers.30.*
  - model.layers.31.*
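The freeze/unfreeze behavior described above can be sketched in plain Python. This is a minimal illustration, not axolotl's implementation: split_trainable is a hypothetical helper, and it assumes each unfrozen_parameters entry is a Python regular expression applied with re.match (anchored at the start of the name) to the parameter names a model reports. The parameter names below are illustrative only.

```python
import re
from typing import Iterable, List, Tuple


def split_trainable(
    param_names: Iterable[str], unfrozen_patterns: List[str]
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: freeze everything, then unfreeze any parameter
    whose name matches at least one pattern.

    Assumes each pattern is a Python regex applied with re.match, so it must
    match starting at the first character of the parameter name.
    """
    compiled = [re.compile(p) for p in unfrozen_patterns]
    trainable: List[str] = []
    frozen: List[str] = []
    for name in param_names:
        if any(p.match(name) for p in compiled):
            trainable.append(name)  # would get requires_grad = True
        else:
            frozen.append(name)  # would get requires_grad = False
    return trainable, frozen


# Patterns copied from the example config above.
unfrozen_parameters = [
    "lm_head.*",
    "model.embed_tokens.*",
    "model.layers.2[0-9].*",
    "model.layers.29.*",
    "model.layers.30.*",
    "model.layers.31.*",
]

# Illustrative parameter names, not a full Mixtral state dict.
names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.19.self_attn.q_proj.weight",
    "model.layers.20.self_attn.q_proj.weight",
    "model.layers.31.block_sparse_moe.gate.weight",
    "model.norm.weight",
    "lm_head.weight",
]

trainable, frozen = split_trainable(names, unfrozen_parameters)
print("trainable:", trainable)
print("frozen:", frozen)
```

In this sketch model.norm.weight stays frozen, and the separate model.layers.29.* entry is redundant because model.layers.2[0-9].* already matches layer 29. On a real PyTorch model the same test would decide param.requires_grad for each (name, param) pair from model.named_parameters().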


hamelsmu · 2023-12-13T18:57:10Z

src/axolotl/utils/models.py

@@ -285,6 +286,9 @@ def load_model(
    model_kwargs["max_memory"] = cfg.max_memory
    model_kwargs["torch_dtype"] = cfg.torch_dtype

+    if is_deepspeed_zero3_enabled():
+        del model_kwargs["device_map"]


Why do you have to delete this key?

DeepSpeed places the weights on its own, and transformers doesn't allow device_map to be set when ZeRO-3 is enabled:

https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L2858-L2861
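The intent of the diff can be shown as a small standalone function. This is a hypothetical helper, not the code in models.py: the zero3_enabled flag stands in for the is_deepspeed_zero3_enabled() call from the diff, and the kwarg values are placeholders.

```python
from typing import Any, Dict


def drop_device_map_for_zero3(
    model_kwargs: Dict[str, Any], zero3_enabled: bool
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the diff: under DeepSpeed ZeRO-3, remove
    device_map before the kwargs reach from_pretrained."""
    kwargs = dict(model_kwargs)  # shallow copy; the caller's dict is left untouched
    if zero3_enabled:
        # ZeRO-3 partitions the weights across ranks itself, so no device_map
        # should be passed along. pop() with a default, unlike del, does not
        # raise KeyError when the key is missing.
        kwargs.pop("device_map", None)
    return kwargs


# Placeholder values; real code would pass e.g. a torch dtype object.
model_kwargs = {"device_map": "auto", "max_memory": None, "torch_dtype": "bfloat16"}
print(drop_device_map_for_zero3(model_kwargs, zero3_enabled=True))
print(drop_device_map_for_zero3(model_kwargs, zero3_enabled=False))
```

Using pop with a default instead of del only matters if device_map can ever be absent from model_kwargs at this point; the hunk alone doesn't show whether that can happen in load_model.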

hamelsmu · 2023-12-13T19:00:15Z

examples/mistral/mixtral.yml

+unfrozen_parameters:
+#  - lm_head.*
+#  - model.embed_tokens.*
+#  - model.layers.2[0-9]+.block_sparse_moe.gate.*
+#  - model.layers.2[0-9]+.block_sparse_moe.experts.*
+#  - model.layers.3[0-9]+.block_sparse_moe.gate.*
+#  - model.layers.3[0-9]+.block_sparse_moe.experts.*


@winglian this example is good, but could you also add the new unfrozen_parameters option to the top-level README?

I would also add a note to the docs explaining how the regex matching works, so people know how to write these patterns.
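As a possible starting point for the regex docs requested above, here is a dry run of the MoE patterns from mixtral.yml (uncommented) against synthetic parameter names. The names are assumed to follow the Hugging Face Mixtral layout (32 layers, 8 experts per layer, expert weights w1/w2/w3) and are only a subset of a real model's parameters; matching assumes Python re.match semantics, which is an assumption about how the option is applied.

```python
import re
from typing import List


def mixtral_style_names(num_layers: int = 32, num_experts: int = 8) -> List[str]:
    """Hypothetical subset of parameter names, assumed to follow HF Mixtral naming."""
    names = []
    for i in range(num_layers):
        prefix = f"model.layers.{i}"
        names.append(f"{prefix}.self_attn.q_proj.weight")
        names.append(f"{prefix}.block_sparse_moe.gate.weight")
        for j in range(num_experts):
            for w in ("w1", "w2", "w3"):
                names.append(f"{prefix}.block_sparse_moe.experts.{j}.{w}.weight")
    return names


# The commented-out patterns from examples/mistral/mixtral.yml.
moe_patterns = [
    "model.layers.2[0-9]+.block_sparse_moe.gate.*",
    "model.layers.2[0-9]+.block_sparse_moe.experts.*",
    "model.layers.3[0-9]+.block_sparse_moe.gate.*",
    "model.layers.3[0-9]+.block_sparse_moe.experts.*",
]

compiled = [re.compile(p) for p in moe_patterns]
names = mixtral_style_names()

# re.match anchors at the start of the name. "[0-9]+" needs at least one
# digit after the leading 2 or 3, so layers 2 and 3 do not match.
unfrozen = [n for n in names if any(p.match(n) for p in compiled)]
unfrozen_layers = sorted({int(n.split(".")[2]) for n in unfrozen})

print(len(names), len(unfrozen))  # 832 300
print(unfrozen_layers)
```

Under these assumptions only the router gate and expert weights of layers 20-31 are unfrozen; attention weights in those layers stay frozen, and lm_head and embed_tokens stay frozen because their lines are commented out in the example.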

hamelsmu · 2023-12-13T19:02:04Z

@winglian Really excited that you got this in so fast! I just have a suggestion about the docs, and I'm happy to add it for you in a follow-on PR.

winglian · 2023-12-13T21:03:21Z


thanks for the review. I'll get this added to the docs later tonight or tomorrow.

winglian added 3 commits December 13, 2023 09:04

d414751 add check for zero3
7d2ec9b freeze parameters
ad9538a fixes for deepspeed loading

winglian requested review from NanoCode012, hamelsmu and casper-hansen December 13, 2023 16:48

hamelsmu reviewed Dec 13, 2023

deepspeed/zero3_bf16.json (resolved)

winglian added 2 commits December 13, 2023 13:23

355c1e3 fix model parameter check
ec12e91 unfrozen parameters in example mixtral and logging when unfreezing

hamelsmu reviewed Dec 13, 2023

winglian merged commit 5ea3aa3 into main Dec 13, 2023
4 checks passed

winglian deleted the correct-zero3-impl branch December 13, 2023 21:03