Model parallel #538

winglian · 2023-09-08T01:20:22Z

Supports falcon-180b now, using naive model parallelism. Also correctly sets the device when launching on multi-GPU without accelerate.
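As a rough illustration of what "naive model parallel" means here, the sketch below assigns consecutive layers to GPUs in contiguous chunks, so a single process runs the model one device at a time. The function name `naive_device_map` and its parameters are hypothetical and are not taken from this PR; in practice the placement is presumably driven by the `device_map` setting referenced in the commits rather than by hand-written code like this.

```python
import math


def naive_device_map(num_layers: int, num_devices: int) -> dict[int, int]:
    """Map each layer index to a device index in contiguous chunks.

    Hypothetical helper for illustration only: layers 0..k-1 go to
    device 0, the next k layers to device 1, and so on.
    """
    if num_layers <= 0 or num_devices <= 0:
        raise ValueError("num_layers and num_devices must be positive")
    # Ceiling division so that every layer gets a device.
    per_device = math.ceil(num_layers / num_devices)
    return {layer: layer // per_device for layer in range(num_layers)}


if __name__ == "__main__":
    # Four layers over two GPUs: the first half on GPU 0, the rest on GPU 1.
    print(naive_device_map(4, 2))
```

Because the layers execute sequentially, only one GPU is busy at any moment; the benefit is fitting a model that is too large for a single device, not higher throughput.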

winglian · 2023-09-08T03:41:18Z

src/axolotl/utils/bench.py

@@ -28,7 +28,7 @@ def gpu_memory_usage_smi(device=0):


 def log_gpu_memory_usage(log, msg, device):
-    if not torch.cuda.is_available():
+    if not torch.cuda.is_available() or device == "auto":


Not able to determine which device to log GPU stats for when device is "auto".
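The guard in the hunk above can be sketched in isolation as follows. This is a minimal stand-in, not the actual body of `log_gpu_memory_usage`: the `cuda_available` flag replaces the `torch.cuda.is_available()` call so the example runs without PyTorch, and `read_usage_gb` is a hypothetical callable that returns memory in use for a concrete device. Returning `None` on the skipped path is also an assumption, since the rest of the function is not shown in the diff.

```python
import logging


def should_log_gpu_memory(device, cuda_available: bool) -> bool:
    """Mirror the guard from the hunk: skip when CUDA is unavailable or
    when the device is the "auto" placeholder instead of a concrete index."""
    return cuda_available and device != "auto"


def log_gpu_memory_usage(log, msg, device, cuda_available, read_usage_gb):
    # read_usage_gb is a hypothetical callable: device -> GB in use.
    if not should_log_gpu_memory(device, cuda_available):
        return None  # assumption: nothing is logged on this path
    usage = read_usage_gb(device)
    log.info("GPU memory usage %s: %.3fGB", msg, usage)
    return usage


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("bench")
    log_gpu_memory_usage(log, "after model load", 0, True, lambda d: 1.5)
    log_gpu_memory_usage(log, "after model load", "auto", True, lambda d: 1.5)
```

With `device == "auto"` the model may be spread over several GPUs, so there is no single device index to query, which is why the call is skipped rather than guessed.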

* model-parallel for single process
* fix device/device_map
* fix handling for device

winglian requested a review from tmm1 September 8, 2023 01:20

winglian commented Sep 8, 2023


winglian added 3 commits September 8, 2023 16:11

model-parallel for single process

8df288d

fix device/device_map

04625e0

fix handling for device

8ff0109

winglian force-pushed the model-parallel branch from b67fe7c to 8ff0109 Compare September 8, 2023 20:11

winglian merged commit f6060a6 into main Sep 13, 2023
6 checks passed

winglian deleted the model-parallel branch September 13, 2023 15:45

brthor mentioned this pull request Nov 22, 2023

[Feature Request] Multi-Node Model Parallel #887

Open

5 tasks

mkeoliya pushed a commit to mkeoliya/axolotl that referenced this pull request Dec 15, 2023

Model parallel (axolotl-ai-cloud#538)

5f8d74f

* model-parallel for single process
* fix device/device_map
* fix handling for device
