Change Num Partitions size expansion fix (NVIDIA#4719)
* add cloning

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* map to cpu

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Signed-off-by: Hainan Xu <hainanx@nvidia.com>
aklife97 authored and Hainan Xu committed Nov 29, 2022
1 parent a8f7f67 commit 6519b02
Showing 1 changed file with 2 additions and 2 deletions.
@@ -95,7 +95,7 @@ def split_partition(model, partitions, tp_size, write_path=None):

idx = 0
for name, param in model.named_parameters():
- split_val = splits[idx][i]
+ split_val = splits[idx][i].clone()

if param.shape != split_val.shape:
logging.info(
@@ -178,7 +178,7 @@ def main():
merge_partition(model, partitions, args.target_file)
else:
app_state.model_parallel_size = 1
- model = cls.restore_from(restore_path=args.model_file, trainer=trainer)
+ model = cls.restore_from(restore_path=args.model_file, trainer=trainer, map_location=torch.device("cpu"))

if tgt_tp_size > 1:
partitions = []
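
Note on the two changes: the commit message does not say why `.clone()` was needed, so the following is only a plausible reading, sketched under assumptions. If the entries of `splits` are views produced by something like `torch.split` (an assumption, since the code that builds `splits` is not shown in this diff), each view still shares storage with the full tensor, and `torch.save` writes that whole storage. Cloning gives each partition its own storage. The tensor shapes and the helper `serialized_size` below are hypothetical and exist only for illustration.

```python
import io

import torch


def serialized_size(tensor):
    """Return how many bytes torch.save writes for `tensor` (illustrative helper)."""
    buffer = io.BytesIO()
    torch.save(tensor, buffer)
    return buffer.getbuffer().nbytes


# Stand-in for a full, unsplit parameter (shape chosen arbitrarily).
full = torch.zeros(1024, 1024)

# torch.split returns views: no data is copied, storage is shared with `full`.
view = torch.split(full, 512, dim=0)[0]

# clone() allocates new storage holding only this partition's elements.
copy = view.clone()

view_shares_storage = view.data_ptr() == full.data_ptr()
copy_shares_storage = copy.data_ptr() == full.data_ptr()

# Saving the view serializes the full underlying storage; saving the clone does not.
view_bytes = serialized_size(view)
copy_bytes = serialized_size(copy)

# map_location pins the loaded tensor to CPU regardless of where it was saved.
buffer = io.BytesIO()
torch.save(copy, buffer)
buffer.seek(0)
reloaded = torch.load(buffer, map_location=torch.device("cpu"))

print(view_shares_storage, copy_shares_storage)
print(view_bytes > copy_bytes, reloaded.device.type)
```

The second hunk passes `map_location=torch.device("cpu")` to `cls.restore_from`; the `torch.load` call above shows the analogous plain-PyTorch behavior and is not a claim about how `restore_from` is implemented.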