Shubhamagarwal92 master (Lightning-AI#1349)
* SA: for Lightning-AI#958: set torch cuda device when finding root

* SA: for Lightning-AI#958: removing root gpu hack in trainer/evaluation_loop

* SA: setting torch cuda device

* comment line too long

* check if root gpu exists or available

* Incorporating suggestions on Lightning-AI#1094

* since root gpu returns none instead of -1 for cpu

* undo changes

* fixed dp memory thing

Co-authored-by: Shubham Agarwal <shubhamagarwal92@gmail.com>
2 people authored and tullie committed May 6, 2020
1 parent d9f60b4 commit 6d7a1b4
Showing 2 changed files with 4 additions and 0 deletions.
3 changes: 3 additions & 0 deletions pytorch_lightning/trainer/distrib_parts.py
@@ -526,6 +526,9 @@ def dp_train(self, model):
         if isinstance(device_ids, int):
             device_ids = list(range(device_ids))
 
+        # set dp device
+        torch.cuda.set_device(self.root_gpu)
+
         model = LightningDataParallel(model, device_ids=device_ids)
 
         self.run_pretrain_routine(model)
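For context on the "dp memory" fix above, here is a minimal standalone sketch (not part of this commit) of why the current CUDA device is pinned to the root GPU before the model is wrapped for data parallelism: if the default device is left at cuda:0, allocations that do not name a device explicitly can land on GPU 0 even when it is not among the DataParallel device ids. The GPU indices, model, and input below are hypothetical.

import torch
from torch.nn.parallel import DataParallel

# Hypothetical setup: replicate across GPUs 1 and 2, rooted at GPU 1
# (device_ids / root_gpu stand in for self.data_parallel_device_ids / self.root_gpu).
device_ids = [1, 2]
root_gpu = device_ids[0]

# Without this call the current CUDA device stays at cuda:0, so allocations
# that do not name a device explicitly can end up on GPU 0.
torch.cuda.set_device(root_gpu)

model = torch.nn.Linear(128, 10).to(f"cuda:{root_gpu}")
dp_model = DataParallel(model, device_ids=device_ids, output_device=root_gpu)

x = torch.randn(8, 128, device=f"cuda:{root_gpu}")
out = dp_model(x)  # replicas run on GPUs 1 and 2; outputs gather on the root GPU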
1 change: 1 addition & 0 deletions pytorch_lightning/trainer/trainer.py
@@ -389,6 +389,7 @@ def __init__(
         self.gpus = gpus
         self.data_parallel_device_ids = parse_gpu_ids(self.gpus)
         self.root_gpu = determine_root_gpu_device(self.data_parallel_device_ids)
+        self.root_device = torch.device("cpu")
 
         # tpu state flags
         self.use_tpu = False
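The commit message also notes that the root GPU is None (rather than -1) for CPU-only runs, so any torch.cuda.set_device call has to be guarded. A minimal sketch of that guard, using a hypothetical TrainerStub whose attribute names mirror the diff above (the real Trainer derives root_gpu via parse_gpu_ids / determine_root_gpu_device):

import torch

class TrainerStub:
    """Hypothetical stand-in; attribute names mirror the diff above."""

    def __init__(self, gpu_ids=None):
        # In Lightning, determine_root_gpu_device returns None for CPU-only
        # runs, not -1, hence the explicit None check below.
        self.data_parallel_device_ids = gpu_ids         # e.g. [0, 1] or None
        self.root_gpu = gpu_ids[0] if gpu_ids else None
        self.root_device = torch.device("cpu")          # default added by this commit

    def setup_device(self):
        # check if root gpu exists or is available before touching CUDA state
        if self.root_gpu is not None and torch.cuda.is_available():
            torch.cuda.set_device(self.root_gpu)
            self.root_device = torch.device("cuda", self.root_gpu)

trainer = TrainerStub(gpu_ids=[1, 2])
trainer.setup_device()
print(trainer.root_device)  # cuda:1 with GPUs available, otherwise cpu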
