It looks like multi-node GPU support is still an outstanding task. If I execute the following script to run on 4 nodes (16 GPUs), I get the following error:

```
Traceback (most recent call last):
  File "/home/jmorton/miniconda3/envs/alignment/bin/deepblast-train", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/home/jmorton/research/gert/deepblast/scripts/deepblast-train", line 67, in <module>
    main(hparams)
  File "/home/jmorton/research/gert/deepblast/scripts/deepblast-train", line 47, in main
    trainer.fit(model)
  File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 964, in fit
    self.set_random_port()
  File "/home/jmorton/miniconda3/envs/alignment/lib/python3.8/site-packages/pytorch_lightning/trainer/distrib_data_parallel.py", line 392, in set_random_port
    assert self.num_nodes == 1, 'random port can only be called from single node training'
AssertionError: random port can only be called from single node training
```

It's likely because this line of code originated from a merge yesterday: Lightning-AI/pytorch-lightning#2512 (comment)
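The submission script referenced above was not preserved on this page. As a minimal sketch of the kind of launch that triggers the assertion, assuming the 0.8.x-era PyTorch Lightning `Trainer` arguments (`gpus`, `num_nodes`, `distributed_backend`) shown in the traceback, something like the following would reproduce it; the `ToyModel` class here is a hypothetical placeholder, not the actual DeepBLAST model:

```python
# Hypothetical reproduction sketch -- NOT the original deepblast-train script.
# Any LightningModule should hit the same assertion, since Trainer.fit()
# calls set_random_port() even when num_nodes > 1 on this version.
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class ToyModel(pl.LightningModule):  # placeholder for the DeepBLAST model
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 1)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self(x), y)
        return {'loss': loss}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

    def train_dataloader(self):
        data = TensorDataset(torch.randn(64, 8), torch.randn(64, 1))
        return DataLoader(data, batch_size=8)


if __name__ == '__main__':
    trainer = pl.Trainer(
        gpus=4,                     # 4 GPUs per node
        num_nodes=4,                # 4 nodes -> 16 GPUs total
        distributed_backend='ddp',  # 0.8.x-era argument name
        max_epochs=1,
    )
    trainer.fit(ToyModel())         # raises the AssertionError above
```

With `num_nodes=1` the same script runs, which is consistent with the assertion message: the `set_random_port()` call only supports single-node training.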