slow new epoch start with setting ddp, num_workers, gpus #1884
-
❓ Questions and Help

What is your question?

I am training MNIST with the code below. Training on 1 GPU is OK, but each new epoch starts slowly when using ddp. Code (abridged):

```python
import torch

class LightningMNISTClassifier(pl.LightningModule):
    ...

if __name__ == '__main__':
    ...
```

What have you tried?

The Horovod backend does not show the slow start of new epochs.

What's your environment?
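For context, a run like the one described is usually launched roughly as follows. This is a hedged sketch, not the poster's actual code: the variable names (`mnist_train`, `model`) and all parameter values are assumptions, and `distributed_backend="ddp"` was the Trainer argument for DDP in Lightning versions of that era.

```python
# Sketch only -- names and values are assumptions, not the poster's settings.
train_loader = torch.utils.data.DataLoader(
    mnist_train,          # assumed: an MNIST Dataset instance
    batch_size=128,
    num_workers=8,        # one of the knobs the title says affects the slow start
)

trainer = pl.Trainer(
    gpus=2,                      # multi-GPU, as in the title
    distributed_backend="ddp",   # DDP backend, as in the title
    max_epochs=5,
)
trainer.fit(model, train_loader)
```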
Replies: 4 comments 1 reply
-
I experience similar things: when running with ddp, it seems the higher num_workers is, the longer it takes before data starts reaching the GPUs.
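A startup pause that scales with num_workers is consistent with the worker processes being re-created at every epoch boundary. A stdlib-only sketch of that fixed cost (this is not Lightning code; `_worker` and `epoch_startup_time` are made-up names mimicking what a non-persistent DataLoader pays per epoch):

```python
import time
from multiprocessing import Process, Queue

def _worker(q):
    # Stand-in for a DataLoader worker process: start up, signal
    # readiness, and exit.
    q.put("ready")

def epoch_startup_time(num_workers):
    """Measure how long it takes to spin up `num_workers` fresh
    processes and hear back from all of them, mimicking the
    per-epoch worker (re)creation of a non-persistent DataLoader."""
    start = time.perf_counter()
    q = Queue()
    procs = [Process(target=_worker, args=(q,)) for _ in range(num_workers)]
    for p in procs:
        p.start()
    for _ in procs:
        q.get()  # block until every worker has reported in
    for p in procs:
        p.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    for n in (1, 4, 8):
        print(f"num_workers={n}: startup took {epoch_startup_time(n):.3f}s")
```

For what it's worth, newer PyTorch releases than the ones in this thread added `DataLoader(..., persistent_workers=True)`, which keeps workers alive across epochs and avoids exactly this per-epoch cost.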
-
@mpaepper Your comment may be right. At the start of epoch 2 I see:

```
Epoch 2:   0%|          | 0/470 [00:00<?, ?it/s, loss=0.180, v_num=113]
/opt/conda/lib/python3.7/site-packages/torchvision/io/video.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
```
-
I found that the slow DeprecationWarnings shown above are caused by the torchvision library. I switched to a simple dataset and the slow start has disappeared so far.
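This fits the re-created-workers picture: every freshly spawned worker re-imports the dataset's modules, so a heavyweight library inflates each epoch's startup. A stdlib sketch that makes the cost visible (`fresh_import_time` is a made-up helper, not a real API):

```python
import subprocess
import sys
import time

def fresh_import_time(module_name):
    """Time how long a brand-new Python interpreter needs to import
    `module_name` -- roughly the import cost each freshly spawned
    DataLoader worker pays again."""
    start = time.perf_counter()
    subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        check=True,
    )
    return time.perf_counter() - start

if __name__ == "__main__":
    # Compare a light stdlib module against any heavy library you have
    # installed (e.g. torchvision) to see the difference.
    print(f"import json took {fresh_import_time('json'):.3f}s")
```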
-
The issue is not due to PyTorch Lightning itself.