How many and which GPUs are needed for training #1
Thanks for your recognition. We use 8 V100 GPUs for training both ResNet50 and ResNet18. Actually, 4 GTX 1080 GPUs are enough for ResNet18.
I tried to train ResNet18 with 2 RTX 3090 GPUs. It takes nearly 1.5 hours per epoch, which is over 7 days for 120 epochs. That is longer than our group can afford. I also want to know how long training took for you on ResNet18 and ResNet50. Thank you!
On 8 V100 GPUs, ResNet50 needs 4 days, and ResNet18 only needs 1 day. On 4 GTX 1080 GPUs, ResNet18 needs about 2 days. Note that the ImageNet data should be stored on a solid-state drive (SSD), which speeds up training considerably (roughly twofold).
Thank you very much!
I still have some questions about the dataloader in the code. Why don't you apply normalization? And could you please release the code for "Tested at New Resolutions" from your ablation study? Thank you very much!
Hi, thanks for your interest. Since we found that the performance without normalization already achieves SOTA, we do not apply normalization. The provided code should not produce a NaN cross-entropy loss. If you modify the code and encounter a NaN loss during training, you could probably reduce the initial learning rate. The code for testing at new resolutions is provided below; it basically recalibrates the BNs according to the new resolution:
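The snippet itself did not survive in this thread. As a rough illustration of the idea described above (recalibrating BatchNorm statistics on data at the new resolution while keeping all weights fixed), here is a minimal PyTorch sketch; the toy model, the random-tensor loader, and the 64x64 resolution are illustrative assumptions, not the authors' released code:

```python
import torch
import torch.nn as nn

def calibrate_bn(model, loader):
    """Re-estimate BatchNorm running mean/var on inputs at the new
    resolution. Model weights are untouched; only BN buffers change."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.reset_running_stats()
            m.momentum = None  # None => cumulative average over all batches
    model.train()              # BN buffers are updated only in train mode
    with torch.no_grad():      # no gradients needed; weights stay fixed
        for images in loader:
            model(images)
    model.eval()
    return model

# Toy demo: a tiny conv net calibrated on 64x64 inputs.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
loader = [torch.randn(4, 3, 64, 64) for _ in range(3)]
calibrate_bn(model, loader)
```

After calibration the model is evaluated as usual at the new resolution; since only the BN running statistics are re-estimated, a few hundred batches are typically sufficient.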
Thank you very much!
Thanks for your great work! When I try to reproduce your results, I run into CUDA out-of-memory errors. I'm very curious which GPUs, and how many of them, your experiments used when training ResNet50 and ResNet18.