validation loops run the partial dataset with horovod #1684
Comments
@tgaddair pls ^^ |
I'll take a look. |
Hey @thnkim, can you provide a minimum reproducible example that demonstrates the behavior you're describing? I just ran quick test with an MNIST dataset. With 1 GPU, it ran 3750 training steps and 1875 validation steps per epoch. With 2 GPUs, it ran 1876 training steps and 938 validation steps per worker, which is consistent with the expected behavior. |
Hi @tgaddair! As you mentioned, with 2 GPUs and Horovod, my 1901 validation samples are split into 951 for one GPU and 951 (not 950 here) for the other.
I have two questions:
Thank you! |
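(Editor's note: the 951 + 951 split above is consistent with `torch.utils.data.DistributedSampler`'s default behavior, which pads the index list so every worker receives the same number of samples. The snippet below is a hedged, pure-Python sketch of that padding logic, not the actual PyTorch implementation; the function name `distributed_sampler_indices` is hypothetical.)

```python
import math

def distributed_sampler_indices(dataset_len, num_replicas, rank):
    """Sketch of DistributedSampler's default (unshuffled) index assignment.

    Pads the index list by repeating indices from the start so that
    total_size is evenly divisible by num_replicas, then assigns every
    num_replicas-th index to this rank.
    """
    num_samples = math.ceil(dataset_len / num_replicas)
    total_size = num_samples * num_replicas
    indices = list(range(dataset_len))
    # Pad by duplicating leading indices; this is why no worker gets 950.
    indices += indices[: total_size - dataset_len]
    return indices[rank:total_size:num_replicas]

# 1901 validation samples across 2 GPUs: 951 per worker,
# with one sample (index 0) evaluated twice in total.
print(len(distributed_sampler_indices(1901, 2, 0)))  # → 951
print(len(distributed_sampler_indices(1901, 2, 1)))  # → 951
```

The duplicated sample slightly biases aggregated validation metrics; whether that matters depends on the dataset size, as discussed below.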
Hey @thnkim, to answer your questions:
|
Thank you, @tgaddair! And for DistributedSampler, yes it would not be problematic in my case. |
Hello,
It seems to be the same issue as #1161.
When I use Horovod, validation_step and validation_epoch_end are called multiple times.
Thank you.