0 training loss #83

Closed
santosh-b opened this issue Nov 5, 2020 · 6 comments

@santosh-b

santosh-b commented Nov 5, 2020

For PyTorch 1.7 only:

Hi. I'm getting "UserWarning: Failed to calculate the accuracy.", which results in 0 training loss (so presumably an error is being thrown during the gradient update or something). I'm running the following command (from the examples) on Colab:

python -m robustness.main --dataset cifar --data /path/to/cifar \
   --adv-train 0 --arch resnet18 --out-dir /logs/checkpoints/dir/

Any help? Thanks

@Icxa

Icxa commented Nov 5, 2020

I have been encountering the same warning for a couple of days. A script that successfully calculated the accuracy a few weeks back is now giving this UserWarning.
P.S.: I re-installed the robustness package a few days ago.

@andrewilyas
Collaborator

andrewilyas commented Nov 5, 2020

Thanks for bringing this up, looking into it right now. The good thing is that this error is thrown while calculating the accuracy from the predictions, not during the actual gradient update, so the network should indeed be training.

Can you tell me what version of PyTorch and robustness you are running?

Edit: I just ran the command above using the current master branch version of robustness and PyTorch 1.6 and did not encounter any errors, so it may be a versioning problem.

@santosh-b
Author

santosh-b commented Nov 5, 2020

Thanks for the quick update. You're absolutely correct, it's a version issue. PyTorch 1.7 causes accuracy not to be reported, likely due to some compatibility error within the metrics reporting (which is wrapped in a try/except block). I'll update the original issue accordingly.
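
To illustrate how a guarded metrics block could make training look like it has 0 loss while the gradient updates still run, here is a minimal sketch. The thread does not show the training loop, so everything here is an assumption: `AverageMeter`, `broken_accuracy`, and `log_batch` are hypothetical names for illustration, not the robustness library's code.

```python
import warnings


class AverageMeter:
    """Running average that reports 0.0 until it receives its first update."""

    def __init__(self):
        self.total, self.count = 0.0, 0

    def update(self, value, n=1):
        self.total += value * n
        self.count += n

    @property
    def avg(self):
        return self.total / self.count if self.count else 0.0


def broken_accuracy(batch):
    # Hypothetical stand-in for a metric helper that raises at runtime.
    raise RuntimeError("metric computation failed")


def log_batch(loss_value, batch, losses, top1):
    # Assumed structure: the loss meter is updated inside the same try block
    # as the accuracy, so one failure skips both updates.
    try:
        acc = broken_accuracy(batch)
        losses.update(loss_value, len(batch))  # never reached when the call above raises
        top1.update(acc, len(batch))
    except Exception:
        warnings.warn("Failed to calculate the accuracy.")


losses, top1 = AverageMeter(), AverageMeter()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    log_batch(2.31, [0, 1, 2, 3], losses, top1)

print(losses.avg, top1.avg, len(caught))
```

In this sketch the real loss (2.31) is computed but never recorded, so the reported average stays at its default even though nothing went wrong in the optimization step itself.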

@Icxa

Icxa commented Nov 6, 2020

Yes, I can confirm that it works completely fine with PyTorch 1.6.0. Thank you very much for your response @andrewilyas :)

@andrewilyas
Collaborator

Just to provide an update on this: I've figured out what's changed between 1.6 and 1.7 that causes this error, and will push a hotfix in the next few days. In the meantime, if you want to change your own local copy, it just requires changing view to reshape on this line:

correct_k = correct[:k].view(-1).float()

That should suffice to fix the issue.
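
For anyone patching locally, here is a hedged sketch of why swapping `view` for `reshape` can help. `Tensor.view` needs strides that allow the new shape without copying, while `Tensor.reshape` falls back to a copy when that is not possible. The thread does not say exactly what changed in 1.7, so treating non-contiguous memory as the cause is an assumption, and `topk_correct` below is an illustrative helper, not the library's actual function.

```python
import torch


def topk_correct(output, target, topk=(1,)):
    """Count correct top-k predictions per k (illustrative sketch)."""
    maxk = max(topk)
    # Indices of the maxk highest-scoring classes, shape (batch, maxk).
    _, pred = output.topk(maxk, dim=1)
    pred = pred.t()  # shape (maxk, batch); the transpose is non-contiguous
    correct = pred.eq(target.view(1, -1).expand_as(pred))
    counts = []
    for k in topk:
        # reshape works whether or not `correct` is contiguous.
        counts.append(correct[:k].reshape(-1).float().sum().item())
    return counts


# Generic illustration of the view/reshape difference on a transposed tensor.
x = torch.arange(6).reshape(2, 3).t()  # shape (3, 2), non-contiguous
try:
    x[:2].view(-1)
    view_failed = False
except RuntimeError:
    view_failed = True
flat = x[:2].reshape(-1)  # copies the data because a view is impossible here

output = torch.tensor([[0.10, 0.70, 0.20],
                       [0.80, 0.15, 0.05],
                       [0.20, 0.30, 0.50],
                       [0.60, 0.30, 0.10]])
target = torch.tensor([1, 0, 1, 2])
counts = topk_correct(output, target, topk=(1, 2))
print(view_failed, flat.tolist(), counts)
```

Using `reshape` keeps the behavior identical on contiguous tensors (no copy is made there), so the change should be safe on older PyTorch versions as well.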

@andrewilyas
Collaborator

This has now been fixed and pushed to PyPI, closing now! Feel free to open a new issue if anything else arises.
