0 training loss #83

santosh-b · 2020-11-05T03:15:28Z

For PyTorch 1.7 only:

Hi. I'm getting a UserWarning: "Failed to calculate the accuracy.", and the reported training loss is 0 (so presumably an error is being thrown during the gradient update or something). I'm running the following command (from the examples) on Colab:

python -m robustness.main --dataset cifar --data /path/to/cifar \
   --adv-train 0 --arch resnet18 --out-dir /logs/checkpoints/dir/

Any help? Thanks

Icxa · 2020-11-05T10:25:31Z

I have been encountering the same warning for a couple of days. A script that successfully calculated the accuracy a few weeks back is now giving this UserWarning.
P.S.: I re-installed the robustness package a few days ago.

andrewilyas · 2020-11-05T13:45:21Z

Thanks for bringing this up, looking into it right now. The good news is that this error is thrown while calculating the accuracy from the predictions, not during the actual gradient update, so the network should indeed be training.

Can you tell me what version of PyTorch and robustness you are running?

Edit: I just ran the command above using the current master branch version of robustness and PyTorch 1.6 and did not encounter any errors, so it may be a versioning problem.

santosh-b · 2020-11-05T23:20:29Z

Thanks for the quick update. You're absolutely correct, it's a version issue. With PyTorch 1.7 the accuracy is not reported, likely due to some compatibility error within the metrics reporting (which is wrapped in a try/catch block). I'll update the original issue accordingly.
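To illustrate why a failure inside a try/catch block can surface only as a UserWarning rather than a traceback, here is a minimal sketch of the general pattern. It is not the robustness library's code: `safe_metric` and `broken_accuracy` are hypothetical names made up for this example.

```python
import warnings


def safe_metric(metric_fn, *args):
    """Hypothetical wrapper: run metric_fn, but warn instead of raising."""
    try:
        return metric_fn(*args)
    except Exception as exc:
        # warnings.warn defaults to the UserWarning category.
        warnings.warn(f"Failed to calculate the accuracy. ({exc})")
        return None


def broken_accuracy(predictions, labels):
    # Stand-in for a metric that breaks after a dependency upgrade.
    raise RuntimeError("simulated incompatibility")


# Training code calling this keeps running; only the metric is lost.
result = safe_metric(broken_accuracy, [1, 0], [1, 1])
```

With a wrapper like this, the caller sees a warning and a missing metric while the rest of the loop continues, which would be consistent with training still proceeding.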

Icxa · 2020-11-06T12:14:42Z

Yes, I can confirm that it works completely fine with PyTorch 1.6.0. Thank you very much for your response @andrewilyas :)

andrewilyas · 2020-11-08T04:35:48Z

Just to provide an update on this: I've figured out what's changed between 1.6 and 1.7 that causes this error, and will push a hotfix in the next few days. In the meantime, if you want to change your own local copy, it just requires changing view to reshape on this line:

robustness/robustness/tools/helpers.py, line 75 in 2dabf3b:

correct_k = correct[:k].view(-1).float()

That should suffice to fix the issue.
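For anyone patching a local copy, the following is a minimal sketch of the common top-k accuracy pattern with `reshape` in place of `view`. It is an assumption-laden illustration, not the library's exact function: the name `topk_accuracy` and its signature are made up here. The relevant PyTorch behavior is that `view` requires a compatible memory layout and raises on tensors that are not, whereas `reshape` returns a view when possible and copies otherwise, so it works either way.

```python
import torch


def topk_accuracy(output, target, topk=(1,)):
    """Sketch: percentage of samples whose label is among the top-k logits."""
    maxk = max(topk)
    batch_size = target.size(0)

    # Indices of the maxk largest logits per sample, shape (batch, maxk).
    _, pred = output.topk(maxk, dim=1, largest=True, sorted=True)
    pred = pred.t()  # shape (maxk, batch); a transposed, non-contiguous view

    # Boolean mask marking where a predicted index matches the label.
    correct = pred.eq(target.view(1, -1).expand_as(pred))

    results = []
    for k in topk:
        # reshape (not view) so this works whatever layout `correct` has.
        correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
        results.append(correct_k.mul_(100.0 / batch_size))
    return results
```

Calling `topk_accuracy(logits, labels, topk=(1, 5))` returns one single-element tensor per requested k, each holding an accuracy percentage.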

andrewilyas · 2020-12-01T06:12:51Z

This has now been fixed and pushed to PyPI, closing now! Feel free to open a new issue if anything else arises.

andrewilyas closed this as completed Dec 1, 2020
