Can't convert trained AMP model to full precision #349
With that opt_level, all of the leaves should be float. However, onnx is also recording ops and temporaries internal to the graph. These will execute in a mixture of float and half, regardless of what type the leaves are, because with Amp active the patched functions cast their inputs on the fly.

Unrelated: I suspect that your training time went up with mixed precision because your model is fairly small, and therefore not fully utilizing the device, so the overhead of the mixed precision casts becomes significant relative to the actual model ops. Also, your final linear layer's output size is 10, which is not a multiple of 8, and therefore won't be able to use Tensor Cores (#221 (comment)). You could probably pad the output size to 16 (6 unused/dummy classes) and see better performance.
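A minimal sketch of that padding idea, assuming a plain nn.Linear classifier head; the in_features value and batch size below are placeholders, and the 6 extra logits are dummy classes that get sliced off before the loss or softmax:

```python
import torch
import torch.nn as nn

NUM_REAL_CLASSES = 10   # actual classes (e.g. MNIST digits)
PADDED_CLASSES = 16     # next multiple of 8, so the GEMM can map onto Tensor Cores

# Hypothetical classifier head; 512 input features is just a placeholder.
head = nn.Linear(in_features=512, out_features=PADDED_CLASSES)

x = torch.randn(32, 512)               # batch size kept a multiple of 8 as well
logits = head(x)                       # shape: (32, 16)
logits = logits[:, :NUM_REAL_CLASSES]  # drop the 6 dummy classes before loss/softmax
```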
That line is the problem: without those mentions of f16 in there, it runs in other ONNX environments just fine. These little numbers are so weird looking, it's no wonder they cause so much trouble. I'm glad you mentioned that multiple-of-8 thing; it's easy to miss stuff like that because everything just works anyway. (Is there a way to figure out whether Tensor Cores are actually getting used? Should the batch size be a multiple of 8 also?) Kicking the number of outputs up to 16 seems to have no performance impact with this little model, but the model I'm actually using is about 3000 times larger, with several huge fully connected layers and 343 output classes. It barely fits on the card, but I'll try it with 344.
Yes, the batch size should also be a multiple of 8. I pinned the issue I sent earlier (#221 (comment)) but it's still easy to miss. I'm planning to augment the Amp patching so that it will check tensor sizes entering FP16 linear layers, and warn once if sizes are not a multiple of 8.
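A rough sketch of what such a check could look like (not an existing Apex feature; the hook name and warn-once logic here are purely illustrative), using an ordinary forward pre-hook on nn.Linear modules:

```python
import warnings
import torch
import torch.nn as nn

def warn_non_multiple_of_8(module, inputs):
    # Warn once per module if an FP16 linear layer sees dimensions
    # that are not all multiples of 8 (batch, in_features, out_features).
    if getattr(module, "_warned_tensor_cores", False):
        return
    x = inputs[0]
    dims = (x.shape[0], module.in_features, module.out_features)
    if x.dtype == torch.float16 and any(d % 8 != 0 for d in dims):
        warnings.warn(
            f"Linear layer sizes {dims} are not all multiples of 8; "
            "this GEMM will not use Tensor Cores."
        )
        module._warned_tensor_cores = True

# Attach the check to every linear layer in a model:
# for m in model.modules():
#     if isinstance(m, nn.Linear):
#         m.register_forward_pre_hook(warn_non_multiple_of_8)
```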
I have a basic benchmark test where I train a CNN on MNIST data with and without AMP. The problem is that I can't get the f16 types out of the model or export it to a CPU. Calling float() on the model doesn't seem to do anything. This code prints the following:
Using mixed precision, the training time per batch went up 43%, but I can still increase the batch size now so that's OK. (This is on an RTX-2060 with 6GB.) What's more concerning is that I seem to be stuck in f16-land.
The expected behavior for model.float() is to convert all parameters and buffers to Float, but the model is still riddled with these Half types, making it useless in any environment with no f16 support. How do I get them out of there if float() doesn't do anything?
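For anyone hitting the same thing, here is a minimal way to see which parameters and buffers are still Half after calling float(), and to cast everything before exporting; the commented-out model handling and the MNIST-shaped dummy input are placeholders, not the benchmark code from this issue:

```python
import torch

def report_half_tensors(model):
    # Print any parameters or buffers that are still torch.float16.
    for name, t in list(model.named_parameters()) + list(model.named_buffers()):
        if t.dtype == torch.float16:
            print(f"still half: {name} {tuple(t.shape)}")

# After AMP training, cast the leaves back to FP32 and move to CPU before export:
# model = model.float().cpu()
# report_half_tensors(model)          # should print nothing if all leaves converted
# dummy = torch.randn(1, 1, 28, 28)   # MNIST-shaped example input (assumption)
# torch.onnx.export(model, dummy, "model.onnx")
```

As noted in the reply above, even when the leaves are float, the exported graph can still contain Half casts if the export happens while the Amp patching is active, so a clean dtype check on the leaves alone doesn't guarantee an f16-free ONNX file.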