FP16 matmuls with apex #562

Closed
KeAWang opened this issue Oct 22, 2019 · 1 comment

KeAWang commented Oct 22, 2019

Hi,

Are there plans to support matmuls in FP16 with Apex? It seems that you can make low-level CUDA calls that do FP16 GEMMs and accumulate in FP32, but this feature is not exposed in PyTorch. It would be great if we could have mixed-precision matmuls through Apex!

mcarilli (Contributor) commented Oct 22, 2019

The "FP16 input, internal FP32 accumulate, FP16 output" behavior is a hardware feature of Tensor Cores, far below anything that Apex controls. It is already "exposed" in Pytorch in the sense that any matrix multiplication you call with torch.cuda.HalfTensors as inputs will use Tensor Cores if your GPU has them (ie, if it's Volta or Turing).

Apex is not doing anything weird to ensure that Tensor Cores are invoked. All it does is cast the inputs of torch.mm, etc., to half on the Python side to ensure that the Tensor Core hardware path is eventually taken. You can call .half() on the inputs to torch.mm (and other GEMM calls) manually (if you want) and achieve the same result.
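For illustration, a minimal sketch of that manual approach (the tensor names and shapes here are arbitrary; a CUDA-capable Volta or Turing GPU is assumed):

```python
import torch

# Assumes a CUDA GPU is available. On Volta/Turing hardware, cuBLAS routes
# this half-precision GEMM through Tensor Cores: FP16 inputs, internal
# FP32 accumulation, FP16 output.
a = torch.randn(256, 512, device="cuda").half()  # cast inputs to FP16
b = torch.randn(512, 128, device="cuda").half()

c = torch.mm(a, b)   # same call as in FP32; output is a torch.cuda.HalfTensor
print(c.dtype)       # torch.float16
```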

In addition to the inputs being FP16, there are a few other (fairly easy to satisfy) constraints on tensor dimensions that must be met to enable Tensor Core use; see #221 (comment) and the sketch below.
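As an illustration only (the exact rules are in the linked comment; the multiple-of-8 guideline below is an assumption based on NVIDIA's general Tensor Core guidance, not a restatement of that comment), sizes are typically padded up to a multiple of 8:

```python
# Hedged sketch: NVIDIA's general guidance for FP16 GEMMs on Volta/Turing is
# that the GEMM dimensions should be multiples of 8 for the Tensor Core path
# to be taken. This helper is hypothetical and only illustrates the idea.
def pad_to_multiple_of_8(dim: int) -> int:
    """Round a dimension up to the next multiple of 8."""
    return ((dim + 7) // 8) * 8

print(pad_to_multiple_of_8(250))   # 256
print(pad_to_multiple_of_8(1024))  # 1024 (already a multiple of 8)
```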
