Default to allow_tf32=True for GPU Devices #1275

Merged
merged 2 commits into mosaicml:dev from CO-328 on Jul 12, 2022

Conversation

ravi-mosaicml (Contributor) commented:

PyTorch 1.12 disabled the TF32 format by default for matmuls. This could lead to a significant performance regression.

To fix this, this PR adds a flag to the `DeviceGPU` class that controls whether to set PyTorch's `allow_tf32` flag. By default, Composer sets it to `True`, for consistent behavior across PyTorch versions.

Closes https://mosaicml.atlassian.net/browse/CO-328
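
For context, a minimal sketch of what such a flag could look like (illustrative only, not the actual Composer implementation; the real `DeviceGPU` signature may differ):

```python
import torch

class DeviceGPU:
    """Sketch of a GPU device that restores the pre-1.12 TF32 matmul default."""

    def __init__(self, allow_tf32: bool = True):
        # PyTorch 1.12 changed torch.backends.cuda.matmul.allow_tf32 from True
        # to False. Defaulting it back to True keeps the faster TF32 kernels on
        # Ampere+ GPUs and makes behavior consistent across PyTorch versions.
        torch.backends.cuda.matmul.allow_tf32 = allow_tf32
```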

@abhi-mosaic (Contributor) left a comment:

Looks good to me! For posterity, I want to note that we put this flag in `DeviceGPU` rather than in a Trainer precision enum, because it reflects how certain devices (NVIDIA Ampere+ GPUs) perform "single precision" GEMM ops, rather than an overall training-loop strategy for training with a particular precision.

At no point will any tensor, checkpoint, or activation ever be stored in TF32; the format exists only inside NVIDIA tensor cores right before the inner products of a GEMM are computed, and the accumulation and output tensor are true FP32. See this article.
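
To make that concrete, a quick check using PyTorch's standard `torch.backends.cuda.matmul.allow_tf32` knob (assumes a CUDA-capable device):

```python
import torch

# Enable TF32 for FP32 matmuls (the pre-1.12 default on Ampere+ GPUs).
torch.backends.cuda.matmul.allow_tf32 = True

a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")
c = a @ b

# TF32 exists only inside the tensor cores during the inner products;
# the inputs, accumulation, and output all remain true FP32 tensors.
assert a.dtype == b.dtype == c.dtype == torch.float32
```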

An analogy: if we implemented a `DeviceTPU`, we might want an `allow_bf16=True` flag that is forced to `True`, which is not the same as BF16 mixed-precision training. In their default behavior, TPUs cast FP32 tensors to BF16 right before the GEMM inner products and output FP32. The overall training strategy of BF16 mixed precision (with BF16 activations and gradients) is a separate concern. See this article.
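
A hypothetical sketch of that analogy (illustrative only; Composer has no such class in this PR):

```python
class DeviceTPU:
    """Hypothetical device wrapper from the analogy above, not real Composer code."""

    def __init__(self, allow_bf16: bool = True):
        if not allow_bf16:
            # The flag is forced to True: by default, TPUs compute FP32 GEMMs
            # via BF16 inner products on the MXU, so it cannot be disabled.
            raise ValueError("TPU FP32 GEMMs always use BF16 inner products")
        self.allow_bf16 = allow_bf16
```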

@linden-li (Contributor) left a comment:

LGTM!

@abhi-mosaic abhi-mosaic merged commit 924cf27 into mosaicml:dev Jul 12, 2022
@ravi-mosaicml ravi-mosaicml deleted the CO-328 branch July 13, 2022 01:33