RTC models do not compile for unknown future CUDA Architectures #844

ptheywood · 2022-05-03T13:59:20Z

RTC models are compiled for the device's compute capability, e.g. when running on a consumer Ampere GPU, NVRTC is passed --gpu-architecture=compute_86.
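As a rough illustration of how such a flag is derived, the sketch below builds the option string from a compute capability. The name `gpuArchitectureFlag` is hypothetical and not taken from the codebase; in practice the major and minor values would come from the device properties reported by the CUDA runtime.

```cpp
#include <string>

// Hypothetical helper: builds the NVRTC architecture option for a device
// with compute capability major.minor (e.g. 8.6 gives compute_86).
std::string gpuArchitectureFlag(int major, int minor) {
    return "--gpu-architecture=compute_" + std::to_string(major * 10 + minor);
}
```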

However, if the version of NVRTC in use does not know about the GPU architecture, compilation fails, and the user can do nothing about it other than use a more recent NVRTC.

This means that RTC models will not run on newer GPUs, unlike non-RTC models, which will run (without using newer features) via PTX embedding / JIT compilation.

To reproduce this: CUDA 11.0 knows SM_80 but not SM_86, so attempting to run a CUDA 11.0 RTC model on a consumer Ampere GPU fails during RTC compilation, with an error such as:

Compiler options: --gpu-architecture=compute_86 --generate-line-info -DNDEBUG --std=c++17 --define-macro=SEATBELTS=0 --pre-include=/usr/local/cuda-11.0/include//cuda.h


ptheywood · 2022-05-06T13:53:31Z

The fix for this is to use nvrtcGetNumSupportedArchs and nvrtcGetSupportedArchs (docs) to find the architectures supported by the current NVRTC, and only pass the device's specific architecture if it is in that list.

If it is not in the list, passing the latest arch that is supported should work (i.e. the last value returned by nvrtcGetSupportedArchs).
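A minimal sketch of that selection rule, under stated assumptions: `chooseArchitecture` is a hypothetical name, and the `supported` vector stands in for the values that nvrtcGetSupportedArchs would fill in, so the example does not need CUDA to build.

```cpp
#include <algorithm>
#include <vector>

// Assumption: `supported` holds the architectures NVRTC reports as supported
// (e.g. 35, 37, ..., 80), in ascending order, and is non-empty.
// Returns the device's own architecture when NVRTC knows it, otherwise
// the newest architecture in the list.
int chooseArchitecture(const std::vector<int>& supported, int deviceArch) {
    if (std::find(supported.begin(), supported.end(), deviceArch) != supported.end()) {
        return deviceArch;
    }
    return supported.back();
}
```

Falling back to the last entry assumes the device is newer than everything NVRTC knows about; a device older than the oldest supported architecture would need separate handling.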

ptheywood · 2022-05-06T14:20:25Z

~~nvrtcGetNumSupportedArchs and nvrtcGetSupportedArchs were introduced in CUDA 11.0, so are not available in (the deprecated but not yet removed) CUDA 10.x.~~

~~In this case we have no idea about what CUDA arch's would work (other than the minimum configured at cmake time) so the only safe thing to do is remove setting the gencode if CUDA < 11.0.~~

~~As this is deprecated and can be removed at any time now, that's the easier option than worrying about a workaround.~~

Edit:
nvrtcGetNumSupportedArchs and nvrtcGetSupportedArchs were introduced in CUDA 11.2, so they are not available in CUDA 11.1 and older.

Options for supporting these older CUDA versions:

- Don't set the gencode if we can't query whether it is supported.
- Try an NVRTC compilation with the current gencode; if it fails, don't set a gencode.
- Hardcode the earliest and latest supported version based on the CUDA version macro(s) and / or the NVRTC version.
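The hardcoding option might look something like the sketch below. `hardcodedArchRange` is a hypothetical helper, and the version is passed as plain integers rather than read from the CUDA version macro or the NVRTC version so that the example builds without CUDA. Only the CUDA 11.0 upper bound (SM_80) is stated in this issue; the other values are recalled from the CUDA release notes and should be treated as assumptions to verify.

```cpp
#include <utility>

// Hypothetical helper: returns {oldest, newest} compute capability assumed
// to be known to NVRTC for CUDA versions older than 11.2.
// Values other than the 11.0 upper bound are assumptions, not taken from the issue.
std::pair<int, int> hardcodedArchRange(int cudaMajor, int cudaMinor) {
    if (cudaMajor <= 10) {
        return {30, 75};  // assumed range for CUDA 10.x
    }
    if (cudaMajor == 11 && cudaMinor == 0) {
        return {35, 80};  // 11.0 knows compute_80 but not compute_86
    }
    return {35, 86};      // assumed for 11.1; newer versions should query NVRTC instead
}
```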


…c & device

Closes #844

The maximum compute capability supported by the currently linked NVRTC that is less than or equal to the device's architecture is used for RTC compilation. This fixes an issue where running an RTC model on consumer Ampere (SM_86) would fail on CUDA 11.0 and older, which are not aware of SM_86's existence.

CUDA 11.2+ includes methods to query which architectures are supported by the dynamically linked NVRTC. New releases may add or remove architectures, and because the ABI is stable from 11.2 for all 11.x releases, the linked version can differ from the version available at compile time.

CUDA 11.1 and below (11.1, 11.0 and 10.x currently in our case) do not include these methods. As these versions have no stable NVRTC ABI, the known values can be hardcoded at compile time (grim but simple).

A method to select the most appropriate value from an ascending-order vector has also been introduced, so this gencode functionality can be tested programmatically without having to predict which values would be appropriate for the current device and CUDA version, which is a moving target.
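One way such a selection method could be written is sketched here. This is not the code from the pull request: `selectAppropriateArchitecture` is a hypothetical name, and returning 0 when nothing fits is an assumption made for the example.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: given architectures in ascending order, returns the
// largest one that is less than or equal to deviceArch, or 0 if every entry
// is newer than the device (assumed convention for "no suitable value").
int selectAppropriateArchitecture(const std::vector<int>& ascendingArchs, int deviceArch) {
    // upper_bound finds the first entry strictly greater than deviceArch.
    auto it = std::upper_bound(ascendingArchs.begin(), ascendingArchs.end(), deviceArch);
    if (it == ascendingArchs.begin()) {
        return 0;  // nothing suitable; also covers an empty vector
    }
    return *(it - 1);
}
```

Because the function takes the list as an argument, it can be tested with fixed vectors regardless of which GPU or CUDA version the tests run on, e.g. a list ending in 80 with a device value of 86 selects 80.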

ptheywood added the bug label May 3, 2022

ptheywood mentioned this issue May 6, 2022

Set NVRTC gpu-architecture flag to maximum supported version #845

Merged

mondus closed this as completed in #845 May 11, 2022
