Merged PR #925 breaks CPU-Only installs #931

Closed
aatifjiwani opened this issue Aug 4, 2020 · 8 comments · Fixed by #937
Comments

@aatifjiwani

A GitHub CD job that attempted to install Apex failed with the following traceback:

Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-req-build-f1vgc77y/setup.py", line 35, in <module>
        _, bare_metal_major, _ = get_cuda_bare_metal_version(cpp_extension.CUDA_HOME)
      File "/tmp/pip-req-build-f1vgc77y/setup.py", line 14, in get_cuda_bare_metal_version
        raw_output = subprocess.check_output([cuda_dir + "/bin/nvcc", "-V"], universal_newlines=True)
    TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

It seems that cuda_dir is None, which is expected when installing Apex on a CPU-only machine. But even on a CPU-only machine, setup.py still looks for the NVIDIA CUDA compiler (nvcc).

The most recent commit as of 8/4/2020 (hash: 5b53121...) adds this line to setup.py, which breaks the install.

The corresponding line within the current release of Apex:

apex/setup.py

Line 14 in 5b53121

raw_output = subprocess.check_output([cuda_dir + "/bin/nvcc", "-V"], universal_newlines=True)
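
The failure mode described above (concatenating None with a string) can be avoided by checking cuda_dir before building the nvcc path. The following is only a minimal sketch of such a guard, not the fix that was merged in #937; the names get_cuda_version_or_none and parse_nvcc_release are hypothetical and do not exist in Apex.

```python
import os
import subprocess


def parse_nvcc_release(raw_output):
    """Extract (major, minor) strings from `nvcc -V` text such as 'release 10.2, V10.2.89'."""
    tokens = raw_output.split()
    # The token after "release" looks like "10.2," so strip the trailing comma.
    release = tokens[tokens.index("release") + 1].rstrip(",").split(".")
    return release[0], release[1]


def get_cuda_version_or_none(cuda_dir):
    """Hypothetical helper: return (major, minor), or None when no usable nvcc is found."""
    if cuda_dir is None:
        # CPU-only machine: CUDA_HOME is unset, so there is nothing to probe.
        return None
    nvcc = os.path.join(cuda_dir, "bin", "nvcc")
    try:
        raw_output = subprocess.check_output([nvcc, "-V"], universal_newlines=True)
    except (OSError, subprocess.CalledProcessError):
        # nvcc is missing or failed to run; treat it like a CPU-only install.
        return None
    return parse_nvcc_release(raw_output)
```

A caller would then skip any CUDA-specific build steps when the helper returns None instead of unpacking the result unconditionally.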

@matthaeusheer

I can confirm this issue.

@sugeeth14

sugeeth14 commented Aug 5, 2020

I am facing the same issue as well, but in my case it happens when trying to install Apex on a GPU machine. Was anyone able to fix it?

@zifuwanggg

same here

@aatifjiwani
Author

I am facing the same issue as well, but in my case it happens when trying to install Apex on a GPU machine. Was anyone able to fix it?

@raghava14 The fix we were able to make was to install Apex from the commit before the one that broke the installation. Obviously this is not good in the long run, but it's a fine temporary fix.

@ptrblck
Contributor

ptrblck commented Aug 7, 2020

Thanks for reporting this issue. The get_cuda_bare_metal_version call has to be moved inside the MHA code.
I'll provide a fix for it tomorrow.
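
One way to read "moved inside the MHA code" is that the CUDA version probe should run only in the branch that builds the multihead attention extension, so a plain install never touches nvcc. The sketch below illustrates that idea under stated assumptions: select_extensions and probe_version are hypothetical names, the flag string is used purely for illustration, and this is not the code from the eventual fix.

```python
def select_extensions(argv, cuda_home, probe_version):
    """Return the optional extensions to build, probing CUDA only when one is requested.

    probe_version is any callable taking the CUDA directory and returning
    (major, minor) strings; it is injected here so the sketch runs without nvcc.
    """
    selected = []
    if "--fast_multihead_attn" in argv:  # flag name assumed for illustration
        if cuda_home is None:
            raise RuntimeError(
                "--fast_multihead_attn requires a CUDA toolkit, but CUDA_HOME is not set"
            )
        # The probe happens here, inside the extension branch, not at module import.
        major, _minor = probe_version(cuda_home)
        selected.append(("fast_multihead_attn", int(major)))
    return selected
```

With this layout, an install that requests no CUDA extension returns an empty list without ever calling the probe, which is the behaviour a CPU-only machine needs.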

@mcarilli
Contributor

Please do not consider the fix an endorsement of Apex as a future-proof source of mixed precision. torch.cuda.amp is the truth. It's much easier to use. The modularity and flexibility make it almost fun.

@vince62s

Even if torch.cuda.amp must be the ground truth :), Apex is still required if we need FusedAdam, and I can confirm #937 does not fix the issue. This needs to be reopened.

@monk1337

monk1337 commented Nov 12, 2020

Run export TORCH_CUDA_ARCH_LIST="compute capability" before installing Apex from source, replacing "compute capability" with your GPU's compute capability.
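
As a rough illustration of the workaround above, the variable can also be set from Python before launching the build. This sketch assumes the value is a space-separated list of versions such as "7.0 7.5"; format_arch_list and build_env are hypothetical helper names, and the thread does not confirm that this resolves the CPU-only failure.

```python
import os


def format_arch_list(capabilities):
    """Turn (major, minor) pairs such as (7, 0) into a string like '7.0 7.5'."""
    return " ".join(f"{major}.{minor}" for major, minor in capabilities)


def build_env(capabilities, base_env=None):
    """Return a copy of the environment with TORCH_CUDA_ARCH_LIST set."""
    env = dict(os.environ if base_env is None else base_env)
    env["TORCH_CUDA_ARCH_LIST"] = format_arch_list(capabilities)
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run when invoking the install command. On a machine with PyTorch and a GPU, torch.cuda.get_device_capability() returns a (major, minor) pair that could serve as input.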
