
Rebuild for CUDA 12 #4

Conversation

regro-cf-autotick-bot
Contributor

This PR has been triggered in an effort to update cuda120.

Notes and instructions for merging this PR:

  1. Please merge the PR only after the tests have passed.
  2. Feel free to push to the bot's branch to update this PR if needed.

Please note that if you close this PR we presume that the feedstock has been rebuilt, so if you are going to perform the rebuild yourself, don't close this PR until your rebuild has been merged.

If this PR was opened in error or needs to be updated please add the bot-rerun label to this PR. The bot will close this PR and schedule another one. If you do not have permissions to add this label, you can use the phrase @conda-forge-admin, please rerun bot in a PR comment to have the conda-forge-admin add it for you.

This PR was created by the regro-cf-autotick-bot. The regro-cf-autotick-bot is a service to automatically track the dependency graph, migrate packages, and propose package version updates for conda-forge. Feel free to drop us a line if there are any issues! This PR was generated by https://github.com/regro/cf-scripts/actions/runs/5137011166, please use this URL for debugging.

@conda-forge-webservices
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@jakirkham
Member

@conda-forge-admin, please re-render

@carterbox
Member

Need to add a conditional for CUDA 12 so that arch 35 is dropped (and 90 is added?).

@jakirkham
Member

jakirkham commented May 31, 2023

Here's the current CI error:

nvcc fatal   : Unsupported gpu architecture 'compute_35'

This indicates that these lines likely need changes for CUDA 12:

export CUDA_ARCH_LIST="sm_35,sm_50,sm_60,sm_70,sm_80"
export CUDAARCHS="35-virtual;50-virtual;60-virtual;70-virtual;80-virtual"

Could do something like this to pass relevant GPU architectures based on CUDA version used

According to this doc sm_50 is the oldest supported and the newest is sm_90a

Edit: Seems we posted the same thing. Sorry didn't see your comment until posting. In any event maybe the added detail is helpful 🙂
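For illustration, such version-based selection could be sketched in build.sh with a bash case statement like the one below. This is a sketch only, not the recipe's actual code: the cuda_compiler_version variable is assumed to be set by the conda-forge build environment (a default is given only so the sketch runs standalone), and the exact arch lists are placeholders.

```shell
# Illustrative sketch only: select GPU arch lists by CUDA major version
# using bash pattern matching in a case statement.
# cuda_compiler_version is assumed to be set by the conda-forge build
# environment; the default here only lets the sketch run standalone.
cuda_compiler_version="${cuda_compiler_version:-12.0}"
case "${cuda_compiler_version}" in
  11.*)
    export CUDA_ARCH_LIST="sm_35,sm_50,sm_60,sm_70,sm_80"
    export CUDAARCHS="35-virtual;50-virtual;60-virtual;70-virtual;80-virtual"
    ;;
  12.*)
    # sm_35 is unsupported by CUDA 12; sm_90 is the newest family.
    export CUDA_ARCH_LIST="sm_50,sm_60,sm_70,sm_80,sm_90"
    export CUDAARCHS="50-virtual;60-virtual;70-virtual;80-virtual;90-virtual"
    ;;
esac
```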

@carterbox
Member

Could do something like this to pass relevant GPU architectures based on CUDA version used

That's a good hint! I'm not very proficient in bash, so I didn't know you could do matching with wildcards like that!

I use Wikipedia to look up and make choices about which archs to build for because they have very good tables.

recipe/build.sh Outdated
@carterbox
Member

For CUDA 12 the following errors from CMake:

CMake Error at CMakeLists.txt:593 (target_link_libraries):
  Target "magma" links to:

    CUDA::cublas

  but the target was not found.  Possible reasons include:

    * There is a typo in the target name.
    * A find_package call is missing for an IMPORTED target.
    * An ALIAS target is missing.

CMake Error at CMakeLists.txt:687 (target_link_libraries):
  Target "magma_sparse" links to:

    CUDA::cublas

  but the target was not found.  Possible reasons include:

    * There is a typo in the target name.
    * A find_package call is missing for an IMPORTED target.
    * An ALIAS target is missing.

I don't think cublas would have been removed from the cudatoolkit, but maybe the name changed, or these targets are missing from the exported CMake files?

@carterbox
Member

@jakirkham, is the transition to CUDA 12 also the transition to the new CUDA packages? i.e. I now need to depend on the libcublas-dev package instead of cudatoolkit if I want to link to libcublas?

@jakirkham
Member

jakirkham commented May 31, 2023

Yeah exactly

Think we need something like this. Though probably can be simplified significantly

Would start with just libcublas-dev (in host). Maybe like...

    - libcublas-dev  # [cuda_compiler_version == "12.0"]

@carterbox
Member

CUDA 12 library headers are moved, and thus no longer found by the compiler. Wait on this PR until we figure out whether the solution is elsewhere or requires modifying this recipe.

@jakirkham
Member

Yeah, we add these headers to the host compiler's include path (nvcc already knows where to look)

Do we have a sense of where libmagma is looking for these?

@carterbox
Member

It looks like y'all are adding these include paths to the CXXFLAGS, CUDA_CFLAGS, etc. environment variables. Thus, they will not make it to the compiler if the downstream package modifies these environment variables incorrectly or does not use them.

This is a subtlety that should be noted.

@carterbox
Member

@jakirkham According to the logs, a prefix with CUDA headers is being added, but it's incomplete.

--     INCLUDE_DIRECTORIES:   -I$BUILD_PREFIX/targets/x86_64-linux/include -I$SRC_DIR/build/include -I$SRC_DIR/include -I$SRC_DIR/control -I$SRC_DIR/magmablas -I$SRC_DIR/sparse/include -I$SRC_DIR/sparse/control -I$SRC_DIR/testing 

The problem is that the libcublas-dev package belongs in the host requirements (installed to $PREFIX), while the nvcc package belongs in the build requirements (installed to $BUILD_PREFIX), so the compiler package doesn't add the directories for libcublas to CUDA_CFLAGS.

https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=715409&view=logs&j=4f922444-fdfe-5dcf-b824-02f86439ef14&t=b2a8456a-fb11-5506-ca32-5ccd32538dc0&l=2398
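If that diagnosis is right, one fix could be to append the host include directory explicitly in build.sh. The lines below are a hypothetical sketch, not the recipe's actual code; the default PREFIX value exists only so the sketch runs standalone.

```shell
# Hypothetical sketch: headers from host-installed packages such as
# libcublas-dev live under $PREFIX/include, while nvcc's activation
# only guarantees $BUILD_PREFIX paths, so append the host path here.
PREFIX="${PREFIX:-/opt/host-env}"
export CUDA_CFLAGS="${CUDA_CFLAGS:-} -I${PREFIX}/include"
export CXXFLAGS="${CXXFLAGS:-} -I${PREFIX}/include"
```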

@carterbox
Member

But both $PREFIX and $BUILD_PREFIX are added in the activation script you linked...

@carterbox
Member

This package sorts compiler options into different categories: CXXFLAGS, CFLAGS, NVCCFLAGS, and INCLUDE_DIRECTORIES. I just need to ensure that each gets the appropriate flags.
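The sorting described above could look something like the loop below in build.sh. This is an illustrative sketch only; the variable names (ALL_FLAGS, INC_FLAGS, OTHER_FLAGS) are made up for the example and are not the actual magma build variables.

```shell
# Illustrative sketch: split a combined flag string into include flags
# (destined for INCLUDE_DIRECTORIES) and everything else (destined for
# CXXFLAGS/CFLAGS). Variable names are invented for this example.
ALL_FLAGS="-O2 -I/opt/host/include -fPIC"
INC_FLAGS=""
OTHER_FLAGS=""
for f in ${ALL_FLAGS}; do
  case "${f}" in
    -I*) INC_FLAGS="${INC_FLAGS} ${f}" ;;
    *)   OTHER_FLAGS="${OTHER_FLAGS} ${f}" ;;
  esac
done
```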

recipe/build.sh Outdated
recipe/build.sh Outdated
Comment on lines 12 to 13
export CUDA_ARCH_LIST="sm_60;sm_80"
export CUDAARCHS="60-virtual;80-virtual"
Member

Suggested change
export CUDA_ARCH_LIST="sm_60;sm_80"
export CUDAARCHS="60-virtual;80-virtual"
export CUDA_ARCH_LIST="sm_60"
export CUDAARCHS="60-virtual"

Still only enough time for one arch on Travis.

Member

We could try cross-compiling if that is helpful

Member

Does CUDA 11 support cross-compile? If not, I need to wait for the second migrator for arm/ppc.

Member

Yep it does. Several packages are already building for CUDA 11 using cross-compilation; nccl, for example.

It also will be supported for CUDA 12 (once work on the other archs wraps up)

@carterbox carterbox added the automerge Merge the PR when CI passes label Jun 2, 2023
@@ -0,0 +1,27 @@
BSD-3-Clause license
Member

Interesting was this meant to be added?

Member

conda-smithy added it, and I don't oppose it, so yes.

Member

But I guess it's redundant.

@jakirkham
Member

Thanks Daniel! 🙏

This is pretty impressive work 👏

We're looking into the compiler flag issue ( conda-forge/cuda-nvcc-feedstock#18 ) (as Bradley noted over there)

@jakirkham
Member

@conda-forge-admin, please re-render

@github-actions
Contributor

github-actions bot commented Jun 2, 2023

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you, but it looks like there was nothing to do.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/libmagma-feedstock/actions/runs/5156798505.

@carterbox
Member

@jakirkham, on powerpc, the linking step fails because gcc is trying to link with the x86 cuda runtime instead of the powerpc runtime. Arm is fine though.

Compare the end of the linking line for ARM:

-L/usr/local/cuda/targets/sbsa-linux/lib/stubs   -L/usr/local/cuda/targets/sbsa-linux/lib -Wl,-rpath,/usr/local/cuda/targets/sbsa-linux/lib:  $PREFIX/lib/liblapack.so  $PREFIX/lib/libblas.so  /usr/local/cuda/targets/sbsa-linux/lib/libcudart.so  /usr/local/cuda/targets/sbsa-linux/lib/libcublas.so  /usr/local/cuda/targets/sbsa-linux/lib/libcusparse.so  /usr/local/cuda/targets/sbsa-linux/lib/libcublasLt.so  /usr/local/cuda/targets/sbsa-linux/lib/libculibos.a  -lcudadevrt  -lcudart_static  -lrt  -lpthread  -ldl && :

with powerpc

-L/usr/local/cuda/targets/ppc64le-linux/lib/stubs   -L/usr/local/cuda/targets/ppc64le-linux/lib -Wl,-rpath,/usr/local/cuda/lib64:  $PREFIX/lib/liblapack.so  $PREFIX/lib/libblas.so  /usr/local/cuda/lib64/libcudart.so  /usr/local/cuda/lib64/libcublas.so  /usr/local/cuda/lib64/libcusparse.so  /usr/local/cuda/lib64/libcublasLt.so  /usr/local/cuda/lib64/libculibos.a  -lcudadevrt  -lcudart_static  -lrt  -lpthread  -ldl && :

Paths to /usr/local/cuda use lib64 instead of targets/ppc64le-linux/lib.

https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=716394&view=logs&j=9a864fd9-6c8f-52ca-79ce-2aa6dca1a1de&t=10fc5aa2-324e-5982-4c88-6b31fcab16b3&l=4329

@jakirkham
Member

Yeah CUDA 12 on other architectures needs a bit more work. Hence we paused that migrator

That said, would go ahead and file this as an issue on the cuda-nvcc feedstock and we can take a look

@carterbox
Member

This is for CUDA 11, not CUDA 12.

@github-actions github-actions bot removed the automerge Merge the PR when CI passes label Jun 2, 2023
@github-actions
Contributor

github-actions bot commented Jun 2, 2023

Hi! This is the friendly conda-forge automerge bot!

Commits were made to this PR after the automerge label was added. For security reasons, I have disabled automerge by removing the automerge label. Please add the automerge label again (or ask a maintainer to do so) if you'd like to enable automerge again!

@carterbox carterbox added the automerge Merge the PR when CI passes label Jun 2, 2023
@jakirkham
Member

Ah ok. Sorry for the confusion

Think we might want to change these lines then

If you would like to start a PR, would be happy to review 🙂

@github-actions
Contributor

github-actions bot commented Jun 2, 2023

Hi! This is the friendly conda-forge automerge bot!

I considered the following status checks when analyzing this PR:

  • linter: passed
  • azure: passed

Thus the PR was passing and merged! Have a great day!

@jakirkham
Member

One of the jobs timed out. Restarting to see if it was just a transient issue

@carterbox
Member

It probably timed out because I've been playing with the number of target CUDA archs. Jobs time out when the list is too long.

@github-actions github-actions bot merged commit d2c4613 into conda-forge:main Jun 3, 2023
@jakirkham
Member

Maybe

Think it was a Windows job fwiw

Saw a few odd things in CI recently. So it could also be some other issue

@regro-cf-autotick-bot regro-cf-autotick-bot deleted the rebuild-cuda120-0-1_ha4413d branch June 3, 2023 03:51
@jakirkham
Member

The Windows job appears to be failing on main. Tried restarting, but it appears it failed again. Maybe it is too close to the CI time limit?
