Update feedstock to use cirun-openstack-gpu-large with Cirun #14

Merged (11 commits) on Jan 16, 2024


@conda-forge-admin conda-forge-admin commented Nov 29, 2023

Note that only builds triggered by maintainers of the feedstock (and core)
who have accepted the terms of service and privacy policy will run
on GitHub Actions via Cirun.

Also, note that rerendering with GitHub Actions as CI provider must be done
locally in the future for this feedstock.

automatic conda-forge administrator and others added 2 commits November 29, 2023 21:15
@conda-forge-webservices

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@isuruf isuruf closed this Nov 29, 2023
@isuruf isuruf reopened this Nov 29, 2023
@carterbox

Please don't merge until I have adjusted the build scripts on all platforms to target all of the CUDA archs. This PR may remain open for a few weeks until I have time to do that.

@carterbox

I have updated the build scripts to target all of the archs, so it's OK by me to merge this PR. However, the Cirun job fails with:

{
    "error": "User not authorized for the requested runners, user's role not authorized."
}

I assume @isuruf et al. are still shaking out the bugs. Please let me know if I need to do something.

@jaimergp jaimergp closed this Jan 2, 2024
@jaimergp jaimergp reopened this Jan 2, 2024
@jaimergp

jaimergp commented Jan 2, 2024

CUDA 12 is failing with:

CMake Error at /home/conda/feedstock_root/build_artifacts/libmagma_1704193253113/_build_env/share/cmake-3.28/Modules/CMakeDetermineCompilerId.cmake:780 (message):
  Compiling the CUDA compiler identification source file
  "CMakeCUDACompilerId.cu" failed.

@carterbox

carterbox commented Jan 3, 2024

I'm pretty sure that the errors for CUDA 11.x occur because the shared library becomes too large when it contains compiled code for all of the archs. I remember running into this error when I first took over building this library.

https://stackoverflow.com/a/47168086/4459405

We can either try adding -mcmodel=medium to the GCC args or reducing the number of CUDA archs. Let's try reducing the number of archs first.
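The two options above can be sketched roughly as follows, assuming a bash build script on x86-64 where CMake picks up its flags from the environment. The arch list, the `use_medium_code_model` helper, and the `USE_MEDIUM_CODE_MODEL` switch are hypothetical and are not taken from the feedstock.

```shell
#!/usr/bin/env bash
# Option 1 (tried first in this PR): target fewer CUDA archs so the
# shared library stays smaller. This list is illustrative only.
export CUDAARCHS="60-real;70-real;80"

# Option 2: request GCC's medium code model. nvcc forwards host
# compiler flags given after -Xcompiler, and CMake reads CUDAFLAGS
# from the environment to initialize CMAKE_CUDA_FLAGS.
use_medium_code_model() {
    export CFLAGS="${CFLAGS:-} -mcmodel=medium"
    export CXXFLAGS="${CXXFLAGS:-} -mcmodel=medium"
    export CUDAFLAGS="${CUDAFLAGS:-} -Xcompiler -mcmodel=medium"
}

# Opt in only if reducing archs is not enough (hypothetical switch).
if [ "${USE_MEDIUM_CODE_MODEL:-0}" = "1" ]; then
    use_medium_code_model
fi
```

Keeping option 2 behind a switch mirrors the order proposed in the comment: reduce the archs first, and only reach for the code model change if that fails.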

@carterbox carterbox closed this Jan 3, 2024
@carterbox carterbox reopened this Jan 3, 2024
@jaimergp

jaimergp commented Jan 3, 2024

I need to take a look into the roles error, sorry 🙏

@carterbox

The builds succeeded with fewer archs, and I know that libmagma is already around 2GB in size. I accept that as evidence that the build error was due to the symbols not fitting into the default address space size of 2GB. We can either build for fewer archs or enable the larger address space.

IMO, the library is already quite large, and the benefit of building for more minor archs is probably not worth the potential issues from increasing the address space.
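As a rough way to watch for this, a build script could compare the library size against the 2 GiB mark. This is only a sketch: file size is a proxy for the real constraint (whether symbol offsets fit the default code model), and both the `is_at_least_2gib` helper and the library path are assumptions.

```shell
#!/usr/bin/env bash
# 2 GiB expressed in bytes (2^31).
LIMIT_BYTES=2147483648

# Succeed (exit status 0) if the given file is at least 2 GiB.
is_at_least_2gib() {
    local size
    # wc -c pads its output with spaces on some platforms; strip them.
    size=$(wc -c < "$1" | tr -d '[:space:]')
    [ "$size" -ge "$LIMIT_BYTES" ]
}

# Hypothetical location; adjust to wherever the built library lands.
lib="${PREFIX:-/usr/local}/lib/libmagma.so"
if [ -f "$lib" ] && is_at_least_2gib "$lib"; then
    echo "warning: $lib is at or above 2 GiB" >&2
fi
```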

-export CUDA_ARCH_LIST="${CUDA_ARCH_LIST},sm_35"
-export CUDAARCHS="${CUDAARCHS};35-virtual;80-virtual"
+export CUDA_ARCH_LIST="${CUDA_ARCH_LIST},sm_35,sm_86"
+export CUDAARCHS="${CUDAARCHS};35-real;86"

Why this change? You need the virtual arch for newer GPUs to get better performance.


For CMake, 86 is an abbreviation of 86-real;86-virtual.

With the extra build time, the list of target archs for 11.2 has expanded from
35-virtual;50-real;60-real;70-real;75-real;80-real;80-virtual
to
35-real;50-real;60-real;70-real;75-real;80-real;86-real;86-virtual

35 was converted from virtual to real. 80-virtual was dropped, but 86 was added as both virtual and real.
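To make the abbreviation concrete, here is a small sketch that expands a CMake-style arch list the way described above. The `expand_cuda_archs` helper is hypothetical and handles numeric entries only, not special values such as `all` or `native`.

```shell
#!/usr/bin/env bash
# Expand a semicolon-separated CUDA architecture list so that every
# bare entry (e.g. "86") becomes "86-real;86-virtual". Entries that
# already carry a -real or -virtual suffix pass through unchanged.
expand_cuda_archs() {
    local IFS=';'
    local out=() arch
    for arch in $1; do
        case "$arch" in
            "") ;;  # skip empty entries, e.g. from a leading ';'
            *-real|*-virtual) out+=("$arch") ;;
            *) out+=("${arch}-real" "${arch}-virtual") ;;
        esac
    done
    # "${out[*]}" joins with the first character of IFS, i.e. ';'.
    echo "${out[*]}"
}

expand_cuda_archs "35-real;50-real;60-real;70-real;75-real;80-real;86"
# prints 35-real;50-real;60-real;70-real;75-real;80-real;86-real;86-virtual
```

The printed list matches the expanded 11.2 target list quoted in the comment.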

@carterbox

I need to revert the Windows build scripts because Cirun does not offer Windows runners.

@carterbox carterbox added the automerge label (Merge the PR when CI passes) Jan 12, 2024
@carterbox carterbox merged commit 0dae659 into conda-forge:main Jan 16, 2024
15 checks passed