Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bump NCCL floor to 2.18.1.1 #2443

Merged
merged 4 commits into from
Sep 26, 2024
Merged

Conversation

jameslamb
Copy link
Member

Description

Contributes to rapidsai/build-planning#102

Some RAPIDS libraries are using ncclCommSplit(), which was introduced in nccl==2.18.1.1. This is part of a series of PRs across RAPIDS updating libraries' pins to nccl>=2.18.1.1 to ensure they get a new-enough version that supports that.

@jameslamb jameslamb added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change 2 - In Progress Currenty a work in progress labels Sep 20, 2024
@jameslamb jameslamb changed the title WIP: bump nccl floor to 2.18.1.1 WIP: bump NCCL floor to 2.18.1.1 Sep 20, 2024
@jameslamb jameslamb changed the title WIP: bump NCCL floor to 2.18.1.1 bump NCCL floor to 2.18.1.1 Sep 20, 2024
@jameslamb jameslamb added 3 - Ready for Review and removed 2 - In Progress Currenty a work in progress labels Sep 20, 2024
@jameslamb jameslamb marked this pull request as ready for review September 20, 2024 21:57
@jameslamb jameslamb requested a review from a team as a code owner September 20, 2024 21:57
@jameslamb
Copy link
Member Author

I've seen the C++ tests here fail consistently in the same way here, on other PRs, and on nightlies. Opened #2450 to document it. @cjnolet could you look into that or tag in someone who could?

@jameslamb
Copy link
Member Author

Updated to pull in #2453, which temporarily skips one failing test that was blocking this.

@jakirkham
Copy link
Member

When you have a chance James, could you please let me know what you think on this question ( rapidsai/build-planning#102 (comment) )?

Asking as the answer may mean adjusting the NCCL pin

@rapids-bot rapids-bot bot merged commit f37c41c into rapidsai:branch-24.10 Sep 26, 2024
69 checks passed
@jameslamb jameslamb deleted the update-nccl branch September 26, 2024 15:08
@jameslamb
Copy link
Member Author

Oh interesting that the bot merged this without a /merge comment. Someone must have clicked the automerge button in the UI (it wasn't me).

Anyway, I would have supported that anyway... as described in rapidsai/build-planning#102 and the PR description, raft is using NCCL APIs first introduced in NCCL 2.18.1 so it absolutely should have a floor of at least that version.

Now that this is merged, I'll answer the question over on rapidsai/build-planning#102 @jakirkham

@raydouglass
Copy link
Member

Oh interesting that the bot merged this without a /merge comment. Someone must have clicked the automerge button in the UI (it wasn't me).

A user commented /merge then deleted their comment trying to prevent the merge. Unfortunately, the bot was faster and merged the PR.

@jameslamb
Copy link
Member Author

ahhh thanks @raydouglass . All good!

rapids-bot bot pushed a commit that referenced this pull request Sep 26, 2024
Follow-up to #2443

As part of the work to support NumPy 2 across RAPIDS, we found reason to upgrade some libraries like `cugraph` to slightly newer NCCL (`>=2.19`). Context: rapidsai/build-planning#102 (comment)

This applies that same bump here, to keep the range of NCCL versions consistent across RAPIDS.

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - https://github.com/jakirkham
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #2458
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants