NCCL_ALGO on multi-node and multi-GPU #215

Open
MajidSalimi opened this issue May 20, 2024 · 1 comment

Comments

@MajidSalimi

Hi.

I have been running nccl-tests in a multi-node, multi-GPU environment with NCCL 2.19.3-1 and OpenMPI 4.1.6. Each node has 4 NVIDIA V100 GPUs interconnected with NVLink and PCIe.

  1. How is the NCCL_ALGO chosen by default, and what is the decision logic for choosing the algorithms for inter-node and intra-node communications?

  2. If I specify NCCL_ALGO=Ring and at the same time set OMPI_MCA_coll_tuned_use_dynamic_rules=1 and choose an algorithm with coll_tuned_allreduce_algorithm, how will the final algorithm be chosen? Does it go with the NCCL one or the MCA one? Or is one used for inter-node and the other for intra-node communication?

@sjeaugey
Member

  1. We have an internal model which compares the performance of the different algorithms and (hopefully) chooses the best one.
  2. You're mixing up NCCL and MPI. The OMPI_ settings control MPI, and NCCL does not use MPI (even for inter-node communication). The NCCL tests use MPI only to spawn tasks and to help with CPU-CPU synchronization; NCCL itself does not require it at all.
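
To make the separation concrete, here is a minimal Python sketch that only builds (and does not run) an `mpirun` command line for the NCCL tests. The helper name `build_launch_command`, the binary path, the rank count, and the message-size flags are assumptions for illustration, not something taken from this thread. The idea is that only `NCCL_*` variables are forwarded to influence NCCL; `OMPI_MCA_coll_tuned_*` settings are left out because they would only affect MPI's own collectives. Setting `NCCL_DEBUG=INFO` with `NCCL_DEBUG_SUBSYS=TUNING` should make NCCL log tuning information, though the exact output depends on the NCCL version.

```python
import shlex


def build_launch_command(algo=None, num_ranks=8,
                         binary="./build/all_reduce_perf"):
    """Build an mpirun command line for an NCCL test (hypothetical helper).

    The command is returned as a list of arguments and is not executed.
    """
    cmd = ["mpirun", "-np", str(num_ranks)]

    # NCCL reads its own NCCL_* environment variables. With Open MPI,
    # "-x NAME=value" exports a variable to every launched rank.
    nccl_env = {"NCCL_DEBUG": "INFO", "NCCL_DEBUG_SUBSYS": "TUNING"}
    if algo is not None:
        # Restrict NCCL's choice, e.g. "Ring" or "Tree".
        # When unset, NCCL's internal model picks the algorithm.
        nccl_env["NCCL_ALGO"] = algo

    for name, value in nccl_env.items():
        cmd += ["-x", f"{name}={value}"]

    # Assumed test arguments: sweep 8 bytes to 128 MB, doubling each step,
    # with one GPU per MPI rank.
    cmd += [binary, "-b", "8", "-e", "128M", "-f", "2", "-g", "1"]
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_launch_command(algo="Ring")))
    print(shlex.join(build_launch_command()))  # default algorithm choice
```

Comparing the logs and bandwidth numbers of the two runs is one way to see what the default selection does relative to a forced `Ring`; no MPI tuning variable needs to change between them.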
