Issues: NVIDIA/nccl
Issues list
Why different shape of tensor can be all reduced when using nccl as backend?
#1394
opened Aug 6, 2024 by
yjzhong89
Issues with Limited HCA Utilization and RDMA in Multi-node Training
#1392
opened Aug 6, 2024 by
asdfry
Will ncclSend, ncclRecv launched in different cuda streams blocking each other?
#1389
opened Aug 5, 2024 by
billwu01
RuntimeError: NCCL error: internal error - please report this issue to the NCCL developers
#1388
opened Aug 5, 2024 by
emmanuelrajapandian
Is there any option to use copy engine in ncclSend and ncclRecv ?
#1386
opened Aug 1, 2024 by
umiswing
Some questions about how NCCL uses IB network for data transmission
#1384
opened Aug 1, 2024 by
clearsky07
NCCL with WARN socketTryAccept: Accept failed: Bad file descriptor
#1382
opened Jul 31, 2024 by
syyxsxx
torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1691, unhandled system error (run with NCCL_DEBUG=INFO for details), NCCL version 2.19.3 ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error. Last error:
#1381
opened Jul 30, 2024 by
1moye
Low bandwidth of AllReduce over long-range connection with high latency (0.25ms)
#1378
opened Jul 29, 2024 by
yanminjia
Use nsight system to profile nccl p2p, I find something confused ...
#1369
opened Jul 20, 2024 by
oliverYoung2001