Why does the NCCL LL128 protocol need to load data twice? #1391

Open
MARD1NO opened this issue Aug 6, 2024 · 0 comments
MARD1NO commented Aug 6, 2024

I noticed the code below in prims_ll128.h:

do {
  needReload = false;
  #pragma unroll
  for (int u=0; u<ELEMS_PER_THREAD; u+=2) {
    load128(ptr+u*WARP_SIZE, vr[u], vr[u+1]);
    needReload |= flagThread && (vr[u+1] != flag);
  }
  needReload &= (0 == checkAbort(spins, i, 0));
} while (__any_sync(WARP_MASK, needReload));

#pragma unroll
for (int u=0; u<ELEMS_PER_THREAD; u+=2)
  load128(ptr+u*WARP_SIZE, vr[u], vr[u+1]);

My understanding is that once the first do-while loop finishes, the registers already hold the remote data, so why do we need to load it again?
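
To make the control flow in question concrete, here is a rough host-side C++ model of the snippet. It is not NCCL code. The names `Line`, `Regs`, `load128Model` and `pollThenReload` are hypothetical, and the layout (8 threads with two 64-bit words each per 128-byte line, flag in the last word) and the spin budget standing in for `checkAbort` are assumptions. A single-threaded model cannot reproduce GPU memory-ordering or concurrency effects, so it only restates the intuition behind the question and does not explain the second load.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model: one 128-byte line viewed as 16 x 64-bit words.
constexpr int kThreadsPerLine = 8;   // assumption: 8 threads share one line
constexpr int kWordsPerLine = 16;    // assumption: 2 words per thread
constexpr int kFlagWord = 15;        // assumption: last word carries the flag

using Line = std::array<uint64_t, kWordsPerLine>;

// Registers of all 8 modeled threads, plus bookkeeping for inspection.
struct Regs {
  std::array<uint64_t, kWordsPerLine> v{};
  bool flagSeen = false;  // did the flag word match when polling stopped?
  int spins = 0;          // number of polling iterations executed
};

// Stand-in for load128: copy one thread's two adjacent 64-bit words.
inline void load128Model(const Line& line, int thread,
                         uint64_t& v0, uint64_t& v1) {
  v0 = line[2 * thread];
  v1 = line[2 * thread + 1];
}

// Mirrors the shape of the snippet: poll until the flag thread sees the
// expected flag (or the spin budget runs out), then load everything again.
Regs pollThenReload(const Line& line, uint64_t flag, int maxSpins) {
  Regs r;
  bool needReload;
  do {
    needReload = false;
    for (int t = 0; t < kThreadsPerLine; ++t) {
      load128Model(line, t, r.v[2 * t], r.v[2 * t + 1]);
      const bool flagThread = (t == kThreadsPerLine - 1);
      needReload |= flagThread && (r.v[2 * t + 1] != flag);
    }
    // Stand-in for checkAbort: stop polling once the budget is exhausted.
    needReload &= (++r.spins < maxSpins);
  } while (needReload);

  r.flagSeen = (r.v[kFlagWord] == flag);

  // The second, unconditional load that the question is about.
  for (int t = 0; t < kThreadsPerLine; ++t)
    load128Model(line, t, r.v[2 * t], r.v[2 * t + 1]);
  return r;
}
```

In this model the line never changes between the two loads, so the second load returns exactly what the first one did. Whether that also holds on real hardware, where another GPU is writing the line concurrently, is the open question.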
