And iS1, iS3 and iS5 are all inlined together. Since T3 is reordered, the bS2 and bS4 loops need to be placed outside and inside the iS1/iS3/iS5 loop, respectively. In other words, we would need to create loops as:
for i in bS2:
  for j in iS1:
    for k in bS4:
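To make the shape of that nest concrete, here is a toy Python sketch (not nvFuser code). The function name `t3_loop_nest` and the extent `n` for `iS1` are assumptions made for the example; the only facts taken from the issue are the loop order and that `bS2` and `bS4` are broadcast domains of extent 1.

```python
def t3_loop_nest(n):
    """Enumerate (i, j, k) in the order bS2 (outer), iS1, bS4 (inner)."""
    visited = []
    for i in range(1):          # bS2: broadcast domain, extent 1
        for j in range(n):      # iS1 (inlined with iS3 and iS5)
            for k in range(1):  # bS4: broadcast domain, extent 1
                visited.append((i, j, k))
    return visited


iterations = t3_loop_nest(4)
print(iterations)
```

Both broadcast loops run exactly once, so only `j` varies across iterations.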
In fact, this is the Kernel IR just after the LoopNestGenerator pass:
The problematic expression is the one producing T3. Since all broadcast domains of this fusion are exactly mapped, bS2 and bS4 are indeed self-mapped, resulting in the original error:
C++ exception with description "!concrete_to_loop.count(concrete_loop_id) INTERNAL ASSERT FAILED at "csrc/device_lower/analysis/index_compute.cpp":597, please report a bug with repro script to NVFuser at https://github.com/NVIDIA/Fuser/issues. Unsupported loop structure. Two loops are mapped together.bS4{1} and bS2{1}
Exception raised from validateLoopStructure at csrc/device_lower/analysis/index_compute.cpp:597 (most recent call first):
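The kind of check that trips here can be illustrated with a small, self-contained Python sketch. It is not the actual `validateLoopStructure` implementation: the function `find_self_mapped_loops`, the group names, and the dictionary-based mapping are all hypothetical, chosen only to mirror the idea suggested by the assertion text, namely that each concrete ID may be claimed by at most one loop.

```python
def find_self_mapped_loops(loop_ids, concrete_id_of):
    """Return pairs of distinct loops that share a concrete ID."""
    seen = {}       # concrete ID -> first loop that mapped to it
    conflicts = []
    for loop_id in loop_ids:
        concrete = concrete_id_of[loop_id]
        if concrete in seen:
            # A second loop maps to an already-claimed concrete ID.
            conflicts.append((loop_id, seen[concrete]))
        else:
            seen[concrete] = loop_id
    return conflicts


# Hypothetical mapping: bS2 and bS4 land in the same exact-map group.
concrete_id_of = {"bS2": "b_group", "iS1": "i_group", "bS4": "b_group"}
conflicts = find_self_mapped_loops(["bS2", "iS1", "bS4"], concrete_id_of)
print(conflicts)  # [('bS4', 'bS2')]
```

With the two broadcast domains in one group, the innermost loop `bS4` collides with the already-registered `bS2`, which is the pair named in the error message.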
I think this validation error is a false alarm, since bS2 and bS4 are just broadcast domains. There's no broadcast forwarding in this fusion, so they should be no-ops for indexing. However, simply disabling the validation resulted in another validation error:
C++ exception with description "loops.size() <= loop_domains.size() INTERNAL ASSERT FAILED at "csrc/device_lower/analysis/index_compute.cpp":215, please report a bug with repro script to NVFuser at https://github.com/NVIDIA/Fuser/issues. Loop domain didn't replay all loops
Exception raised from getNonGlobalInitialIndexParameters at csrc/device_lower/analysis/index_compute.cpp:215 (most recent call first):
frame #0: <unknown function> + 0xcf9893 (0x558eac519893 in ./bin/nvfuser_tests)
Not unexpectedly, there seems to be some code relying on the self-mapping-free property even for broadcast domains.
Note that this fusion seems to work with the new indexer:
Originally seen in #2685.
tl;dr: This fusion fails at indexing with the legacy indexer, but it seems to work with the new indexer.
This is expected because, in the new indexer, broadcast domains do not participate in indexing.
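Why leaving broadcast domains out of indexing is harmless can be seen in a toy Python sketch. This is not the new indexer's algorithm: it assumes plain row-major linearization and broadcast extents of 1, and the helper `linear_index` and its parameters are made up for the illustration.

```python
def linear_index(loop_indices, extents, is_broadcast, skip_broadcast):
    """Row-major linear index over the given domains, outermost first."""
    index = 0
    for idx, extent, bcast in zip(loop_indices, extents, is_broadcast):
        if skip_broadcast and bcast:
            continue  # broadcast domain: extent 1, its index is always 0
        index = index * extent + idx
    return index


extents = [1, 4, 1]                  # bS2, iS1, bS4
is_broadcast = [True, False, True]
for j in range(4):
    with_bcast = linear_index([0, j, 0], extents, is_broadcast, False)
    without_bcast = linear_index([0, j, 0], extents, is_broadcast, True)
    assert with_bcast == without_bcast == j
```

Under these assumptions, multiplying by an extent of 1 and adding an index of 0 changes nothing, so skipping the broadcast domains yields the same index.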
Since it should work with the new indexer, I don't think it's worthwhile to fix the issue with the legacy indexer.