[BACKEND] Replace `isMmaToDotShortcut` with linear layout based logic #4951

Jokeren · 2024-10-18T20:07:04Z

This PR fixes the cvtReordersRegisters method, which previously could not return true for two layouts with different numbers of registers. With this update, we can remove the legacy isMmaToDotShortcut and its associated shortcut conversion.

Additionally, we store the dot operand results in the access order to improve code clarity.

Going forward, we intend to eliminate unnecessary shortcut conversions and replace them with the use of transferWithinThread.

Jokeren · 2024-10-21T02:43:30Z

@zhanglx13 and @antiagainst you may want to take a look as well.
From this PR onward, I think we can try to remove all shortcut functions, including the ones introduced by AMD.
As long as a layout has a correct and well-defined linear layout, I think cvtReordersRegisters is sufficient to determine if we can permute registers to perform a conversion without shared memory.
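To make the idea concrete, here is a minimal sketch of what "a conversion that only permutes registers" means. It is a toy model under stated assumptions, not Triton's `cvtReordersRegisters`: the `ToyLayout` type and the `toyCvtReordersRegisters` name are hypothetical, and a layout is modeled as an explicit table rather than a linear layout.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Toy model (an assumption for illustration, not Triton's data structure):
// layout[t][r] is the tensor element held by register r of thread t.
using ToyLayout = std::vector<std::vector<int>>;

// Returns true when every thread holds the same set of tensor elements in
// both layouts, so the conversion can be done by moving values between
// registers of the same thread. Comparing sets (not sequences) tolerates
// layouts with different register counts, e.g. when one duplicates elements.
bool toyCvtReordersRegisters(const ToyLayout &src, const ToyLayout &dst) {
  if (src.size() != dst.size())
    return false;
  for (std::size_t t = 0; t < src.size(); ++t) {
    std::set<int> srcElems(src[t].begin(), src[t].end());
    std::set<int> dstElems(dst[t].begin(), dst[t].end());
    if (srcElems != dstElems)
      return false;
  }
  return true;
}
```

In this model, a destination layout that swaps registers within a thread, or that holds each element twice, still passes the check, while one that needs an element owned by another thread does not and would require shared memory or shuffles.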

Jokeren · 2024-10-21T02:47:38Z

@ThomasRaoux feel free to run a regression test on the PR.

I don't think there should be any issues since I only changed the register access order, but I just wanted to catch potential problems early.

lezcano · 2024-10-21T16:22:02Z

third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/UpcastMXFPToLLVM.cpp

@@ -80,19 +80,6 @@ class UpcastMXFPOpPattern : public ConvertOpToLLVMPattern<UpcastMXFPOp> {
        ret.push_back(v);
      }
    }
-    // FIXME [Dot LL]


lezcano · 2024-10-21T16:23:35Z

third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/DotOpToLLVM/MMAv2.cpp

@@ -75,9 +75,39 @@ ValueTableV2 getValuesFromDotOperandLayoutStruct(

    // For kWidth = 8, split the mma into 4 mmas with "stride 4" along K
    if (dot.getOpIdx() == 0) {
-      si = llvm::SmallVector<unsigned>{0, 8,  4, 12, 1, 9,  5, 13,
-                                       2, 10, 6, 14, 3, 11, 7, 15};
+      // Original register layout:


Thank you for making the comments more explicit!

lezcano · 2024-10-21T16:26:48Z

lib/Conversion/TritonGPUToLLVM/ElementwiseOpToLLVM.cpp

      ret.push_back(values[i]);
-      ret.push_back(values[i + 1]);
+      ret.push_back(values[i + 3]);
+      ret.push_back(values[i + 2]);
      ret.push_back(values[i + 4]);
      ret.push_back(values[i + 5]);
-      ret.push_back(values[i + 2]);
-      ret.push_back(values[i + 3]);
-      ret.push_back(values[i + 6]);
      ret.push_back(values[i + 7]);
+      ret.push_back(values[i + 6]);
+      ret.push_back(values[i + 8]);


Off-by-one error: you are accessing i + 8 and not accessing i + 1. Can you write a test that exercises this path?

Yeah. I wonder why no test case catches this problem.

The fp16->fp32 case should now be covered by 9357902.
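One cheap way to catch this class of mistake is to check that the reorder pattern is a permutation of the group it operates on. The sketch below is a hypothetical helper, not code from Triton or from the test added in 9357902. The two offset lists are read from the removed and added lines of the hunk as excerpted above, assuming the excerpt shows the whole group of eight.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical test helper (not part of Triton): returns true when `offsets`
// contains every value in [0, offsets.size()) exactly once.
bool isPermutation(const std::vector<unsigned> &offsets) {
  std::vector<bool> seen(offsets.size(), false);
  for (unsigned off : offsets) {
    // An offset outside the group or a repeated offset breaks the permutation.
    if (off >= offsets.size() || seen[off])
      return false;
    seen[off] = true;
  }
  return true;
}

// Offsets relative to i, as they appear in the excerpted hunk.
const std::vector<unsigned> kOldOrder = {0, 1, 4, 5, 2, 3, 6, 7};
const std::vector<unsigned> kFlaggedOrder = {0, 3, 2, 4, 5, 7, 6, 8};
```

Here `isPermutation(kOldOrder)` holds, while `isPermutation(kFlaggedOrder)` fails because offset 8 is out of range and offset 1 is never read.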


lezcano

After thinking a bit about it, I think I understand why padding fixes the issues we were seeing when the inputs and outputs have a different number of registers.
The issue stems from the function

triton/lib/Tools/LinearLayout.cpp

Line 119 in 1064b59

getInjectiveMat(const LinearLayout &layout) {

This function makes both matrices injective by extending their codomains. This is a problem if the inDims have different dimensions, because the two codomains will then differ, and having equal codomains is a precondition for the Gaussian elimination to make sense!

The padding patch mitigates this in the cases we found in practice, as it so happens that padding matches perfectly all the free variables from the two matrices, so getInjectiveMat turns this transformation into the identity, which is perfect.
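For readers unfamiliar with the terminology, the following is a minimal sketch of rank and free variables for a matrix over GF(2). It is not `getInjectiveMat` and does not reproduce how Triton extends the codomain. The function names and the bitmask representation (each column stored as a `uint32_t` over output bits) are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Rank of a matrix over GF(2) via Gaussian elimination on its columns.
// Each entry of `cols` is one column, stored as a bitmask over output bits.
int rankF2(std::vector<std::uint32_t> cols) {
  std::size_t rank = 0;
  for (int bit = 31; bit >= 0; --bit) {
    // Look for a not-yet-used column with this output bit set.
    std::size_t pivot = rank;
    while (pivot < cols.size() && !((cols[pivot] >> bit) & 1u))
      ++pivot;
    if (pivot == cols.size())
      continue;
    std::swap(cols[rank], cols[pivot]);
    // Clear this bit from every other column (addition over GF(2) is XOR).
    for (std::size_t j = 0; j < cols.size(); ++j)
      if (j != rank && ((cols[j] >> bit) & 1u))
        cols[j] ^= cols[rank];
    ++rank;
  }
  return static_cast<int>(rank);
}

// Number of free variables: input bits that do not add a new output
// direction. The map is injective exactly when this is zero.
int numFreeVariables(const std::vector<std::uint32_t> &cols) {
  return static_cast<int>(cols.size()) - rankF2(cols);
}
```

For instance, the columns `{1, 2, 4}` have no free variables, while `{1, 2, 0, 0}` (two input bits that map to zero) have two. Two layouts with different free-variable counts would need different numbers of extra output rows to become injective, which is one way to see how the extended codomains can end up with different sizes.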

I don't think that this is the correct approach in general, but it's clearly an improvement over the previous state, so approving. I think I have a solution for the general problem, but I'll implement that at a later point.

Also, thank you @Jokeren for finding the adversarial examples and adding tests for them!

Jokeren and others added 28 commits October 17, 2024 19:57

Update

b58fec1

Merge branch 'main' into keren/ll-shortcut

f270126

Update

1e04af6

Update

1862d7d

Update

31671b3

Update

5fdf7b9

Update

c42b3ad

Update

07f7501

Update

546b4c3

Update

c1f994f

Update

e91829c

Update

15dfa42

Update

06bda3d

Update

cec4ca8

Update

a1e1dab

Update

8b0c687

Update

b7494d5

Update

da0f32a

Update

e0c5ece

Update

494c525

Update

ef058ac

Update

05f4453

Update

49a2c72

Update

5b88422

Update

d807ff2

Update

94fdefd

Update

c1d5176

Merge branch 'main' into keren/ll-shortcut

b6ce0a6

Jokeren marked this pull request as ready for review October 21, 2024 02:37

Jokeren requested a review from ptillet as a code owner October 21, 2024 02:37

Jokeren requested review from antiagainst, lezcano and ThomasRaoux October 21, 2024 02:40

Jokeren requested a review from zhanglx13 October 21, 2024 02:43

Jokeren mentioned this pull request Oct 21, 2024

[AMD] MFMA Dot operand to LinearLayout conversion #4961

Open

lezcano reviewed Oct 21, 2024


Jokeren added 5 commits October 21, 2024 15:32

Update

cd35bce

Merge branch 'keren/ll-shortcut' of github.com:openai/triton into keren/ll-shortcut

9e1d950

Update

e686c7a

Update

2f99b74

Update

251d656

lezcano approved these changes Oct 23, 2024

