
[BUG] Cannot lower loops with tensors to llvm when using tensor dialect #53

Open
mmengjiadai opened this issue Aug 25, 2023 · 0 comments
Labels: bug (Something isn't working)
Describe the bug
When operating on tensors inside a for loop with the tensor dialect (for example, inserting a value into a slice), the module cannot be lowered to LLVM. The tests report the error unknown: operand #0 does not dominate this use, together with the offending operation.

To Reproduce
In tests/test_linalg.py, def test_math_scalar():

def kernel(A: float32[M, K], B: float32[K, N]) -> float32[M, N]:
    C: float32[M, N] = 0.0
    D: float32[M, N] = 0.0
    for i, j in allo.grid(M, N):
        for k in allo.reduction(K):
            C[i, j] += A[i, k] * B[k, j]
    for i, j in allo.grid(M, N):
        D[i, j] = (allo.exp(C[i, j]) + allo.log(C[i, j])) / C[i, j]
    return D
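For reference, the kernel is a matrix multiplication followed by an elementwise (exp + log) / x step. A plain-Python sketch of the intended semantics (the list-of-lists layout and the name kernel_ref are illustrative, not part of allo):

```python
import math

def kernel_ref(A, B):
    # A: M x K, B: K x N, both lists of lists of floats
    M, K, N = len(A), len(B), len(B[0])
    # C = A @ B, accumulated with an explicit reduction loop over k
    C = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            for k in range(K):
                C[i][j] += A[i][k] * B[k][j]
    # D[i, j] = (exp(C[i, j]) + log(C[i, j])) / C[i, j]
    D = [[(math.exp(C[i][j]) + math.log(C[i][j])) / C[i][j]
          for j in range(N)] for i in range(M)]
    return D
```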

The module after customize() is:

"builtin.module"() ({
  "func.func"() <{function_type = (tensor<10x15xf32>, tensor<15x20xf32>) -> tensor<10x20xf32>, sym_name = "kernel"}> ({
  ^bb0(%arg0: tensor<10x15xf32>, %arg1: tensor<15x20xf32>):
    %0 = "tensor.generate"() ({
    ^bb0(%arg2: index, %arg3: index):
      %2 = "arith.constant"() <{value = 0.000000e+00 : f32}> : () -> f32
      "tensor.yield"(%2) : (f32) -> ()
    }) : () -> tensor<10x20xf32>
    %1 = "tensor.generate"() ({
    ^bb0(%arg2: index, %arg3: index):
      %2 = "arith.constant"() <{value = 0.000000e+00 : f32}> : () -> f32
      "tensor.yield"(%2) : (f32) -> ()
    }) : () -> tensor<10x20xf32>
    "affine.for"() ({
    ^bb0(%arg2: index):
      "affine.for"() ({
      ^bb0(%arg3: index):
        "affine.for"() ({
        ^bb0(%arg4: index):
          %2 = "tensor.extract"(%arg0, %arg2, %arg4) : (tensor<10x15xf32>, index, index) -> f32
          %3 = "tensor.extract"(%arg1, %arg4, %arg3) : (tensor<15x20xf32>, index, index) -> f32
          %4 = "arith.mulf"(%2, %3) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
          %5 = "tensor.extract"(%0, %arg2, %arg3) {from = "C"} : (tensor<10x20xf32>, index, index) -> f32
          %6 = "arith.addf"(%5, %4) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
          %7 = "tensor.insert"(%6, %0, %arg2, %arg3) : (f32, tensor<10x20xf32>, index, index) -> tensor<10x20xf32>
          "affine.yield"() : () -> ()
        }) {loop_name = "k", lower_bound = #map, op_name = "S_k_0", reduction, step = 1 : i32, upper_bound = #map1} : () -> ()
        "affine.yield"() : () -> ()
      }) {loop_name = "j", lower_bound = #map, step = 1 : i32, upper_bound = #map2} : () -> ()
      "affine.yield"() : () -> ()
    }) {loop_name = "i", lower_bound = #map, op_name = "S_i_j_0", step = 1 : i32, upper_bound = #map3} : () -> ()
    "affine.for"() ({
    ^bb0(%arg2: index):
      "affine.for"() ({
      ^bb0(%arg3: index):
        %2 = "tensor.extract"(%0, %arg2, %arg3) : (tensor<10x20xf32>, index, index) -> f32
        %3 = "math.exp"(%2) <{fastmath = #arith.fastmath<none>}> : (f32) -> f32
        %4 = "tensor.extract"(%0, %arg2, %arg3) : (tensor<10x20xf32>, index, index) -> f32
        %5 = "math.log"(%4) <{fastmath = #arith.fastmath<none>}> : (f32) -> f32
        %6 = "arith.addf"(%3, %5) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
        %7 = "tensor.extract"(%0, %arg2, %arg3) : (tensor<10x20xf32>, index, index) -> f32
        %8 = "arith.divf"(%6, %7) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
        %9 = "tensor.insert"(%8, %1, %arg2, %arg3) : (f32, tensor<10x20xf32>, index, index) -> tensor<10x20xf32>
        "affine.yield"() : () -> ()
      }) {loop_name = "j", lower_bound = #map, step = 1 : i32, upper_bound = #map2} : () -> ()
      "affine.yield"() : () -> ()
    }) {loop_name = "i", lower_bound = #map, op_name = "S_i_j_2", step = 1 : i32, upper_bound = #map3} : () -> ()
    "func.return"(%9) : (tensor<10x20xf32>) -> ()
  }) {itypes = "__", otypes = "_"} : () -> ()
}) : () -> ()

Buggy output

File "/home/md2249/allo/allo/passes.py", line 25, in _mlir_lower_pipeline
  mlir_pass_manager.parse(pipeline).run(module.operation)
hcl_mlir._mlir_libs._site_initialize.<locals>.MLIRError: Failure while executing pass pipeline:
error: unknown: operand #0 does not dominate this use
note: unknown: see current operation: "func.return"(%9) : (tensor<10x20xf32>) -> ()
note: unknown: operand defined here (op in a child region)

Expected behavior
The following pass pipeline is already applied in allo/backend/llvm.py (class LLVMModule):

pm = PassManager.parse(
    "builtin.module("
    # used for lowering tensor.empty
    "empty-tensor-to-alloc-tensor,"
    # translate tensor dialect (virtual) to memref dialect (physical)
    "one-shot-bufferize{allow-return-allocs bufferize-function-boundaries},"
    # used for lowering memref.subview
    "expand-strided-metadata,"
    # common lowering passes
    "func.func(convert-linalg-to-affine-loops),lower-affine"
    ")"
)
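Once one-shot-bufferize rewrites the value-semantics tensor ops into in-place memref updates, the dominance problem should disappear: the loop stores into a buffer instead of defining a new whole-tensor SSA value each iteration. A rough Python analogy of the bufferized form (a mutable list standing in for a memref; the function name is illustrative):

```python
def zero_out_bufferized(buf):
    # After bufferization the "tensor" is a mutable buffer (memref):
    # each iteration stores in place, so the loop defines no new
    # whole-tensor value and nothing needs to escape the loop region.
    for i in range(len(buf)):
        buf[i] = 0.0
    return buf
```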

Loops that operate on tensors should lower to LLVM through this pipeline without reporting an error.


@mmengjiadai mmengjiadai added the bug Something isn't working label Aug 25, 2023