
[BUG] Cannot lower loops with tensors to llvm when using tensor dialect #53

Open
mmengjiadai opened this issue Aug 25, 2023 · 0 comments
Labels: bug (Something isn't working)
Describe the bug
When operating on tensors inside a for loop with the tensor dialect (for example, inserting a value into a slice), the module cannot be lowered to LLVM. The tests report the error unknown: operand #0 does not dominate this use, together with the offending operation.

To Reproduce
In tests/test_linalg.py, def test_math_scalar():

def kernel(A: float32[M, K], B: float32[K, N]) -> float32[M, N]:
    C: float32[M, N] = 0.0
    D: float32[M, N] = 0.0
    for i, j in allo.grid(M, N):
        for k in allo.reduction(K):
            C[i, j] += A[i, k] * B[k, j]
    for i, j in allo.grid(M, N):
        D[i, j] = (allo.exp(C[i, j]) + allo.log(C[i, j])) / C[i, j]
    return D
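For reference, the kernel is a matrix multiplication followed by an elementwise (exp + log) / x step. A plain-Python sketch of the intended semantics (the list-of-lists layout and the name kernel_ref are illustrative, not part of allo):

```python
import math

def kernel_ref(A, B):
    # A: M x K, B: K x N, both lists of lists of floats
    M, K, N = len(A), len(B), len(B[0])
    # C = A @ B, accumulated with an explicit reduction loop over k
    C = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            for k in range(K):
                C[i][j] += A[i][k] * B[k][j]
    # D[i, j] = (exp(C[i, j]) + log(C[i, j])) / C[i, j]
    D = [[(math.exp(C[i][j]) + math.log(C[i][j])) / C[i][j]
          for j in range(N)] for i in range(M)]
    return D
```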

The module after customize() is:

"builtin.module"() ({
  "func.func"() <{function_type = (tensor<10x15xf32>, tensor<15x20xf32>) -> tensor<10x20xf32>, sym_name = "kernel"}> ({
  ^bb0(%arg0: tensor<10x15xf32>, %arg1: tensor<15x20xf32>):
    %0 = "tensor.generate"() ({
    ^bb0(%arg2: index, %arg3: index):
      %2 = "arith.constant"() <{value = 0.000000e+00 : f32}> : () -> f32
      "tensor.yield"(%2) : (f32) -> ()
    }) : () -> tensor<10x20xf32>
    %1 = "tensor.generate"() ({
    ^bb0(%arg2: index, %arg3: index):
      %2 = "arith.constant"() <{value = 0.000000e+00 : f32}> : () -> f32
      "tensor.yield"(%2) : (f32) -> ()
    }) : () -> tensor<10x20xf32>
    "affine.for"() ({
    ^bb0(%arg2: index):
      "affine.for"() ({
      ^bb0(%arg3: index):
        "affine.for"() ({
        ^bb0(%arg4: index):
          %2 = "tensor.extract"(%arg0, %arg2, %arg4) : (tensor<10x15xf32>, index, index) -> f32
          %3 = "tensor.extract"(%arg1, %arg4, %arg3) : (tensor<15x20xf32>, index, index) -> f32
          %4 = "arith.mulf"(%2, %3) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
          %5 = "tensor.extract"(%0, %arg2, %arg3) {from = "C"} : (tensor<10x20xf32>, index, index) -> f32
          %6 = "arith.addf"(%5, %4) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
          %7 = "tensor.insert"(%6, %0, %arg2, %arg3) : (f32, tensor<10x20xf32>, index, index) -> tensor<10x20xf32>
          "affine.yield"() : () -> ()
        }) {loop_name = "k", lower_bound = #map, op_name = "S_k_0", reduction, step = 1 : i32, upper_bound = #map1} : () -> ()
        "affine.yield"() : () -> ()
      }) {loop_name = "j", lower_bound = #map, step = 1 : i32, upper_bound = #map2} : () -> ()
      "affine.yield"() : () -> ()
    }) {loop_name = "i", lower_bound = #map, op_name = "S_i_j_0", step = 1 : i32, upper_bound = #map3} : () -> ()
    "affine.for"() ({
    ^bb0(%arg2: index):
      "affine.for"() ({
      ^bb0(%arg3: index):
        %2 = "tensor.extract"(%0, %arg2, %arg3) : (tensor<10x20xf32>, index, index) -> f32
        %3 = "math.exp"(%2) <{fastmath = #arith.fastmath<none>}> : (f32) -> f32
        %4 = "tensor.extract"(%0, %arg2, %arg3) : (tensor<10x20xf32>, index, index) -> f32
        %5 = "math.log"(%4) <{fastmath = #arith.fastmath<none>}> : (f32) -> f32
        %6 = "arith.addf"(%3, %5) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
        %7 = "tensor.extract"(%0, %arg2, %arg3) : (tensor<10x20xf32>, index, index) -> f32
        %8 = "arith.divf"(%6, %7) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
        %9 = "tensor.insert"(%8, %1, %arg2, %arg3) : (f32, tensor<10x20xf32>, index, index) -> tensor<10x20xf32>
        "affine.yield"() : () -> ()
      }) {loop_name = "j", lower_bound = #map, step = 1 : i32, upper_bound = #map2} : () -> ()
      "affine.yield"() : () -> ()
    }) {loop_name = "i", lower_bound = #map, op_name = "S_i_j_2", step = 1 : i32, upper_bound = #map3} : () -> ()
    "func.return"(%9) : (tensor<10x20xf32>) -> ()
  }) {itypes = "__", otypes = "_"} : () -> ()
}) : () -> ()

Buggy output

File "/home/md2249/allo/allo/passes.py", line 25, in _mlir_lower_pipeline
  mlir_pass_manager.parse(pipeline).run(module.operation)
hcl_mlir._mlir_libs._site_initialize.<locals>.MLIRError: Failure while executing pass pipeline:
error: unknown: operand #0 does not dominate this use
note: unknown: see current operation: "func.return"(%9) : (tensor<10x20xf32>) -> ()
note: unknown: operand defined here (op in a child region)

Expected behavior
The following pass pipeline is already applied in allo/backend/llvm.py (class LLVMModule):

pm = PassManager.parse(
    "builtin.module("
    # used for lowering tensor.empty
    "empty-tensor-to-alloc-tensor,"
    # translate tensor dialect (virtual) to memref dialect (physical)
    "one-shot-bufferize{allow-return-allocs bufferize-function-boundaries},"
    # used for lowering memref.subview
    "expand-strided-metadata,"
    # common lowering passes
    "func.func(convert-linalg-to-affine-loops),lower-affine"
    ")"
)
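Once one-shot-bufferize rewrites the value-semantics tensor ops into in-place memref updates, the dominance problem should disappear: the loop stores into a buffer instead of defining a new whole-tensor SSA value each iteration. A rough Python analogy of the bufferized form (a mutable list standing in for a memref; the function name is illustrative):

```python
def zero_out_bufferized(buf):
    # After bufferization the "tensor" is a mutable buffer (memref):
    # each iteration stores in place, so the loop defines no new
    # whole-tensor value and nothing needs to escape the loop region.
    for i in range(len(buf)):
        buf[i] = 0.0
    return buf
```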

Loops that operate on tensors should lower to LLVM through this pipeline without reporting an error.


@mmengjiadai mmengjiadai added the bug Something isn't working label Aug 25, 2023