Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] FIFO read in loop attempts to access repeated data #110

Open
zzzDavid opened this issue Nov 2, 2023 · 1 comment
Open

[BUG] FIFO read in loop attempts to access repeated data #110

zzzDavid opened this issue Nov 2, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@zzzDavid
Copy link
Contributor

zzzDavid commented Nov 2, 2023

This is an issue I got while turning PolyBench 2mm to a dataflow design. This issue can be fixed later.

Describe the bug
There should be a loop-invariant code motion pass before turning load/store into fifo read/write operations. When a read happens in a loop that doesn't affect the load indices, turning it directly to fifo read can result in accessing the wrong data.

To Reproduce

     def mm1(A: T[P, Q], B: T[Q, R], out_AB: T[P, R]):
		for i0, j0 in allo.grid(P, R):
			for k0 in allo.reduction(Q):
				out_AB[i0, j0] += A[i0, k0] * B[k0, j0]

	def mm2(out_AB: T[P, R], C: T[R, S], out_ABC: T[P, S]):
		for i1, j1 in allo.grid(P, S):
			for k1 in allo.reduction(R):
				out_ABC[i1, j1] += out_AB[i1, k1] * C[k1, j1]

	def ele_add(out_ABC: T[P, S], D: T[P, S], output: T[P, S]):
		for i2, j2 in allo.grid(P, S):
			output[i2, j2] = out_ABC[i2, j2] * beta + D[i2, j2] * alpha

	def kernel_2mm(A: T[P, Q], B: T[Q, R], C: T[R, S], D: T[P, S]) -> T[P, S]:
		out_AB: T[P, R]
		out_ABC: T[P, S]
		output: T[P, S]
		mm1(A, B, out_AB)
		mm2(out_AB, C, out_ABC)
		ele_add(out_ABC, D, output)
		return output

	sch0 = allo.customize(mm1)
	sch0.reorder("k0", "j0")
	sch0.buffer_at(sch0.out_AB, axis="i0")
	sch0.pipeline("k0")
	
	sch1 = allo.customize(mm2)
	sch1.reorder("k1", "j1")
	sch1.buffer_at(sch1.out_ABC, axis="i1")
	sch1.pipeline("k1")
	
	sch2 = allo.customize(ele_add)
	sch2.pipeline("j2")

	sch = allo.customize(kernel_2mm)
	sch.compose(sch0, sch1, sch2)
	
	sch.to(sch.out_AB, "mm2")
	sch.to(sch.out_ABC, "ele_add")

Buggy output

HLS code for mm2:

void mm2(
  hls::stream< float > &v15 /* v15[180][190] */,
  float v16[190][220],
  hls::stream< float > &v17 /* v17[180][220] */
) {     // L26
  #pragma HLS stream variable=v15 depth=34200
  #pragma HLS stream variable=v17 depth=39600
  l_S_i1_j1_0_i1: for (int i1 = 0; i1 < 180; i1++) {    // L27
    float v19[220];     // L28
    l_j1_init: for (int j1_init = 0; j1_init < 220; j1_init++) {        // L30
    #pragma HLS pipeline II=1
      v19[j1_init] = 0.000000;  // L31
    }
    l_S_k1_0_k1: for (int k1 = 0; k1 < 190; k1++) {     // L33
    #pragma HLS pipeline II=1
      l_j1: for (int j1 = 0; j1 < 220; j1++) {  // L34
        float v23 = v15.read(); // v15[i1][k1]; // L35
        float v24 = v16[k1][j1];        // L36
        float v25 = v23 * v24;  // L37
        float v26 = v19[j1];    // L38
        float v27 = v26 + v25;  // L39
        v19[j1] = v27;  // L40
      }
    }
    l_j1_back: for (int j1_back = 0; j1_back < 220; j1_back++) {        // L43
    #pragma HLS pipeline II=1
      float v29 = v19[j1_back]; // L44
      v17.write(v29); // v17[i1][j1_back] = v29;        // L45
    }
  }
}

Expected behavior
The v15.read() should be moved outside the reduction loop.

void mm2(
  hls::stream< float > &v15 /* v15[180][190] */,
  float v16[190][220],
  hls::stream< float > &v17 /* v17[180][220] */
) {     // L26
  #pragma HLS stream variable=v15 depth=34200
  #pragma HLS stream variable=v17 depth=39600
  l_S_i1_j1_0_i1: for (int i1 = 0; i1 < 180; i1++) {    // L27
    float v19[220];     // L28
    l_j1_init: for (int j1_init = 0; j1_init < 220; j1_init++) {        // L30
    #pragma HLS pipeline II=1
      v19[j1_init] = 0.000000;  // L31
    }
    l_S_k1_0_k1: for (int k1 = 0; k1 < 190; k1++) {     // L33
    #pragma HLS pipeline II=1
        float v23 = v15.read(); // v15[i1][k1]; // L35
      l_j1: for (int j1 = 0; j1 < 220; j1++) {  // L34
        float v24 = v16[k1][j1];        // L36
        float v25 = v23 * v24;  // L37
        float v26 = v19[j1];    // L38
        float v27 = v26 + v25;  // L39
        v19[j1] = v27;  // L40
      }
    }
    l_j1_back: for (int j1_back = 0; j1_back < 220; j1_back++) {        // L43
    #pragma HLS pipeline II=1
      float v29 = v19[j1_back]; // L44
      v17.write(v29); // v17[i1][j1_back] = v29;        // L45
    }
  }
}
@zzzDavid zzzDavid added the bug Something isn't working label Nov 2, 2023
@chhzh123
Copy link
Member

chhzh123 commented Nov 2, 2023

Right, I also found this issue when I cascade two systolic arrays. Currently we did not do dataflow checking. I'm not sure whether it is always safe to hoist the variable. We can change the coding style at this moment to workaround this issue -- basically create an intermediate variable for the reduction loop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants