[BUG] FIFO read in loop attempts to access repeated data #110

zzzDavid · 2023-11-02T01:24:09Z

This is an issue I got while turning PolyBench 2mm to a dataflow design. This issue can be fixed later.

Describe the bug
There should be a loop-invariant code motion pass before turning load/store into fifo read/write operations. When a read happens in a loop that doesn't affect the load indices, turning it directly to fifo read can result in accessing the wrong data.

To Reproduce

     def mm1(A: T[P, Q], B: T[Q, R], out_AB: T[P, R]):
		for i0, j0 in allo.grid(P, R):
			for k0 in allo.reduction(Q):
				out_AB[i0, j0] += A[i0, k0] * B[k0, j0]

	def mm2(out_AB: T[P, R], C: T[R, S], out_ABC: T[P, S]):
		for i1, j1 in allo.grid(P, S):
			for k1 in allo.reduction(R):
				out_ABC[i1, j1] += out_AB[i1, k1] * C[k1, j1]

	def ele_add(out_ABC: T[P, S], D: T[P, S], output: T[P, S]):
		for i2, j2 in allo.grid(P, S):
			output[i2, j2] = out_ABC[i2, j2] * beta + D[i2, j2] * alpha

	def kernel_2mm(A: T[P, Q], B: T[Q, R], C: T[R, S], D: T[P, S]) -> T[P, S]:
		out_AB: T[P, R]
		out_ABC: T[P, S]
		output: T[P, S]
		mm1(A, B, out_AB)
		mm2(out_AB, C, out_ABC)
		ele_add(out_ABC, D, output)
		return output

	sch0 = allo.customize(mm1)
	sch0.reorder("k0", "j0")
	sch0.buffer_at(sch0.out_AB, axis="i0")
	sch0.pipeline("k0")
	
	sch1 = allo.customize(mm2)
	sch1.reorder("k1", "j1")
	sch1.buffer_at(sch1.out_ABC, axis="i1")
	sch1.pipeline("k1")
	
	sch2 = allo.customize(ele_add)
	sch2.pipeline("j2")

	sch = allo.customize(kernel_2mm)
	sch.compose(sch0, sch1, sch2)
	
	sch.to(sch.out_AB, "mm2")
	sch.to(sch.out_ABC, "ele_add")

Buggy output

HLS code for mm2:

void mm2(
  hls::stream< float > &v15 /* v15[180][190] */,
  float v16[190][220],
  hls::stream< float > &v17 /* v17[180][220] */
) {     // L26
  #pragma HLS stream variable=v15 depth=34200
  #pragma HLS stream variable=v17 depth=39600
  l_S_i1_j1_0_i1: for (int i1 = 0; i1 < 180; i1++) {    // L27
    float v19[220];     // L28
    l_j1_init: for (int j1_init = 0; j1_init < 220; j1_init++) {        // L30
    #pragma HLS pipeline II=1
      v19[j1_init] = 0.000000;  // L31
    }
    l_S_k1_0_k1: for (int k1 = 0; k1 < 190; k1++) {     // L33
    #pragma HLS pipeline II=1
      l_j1: for (int j1 = 0; j1 < 220; j1++) {  // L34
        float v23 = v15.read(); // v15[i1][k1]; // L35
        float v24 = v16[k1][j1];        // L36
        float v25 = v23 * v24;  // L37
        float v26 = v19[j1];    // L38
        float v27 = v26 + v25;  // L39
        v19[j1] = v27;  // L40
      }
    }
    l_j1_back: for (int j1_back = 0; j1_back < 220; j1_back++) {        // L43
    #pragma HLS pipeline II=1
      float v29 = v19[j1_back]; // L44
      v17.write(v29); // v17[i1][j1_back] = v29;        // L45
    }
  }
}

Expected behavior
The v15.read() should be moved outside the reduction loop.

void mm2(
  hls::stream< float > &v15 /* v15[180][190] */,
  float v16[190][220],
  hls::stream< float > &v17 /* v17[180][220] */
) {     // L26
  #pragma HLS stream variable=v15 depth=34200
  #pragma HLS stream variable=v17 depth=39600
  l_S_i1_j1_0_i1: for (int i1 = 0; i1 < 180; i1++) {    // L27
    float v19[220];     // L28
    l_j1_init: for (int j1_init = 0; j1_init < 220; j1_init++) {        // L30
    #pragma HLS pipeline II=1
      v19[j1_init] = 0.000000;  // L31
    }
    l_S_k1_0_k1: for (int k1 = 0; k1 < 190; k1++) {     // L33
    #pragma HLS pipeline II=1
        float v23 = v15.read(); // v15[i1][k1]; // L35
      l_j1: for (int j1 = 0; j1 < 220; j1++) {  // L34
        float v24 = v16[k1][j1];        // L36
        float v25 = v23 * v24;  // L37
        float v26 = v19[j1];    // L38
        float v27 = v26 + v25;  // L39
        v19[j1] = v27;  // L40
      }
    }
    l_j1_back: for (int j1_back = 0; j1_back < 220; j1_back++) {        // L43
    #pragma HLS pipeline II=1
      float v29 = v19[j1_back]; // L44
      v17.write(v29); // v17[i1][j1_back] = v29;        // L45
    }
  }
}

The text was updated successfully, but these errors were encountered:

chhzh123 · 2023-11-02T01:34:37Z

Right, I also found this issue when I cascade two systolic arrays. Currently we did not do dataflow checking. I'm not sure whether it is always safe to hoist the variable. We can change the coding style at this moment to workaround this issue -- basically create an intermediate variable for the reduction loop.

zzzDavid added the bug Something isn't working label Nov 2, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] FIFO read in loop attempts to access repeated data #110

[BUG] FIFO read in loop attempts to access repeated data #110

zzzDavid commented Nov 2, 2023

chhzh123 commented Nov 2, 2023

[BUG] FIFO read in loop attempts to access repeated data #110

[BUG] FIFO read in loop attempts to access repeated data #110

Comments

zzzDavid commented Nov 2, 2023

chhzh123 commented Nov 2, 2023