Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FPU fflags no being asserted correctly #785

Closed
mikaelsky opened this issue Feb 1, 2024 · 15 comments
Closed

FPU fflags no being asserted correctly #785

mikaelsky opened this issue Feb 1, 2024 · 15 comments

Comments

@mikaelsky
Copy link
Collaborator

Describe the bug
This is a known issue that I finally have gotten around to. The debugging is far from complete, so would suggest I add onto this this issue with examples and fixes before putting in a PR. Alternative I can toss a PR for each, but might be a lot of back-n-forth.

bug #1: When an f.add is being issued and one of the operands is +/- 0.0 and the other operands exponent is large enough to trigger the exception path the FPU will erroneously set the NX flag. This bug was caused by an earlier bug fix from me that prevent the addsub from running forever for large exponent differences.

Info (IDV) Instruction executed prior to mismatch '0x304(inst_18+10): 018df8d3 fadd.s  x17,x27,x24'
Error (IDV) CSR register value mismatch (HartId:0, PC:0x00000304 inst_18+10):
Error (IDV) Mismatch 0> CSR 001 (fflags)
Error (IDV)   . dut:0x00000001 NV:0 DZ:0 OF:0 UF:0 NX:1
Error (IDV)   . ref:0x00000000 NV:0 DZ:0 OF:0 UF:0 NX:0
Error (IDV) Mismatch 1> CSR 003 (fcsr)
Error (IDV)   . dut:0x00000001 NV:0 DZ:0 OF:0 UF:0 NX:1 FRM:0(RNE)
Error (IDV)   . ref:0x00000000 NV:0 DZ:0 OF:0 UF:0 NX:0 FRM:0(RNE) (not updated)
Error (IDV) tb.riscv_tb.idv_trace2api.state_compare @ 37380.00 ns: MISMATCH

To Reproduce
This is the instruction test from our converted F -> Zfinx riscof-arch test.
Note gcc's asm out doesn't reproduce zfinx disassembly correctly, so the above Imperas DV output for the real assembly command.
Note when you see "sky_" in front of a riscof-arch test function name is calling an updated test function that replaces F# registers with X# registers in a manner that is compliant with riscof-arch.

000002f4 <inst_18>:
// rs1==f27, rs2==f24, rd==f17,fs1 == 0 and fe1 == 0x00 and fm1 == 0x000000 and fs2 == 1 and fe2 == 0x7f and fm2 == 0x000000 and  fcsr == 0x0 and rm_val == 7   
/* opcode: fadd.s ; op1:f27; op2:f24; dest:f17; op1val:0x0; op2val:0xbf800000; 
   valaddr_reg:x3; val_offset:36*FLEN/8; rmval:dyn; fcsr: 0;
   correctval:??; testreg:x2 
*/
SKY_TEST_FPRR_OP(fadd.s, f17, f27, f24, x17, x27, x24, dyn, 0, 0, x3, 36*FLEN/8, x4, x1, x2)
     2f4:   0901ad83            lw  s11,144(gp)
     2f8:   0941ac03            lw  s8,148(gp)
     2fc:   00000113            li  sp,0
     300:   00311073            fscsr   sp
     304:   018df8d3            fadd.s  fa7,fs11,fs8
     308:   00302273            frcsr   tp
     30c:   0910a823            sw  a7,144(ra)
     310:   0840aa23            sw  tp,148(ra)

Expected behavior
For bug #1 the core is not supposed to signal NX in the fflags.

The fix is to extend the addsub large exponent difference exception path with a sreg_man = 0 exception detector

          -- set s_ext to 1 as it will always be 1 from the implied 1 being shifted out.
          addsub.man_s_ext <= '1';
          -- if man_sreg is 0 set s_ext to 0 as there is no 1 that will be shifted out.
          if (to_integer(unsigned(addsub.man_sreg)) = 0) then
            addsub.man_s_ext <= '0';
          end if;

Screenshots
NA

Environment:

  • OS : RHEL7
  • GCC Version 13.2
  • Libraries used newlib

Hardware:

  • Hardware version (1.9.3.0) with bug patches applied

Additional context
I suggest I keep extending this thread with additional bugs detected/fixes until I get a more consistent pass of our set of rv32i_m_Zfinx_XXX tests.

@mikaelsky
Copy link
Collaborator Author

Additional bugs found are 2 to 7. The cause is described below, will add a comment per bug with the recreation steps.
For all bugs I have a fix internally that passes initial testing (fadd only right now). Some bugs are partial fixes as I'm a little uncertain of the full fix.

Bug #2:
When 1.0 is added together with 1.0 + LSB the in-exact flag is not set even though we truncate away a bit.

Bug #3:
When we end up shifting 1 of the values fully out we miss that sreg.ext_s is set which will result in an inexact result

Bug #4:
Underflow does not also flag inexact, per IEE '754'
"As is defined in '754, under default exception handling, underflow is only signaled when the result is tiny and inexact. In such a case, both the underflow and inexact flags are raised."

Bug #5:
As is defined in '754, under default exception handling, overflow is only signaled when the result is large and inexact. In such a case, both the overflow and inexact flags are raised.

Bug #6:
Adding up positive infinities
"If the result of a computation is too large to represent in IEEE-754, it is given a special infinity value, which may be positive or negative. Infinity can be produced by a division by zero or an exponent overflow. An infinity value may be used in further calculations, al tho the result can only be either infinity or NaN. For example, adding a number to positive infinity results in positive infinity, and similarly for other operations with numbers. Adding two infinities produces infinity. However, if infinity is subtracted from or divided by infinity, the result is unknown and results in a NaN value. "

Bug #7:
-1.0 + 1.0 results in -0.0 and not +0.0
" 6.3 The sign bit
When the sum of two operands with opposite signs (or the difference of two operands with like signs) is exactly zero, the sign of that sum (or difference) shall be +0 in all rounding-direction attributes except roundTowardNegative; under that attribute, the sign of an exact zero sum (or difference) shall be −0.
However, x + x = x − (−x) retains the same sign as x even when x is zero."

Reference source:
Good for understanding the different flags
https://docs.oracle.com/cd/E19957-01/806-3568/ncg_handle.html
image

Some reasonable description of further "fun with float flags" stuff.
https://github.com/riscv/riscv-bfloat16/blob/main/doc/riscv-bfloat16-format.adoc

IEEE 754 float reference for dealing with the sign bit and 0.0s
https://stackoverflow.com/questions/28949774/what-is-0-0-by-ieee-floating-point-standard

@mikaelsky
Copy link
Collaborator Author

Bug #2: details

UVM_INFO @ 1.00 ns: reporter [riscv_tb] rvviMISA = 0x40801100
Info (IDV) Instruction executed prior to mismatch '0xd64(inst_101+10): 01df7fd3 fadd.s  x31,x30,x29'
Error (IDV) CSR register value mismatch (HartId:0, PC:0x00000d64 inst_101+10):
Error (IDV) Mismatch 0> CSR 001 (fflags)
Error (IDV)   . dut:0x00000000 NV:0 DZ:0 OF:0 UF:0 NX:0
Error (IDV)   . ref:0x00000001 NV:0 DZ:0 OF:0 UF:0 NX:1
Error (IDV) Mismatch 1> CSR 003 (fcsr)
Error (IDV)   . dut:0x00000000 NV:0 DZ:0 OF:0 UF:0 NX:0 FRM:0(RNE)
Error (IDV)   . ref:0x00000001 NV:0 DZ:0 OF:0 UF:0 NX:1 FRM:0(RNE)
Error (IDV) tb.riscv_tb.idv_trace2api.state_compare @ 183220.00 ns: MISMATCH

00000d54 <inst_101>:
// fs1 == 0 and fe1 == 0x01 and fm1 == 0x000000 and fs2 == 0 and fe2 == 0x01 and fm2 == 0x000001 and  fcsr == 0x0 and rm_val == 7   
/* opcode: fadd.s ; op1:f30; op2:f29; dest:f31; op1val:0x800000; op2val:0x800001; 
   valaddr_reg:x3; val_offset:202*FLEN/8; rmval:dyn; fcsr: 0;
   correctval:??; testreg:x2 
*/
SKY_TEST_FPRR_OP(fadd.s, f31, f30, f29, x31, x30, x29, dyn, 0, 0, x3, 202*FLEN/8, x4, x1, x2)
     d54:   3281af03            lw  t5,808(gp)
     d58:   32c1ae83            lw  t4,812(gp)
     d5c:   00000113            li  sp,0
     d60:   00311073            fscsr   sp
     d64:   01df7fd3            fadd.s  ft11,ft10,ft9
     d68:   00302273            frcsr   tp
     d6c:   33f0a423            sw  t6,808(ra)
     d70:   3240a623            sw  tp,812(ra)

Code fix in the normalizer S_ROUND state is to set in-exact if we end up shifting out a bit

          sreg.lower <= round.output(22 downto 00);
          -- If after the first shift we get a bit in any of the guard bits then independent of rounding mode
          -- the end result will be inexact as we are truncating away information
          ctrl.flags(fp_exc_nx_c) <= sreg.ext_g or sreg.ext_r or sreg.ext_s;

@mikaelsky
Copy link
Collaborator Author

Bug #3:

Info (IDV) Instruction executed prior to mismatch '0xd84(inst_102+10): 01df7fd3 fadd.s  x31,x30,x29'
Error (IDV) CSR register value mismatch (HartId:0, PC:0x00000d84 inst_102+10):
Error (IDV) Mismatch 0> CSR 001 (fflags)
Error (IDV)   . dut:0x00000000 NV:0 DZ:0 OF:0 UF:0 NX:0
Error (IDV)   . ref:0x00000001 NV:0 DZ:0 OF:0 UF:0 NX:1
Error (IDV) Mismatch 1> CSR 003 (fcsr)
Error (IDV)   . dut:0x00000000 NV:0 DZ:0 OF:0 UF:0 NX:0 FRM:0(RNE)
Error (IDV)   . ref:0x00000001 NV:0 DZ:0 OF:0 UF:0 NX:1 FRM:0(RNE)
Error (IDV) tb.riscv_tb.idv_trace2api.state_compare @ 185140.00 ns: MISMATCH
/* opcode: fadd.s ; op1:f30; op2:f29; dest:f31; op1val:0x800000; op2val:0x3f800000; 
   valaddr_reg:x3; val_offset:204*FLEN/8; rmval:dyn; fcsr: 0;
   correctval:??; testreg:x2 
*/
SKY_TEST_FPRR_OP(fadd.s, f31, f30, f29, x31, x30, x29, dyn, 0, 0, x3, 204*FLEN/8, x4, x1, x2)
     d74:   3301af03            lw  t5,816(gp)
     d78:   3341ae83            lw  t4,820(gp)
     d7c:   00000113            li  sp,0
     d80:   00311073            fscsr   sp
     d84:   01df7fd3            fadd.s  ft11,ft10,ft9
     d88:   00302273            frcsr   tp
     d8c:   33f0a823            sw  t6,816(ra)
     d90:   3240aa23            sw  tp,820(ra)

Same fix as Bug #2.

@mikaelsky
Copy link
Collaborator Author

Bug #4:

Error (IDV) CSR register value mismatch (HartId:0, PC:0x00000ee4 inst_113+10):
Error (IDV) Mismatch 0> CSR 001 (fflags)
Error (IDV)   . dut:0x00000002 NV:0 DZ:0 OF:0 UF:1 NX:0
Error (IDV)   . ref:0x00000003 NV:0 DZ:0 OF:0 UF:1 NX:1
Error (IDV) Mismatch 1> CSR 003 (fcsr)
Error (IDV)   . dut:0x00000002 NV:0 DZ:0 OF:0 UF:1 NX:0 FRM:0(RNE)
Error (IDV)   . ref:0x00000003 NV:0 DZ:0 OF:0 UF:1 NX:1 FRM:0(RNE)
Error (IDV) tb.riscv_tb.idv_trace2api.state_compare @ 205100.00 ns: MISMATCH
00000ed4 <inst_113>:
// fs1 == 0 and fe1 == 0x01 and fm1 == 0x000000 and fs2 == 1 and fe2 == 0x01 and fm2 == 0x055555 and  fcsr == 0x0 and rm_val == 7   
/* opcode: fadd.s ; op1:f30; op2:f29; dest:f31; op1val:0x800000; op2val:0x80855555; 
   valaddr_reg:x3; val_offset:226*FLEN/8; rmval:dyn; fcsr: 0;
   correctval:??; testreg:x2 
*/
SKY_TEST_FPRR_OP(fadd.s, f31, f30, f29, x31, x30, x29, dyn, 0, 0, x3, 226*FLEN/8, x4, x1, x2)
     ed4:   3881af03            lw  t5,904(gp)
     ed8:   38c1ae83            lw  t4,908(gp)
     edc:   00000113            li  sp,0
     ee0:   00311073            fscsr   sp
     ee4:   01df7fd3            fadd.s  ft11,ft10,ft9
     ee8:   00302273            frcsr   tp
     eec:   39f0a423            sw  t6,904(ra)
     ef0:   3840a623            sw  tp,908(ra)

Fix applied to normalizer S_CHECK state where we miss setting inexact when the result is an underflow or overflow.

        when S_CHECK => -- check for overflow/underflow
        -- ------------------------------------------------------------
          if (ctrl.cnt_uf = '1') then -- underflow
            ctrl.flags(fp_exc_uf_c) <= '1';
            -- As is defined in '754, under default exception handling, underflow is 
            -- only signalled when the result is tiny and inexact. In such a case, 
            -- both the underflow and inexact flags are raised.
            ctrl.flags(fp_exc_nx_c) <= '1';
          elsif (ctrl.cnt_of = '1') then -- overflow
            ctrl.flags(fp_exc_of_c) <= '1';
            -- As is defined in '754, under default exception handling, overflow is 
            -- only signalled when the result is large and inexact. In such a case, 
            -- both the underflow and inexact flags are raised.
            ctrl.flags(fp_exc_nx_c) <= '1';
          elsif (ctrl.cnt(7 downto 0) = x"00") then -- subnormal
            ctrl.flags(fp_exc_uf_c) <= '1';
            -- As is defined in '754, under default exception handling, underflow is 
            -- only signalled when the result is tiny and inexact. In such a case, 
            -- both the underflow and inexact flags are raised.
            ctrl.flags(fp_exc_nx_c) <= '1';
          elsif (ctrl.cnt(7 downto 0) = x"FF") then -- infinity
            ctrl.flags(fp_exc_of_c) <= '1';
            -- As is defined in '754, under default exception handling, overflow is 
            -- only signalled when the result is large and inexact. In such a case, 
            -- both the underflow and inexact flags are raised.
            ctrl.flags(fp_exc_nx_c) <= '1';
          end if;
          ctrl.state <= S_FINALIZE;

@mikaelsky
Copy link
Collaborator Author

Bug #5:

Info (IDV) Instruction executed prior to mismatch '0x16ac(inst_175+10): 01df7fd3 fadd.s  x31,x30,x29'
Error (IDV) CSR register value mismatch (HartId:0, PC:0x000016ac inst_175+10):
Error (IDV) Mismatch 0> CSR 001 (fflags)
Error (IDV)   . dut:0x00000004 NV:0 DZ:0 OF:1 UF:0 NX:0
Error (IDV)   . ref:0x00000005 NV:0 DZ:0 OF:1 UF:0 NX:1
Error (IDV) Mismatch 1> CSR 003 (fcsr)
Error (IDV)   . dut:0x00000004 NV:0 DZ:0 OF:1 UF:0 NX:0 FRM:0(RNE)
Error (IDV)   . ref:0x00000005 NV:0 DZ:0 OF:1 UF:0 NX:1 FRM:0(RNE)
Error (IDV) tb.riscv_tb.idv_trace2api.state_compare @ 319700.00 ns: MISMATCH
0000169c <inst_175>:
// fs1 == 0 and fe1 == 0xfe and fm1 == 0x7fffff and fs2 == 0 and fe2 == 0xfe and fm2 == 0x7fffff and  fcsr == 0x0 and rm_val == 7   
/* opcode: fadd.s ; op1:f30; op2:f29; dest:f31; op1val:0x7f7fffff; op2val:0x7f7fffff; 
   valaddr_reg:x3; val_offset:350*FLEN/8; rmval:dyn; fcsr: 0;
   correctval:??; testreg:x2 
*/
SKY_TEST_FPRR_OP(fadd.s, f31, f30, f29, x31, x30, x29, dyn, 0, 0, x3, 350*FLEN/8, x4, x1, x2)
    169c:   5781af03            lw  t5,1400(gp)
    16a0:   57c1ae83            lw  t4,1404(gp)
    16a4:   00000113            li  sp,0
    16a8:   00311073            fscsr   sp
    16ac:   01df7fd3            fadd.s  ft11,ft10,ft9
    16b0:   00302273            frcsr   tp
    16b4:   17f0ac23            sw  t6,376(ra)
    16b8:   1640ae23            sw  tp,380(ra)

Fix the same as Bug #4.

@mikaelsky
Copy link
Collaborator Author

mikaelsky commented Feb 1, 2024

Bug #6:

Info (IDV) Instruction executed prior to mismatch '0x19cc(inst_200+10): 01df7fd3 fadd.s  x31,x30,x29'
Error (IDV) CSR register value mismatch (HartId:0, PC:0x000019cc inst_200+10):
Error (IDV) Mismatch 0> CSR 001 (fflags)
Error (IDV)   . dut:0x00000010 NV:1 DZ:0 OF:0 UF:0 NX:0
Error (IDV)   . ref:0x00000000 NV:0 DZ:0 OF:0 UF:0 NX:0
Error (IDV) Mismatch 1> CSR 003 (fcsr)
Error (IDV)   . dut:0x00000010 NV:1 DZ:0 OF:0 UF:0 NX:0 FRM:0(RNE)
Error (IDV)   . ref:0x00000000 NV:0 DZ:0 OF:0 UF:0 NX:0 FRM:0(RNE) (not updated)
Error (IDV) tb.riscv_tb.idv_trace2api.state_compare @ 363300.00 ns: MISMATCH
000019bc <inst_200>:
// fs1 == 0 and fe1 == 0xff and fm1 == 0x000000 and fs2 == 0 and fe2 == 0xff and fm2 == 0x000000 and  fcsr == 0x0 and rm_val == 7   
/* opcode: fadd.s ; op1:f30; op2:f29; dest:f31; op1val:0x7f800000; op2val:0x7f800000; 
   valaddr_reg:x3; val_offset:400*FLEN/8; rmval:dyn; fcsr: 0;
   correctval:??; testreg:x2 
*/
SKY_TEST_FPRR_OP(fadd.s, f31, f30, f29, x31, x30, x29, dyn, 0, 0, x3, 400*FLEN/8, x4, x1, x2)
    19bc:   6401af03            lw  t5,1600(gp)
    19c0:   6441ae83            lw  t4,1604(gp)
    19c4:   00000113            li  sp,0
    19c8:   00311073            fscsr   sp
    19cc:   01df7fd3            fadd.s  ft11,ft10,ft9
    19d0:   00302273            frcsr   tp
    19d4:   25f0a023            sw  t6,576(ra)
    19d8:   2440a223            sw  tp,580(ra)

The fix is to modify the exception handling in the add/sub block. This is a partial fix, as I note in the comments, there might be further exceptions to the exceptions that need to be taken into account.

      -- exception flags --
      -- Only assert non-valid operation of the two infinities are of opposite sign.
      -- Should take add/sub into account I believe so e.g +inf + inf => +inf but +int - +inf => NV?
      addsub.flags(fp_exc_nv_c) <= ((fpu_operands.rs1_class(fp_class_pos_inf_c) and fpu_operands.rs2_class(fp_class_neg_inf_c)) or
                                    (fpu_operands.rs2_class(fp_class_pos_inf_c) and fpu_operands.rs1_class(fp_class_neg_inf_c))); -- +inf +/- -inf or -inf +/- +inf

Addendum: Code fix updated to correctly deal with infinities based on FPU operation:

      -- exception flags --
      -- Infinities decoder ring
      -- fadd:
      -- Rs1 \ Rs2 | +inf | -inf | <- Rs2
      -- --------------------------------
      --      +inf | +inf |  NV  |
      -- --------------------------------
      --      -inf |  NV  | -inf |
      -- --------------------------------
      --     ^
      --     |
      --    Rs1
      --
      -- fsub:
      -- Rs1 \ Rs2 | +inf | -inf | <- Rs2
      -- --------------------------------
      --      +inf |  NV  | +inf |
      -- --------------------------------
      --      -inf | -inf |  NV  |
      -- --------------------------------
      --     ^
      --     |
      --    Rs1
      -- Assume the operation is valid
      addsub.flags(fp_exc_nv_c) <= '0';
      if (ctrl_i.ir_funct12(7) = '0') then -- add
        -- Do we have 2 infinities of opposite sign?
        if (((fpu_operands.rs1_class(fp_class_pos_inf_c) and fpu_operands.rs2_class(fp_class_neg_inf_c))         or 
             (fpu_operands.rs1_class(fp_class_neg_inf_c) and fpu_operands.rs2_class(fp_class_pos_inf_c))) = '1') then
          addsub.flags(fp_exc_nv_c) <= '1';
        end if;
      else -- sub
        -- Do we have 2 infinities of same sign?
        if (((fpu_operands.rs1_class(fp_class_pos_inf_c) and fpu_operands.rs2_class(fp_class_pos_inf_c))         or 
             (fpu_operands.rs1_class(fp_class_neg_inf_c) and fpu_operands.rs2_class(fp_class_neg_inf_c))) = '1') then
          addsub.flags(fp_exc_nv_c) <= '1';
        end if;
      end if;

@mikaelsky
Copy link
Collaborator Author

Bug #7.

Info (IDV) Instruction executed prior to mismatch '0x3dd4(inst_388+28): 01df7fd3 fadd.s  x31,x30,x29'
Error (IDV) GPR register value mismatch (HartId:0, PC:0x00003dd4 inst_388+28):
Error (IDV) Mismatch 0> GPR x31
Error (IDV)   . dut:0x80000000
Error (IDV)   . ref:0x00000000
Error (IDV) tb.riscv_tb.idv_trace2api.state_compare @ 749380.00 ns: MISMATCH
00003dac <inst_388>:
// fs1 == 1 and fe1 == 0x01 and fm1 == 0x000000 and fs2 == 0 and fe2 == 0x01 and fm2 == 0x000000 and  fcsr == 0x0 and rm_val == 7   
/* opcode: fadd.s ; op1:f30; op2:f29; dest:f31; op1val:0x80800000; op2val:0x800000; 
   valaddr_reg:x3; val_offset:776*FLEN/8; rmval:dyn; fcsr: 0;
   correctval:??; testreg:x2 
*/
SKY_TEST_FPRR_OP(fadd.s, f31, f30, f29, x31, x30, x29, dyn, 0, 0, x3, 776*FLEN/8, x4, x1, x2)
    3dac:   00001137            lui sp,0x1
    3db0:   002181b3            add gp,gp,sp
    3db4:   c201af03            lw  t5,-992(gp)
    3db8:   402181b3            sub gp,gp,sp
    3dbc:   00001137            lui sp,0x1
    3dc0:   002181b3            add gp,gp,sp
    3dc4:   c241ae83            lw  t4,-988(gp)
    3dc8:   402181b3            sub gp,gp,sp
    3dcc:   00000113            li  sp,0
    3dd0:   00311073            fscsr   sp
    3dd4:   01df7fd3            fadd.s  ft11,ft10,ft9
    3dd8:   00302273            frcsr   tp
    3ddc:   03f0a023            sw  t6,32(ra)
    3de0:   0240a223            sw  tp,36(ra)

This one has everything to do with sign handling of the result is exactly 0.0. The notes on 0.0 and signs provided above state that for fadd if the signs of rs1 and rs2 are different and the result is 0.0 it must be a +0.0 and not -0.0. Same goes for fsub, except here its for cases where rs1 and rs2 are the same sign. Off-course there is a rounding mode exception as well.

      -- result sign --
      if (ctrl_i.ir_funct12(7) = '0') then -- add
        if (fpu_operands.rs1(31) = fpu_operands.rs2(31)) then -- identical signs
          addsub.res_sign <= fpu_operands.rs1(31);
        else -- different signs
          -- if the result is not 0.0 set the sign normally
          if ((to_integer(unsigned(addsub.add_stage))) /= 0 ) then
            if (addsub.exp_comp(1) = '1') then -- exp are equal (also check relation of mantissas)
              addsub.res_sign <= fpu_operands.rs1(31) xor (not addsub.man_comp);
            else
              addsub.res_sign <= fpu_operands.rs1(31) xor addsub.exp_comp(0);
            end if;
          else 
            --  roundTowardNegative; under that attribute, the sign of an exact zero sum (or difference) shall be −0
            if (fpu_operands.frm = "010") then -- round down (towards -infinity)
              addsub.res_sign <= '1'; -- set the sign to 0 to generate a +0.0 result
            else
              addsub.res_sign <= '0'; -- set the sign to 0 to generate a +0.0 result
            end if;
          end if;
        end if;
      else -- sub
        -- identical signs
        -- if the result is not 0.0 set the sign normally
        if (fpu_operands.rs1(31) = fpu_operands.rs2(31)) then
          if ((to_integer(unsigned(addsub.add_stage))) /= 0 ) then
            if (addsub.exp_comp(1) = '1') then -- exp are equal (also check relation of mantissas)
              addsub.res_sign <= fpu_operands.rs1(31) xor (not addsub.man_comp);
            else
              addsub.res_sign <= fpu_operands.rs1(31) xor addsub.exp_comp(0);
            end if;
          else 
            --  roundTowardNegative; under that attribute, the sign of an exact zero sum (or difference) shall be −0
            if (fpu_operands.frm = "010") then -- round down (towards -infinity)
              addsub.res_sign <= '1'; -- set the sign to 0 to generate a +0.0 result
            else
              addsub.res_sign <= '0'; -- set the sign to 0 to generate a +0.0 result
            end if;
          end if;
        else -- different signs
          addsub.res_sign <= fpu_operands.rs1(31);
        end if;
      end if;

@mikaelsky
Copy link
Collaborator Author

With the above fixes we pass the riscof-arch fadd_b1 sub-test for all cases. With some sub-normal caveats. We do pass against our Imperas DV model that has modified sub-normal handling to be equivalent to the neorv32 core, even if the inputs are sub-normal.

@mikaelsky
Copy link
Collaborator Author

With the above fixed we pass 11 out of 59 F/Zfinx riscof-arch tests. More are "passing" but the F/Zfinx riscof-arch test include sub-normals and the Sail model doesn't replace sub-normals with 0.0 as we do.

@mikaelsky
Copy link
Collaborator Author

Created PR 788 to apply the above fixes.

@stnolting
Copy link
Owner

Wow, this is really amazing! Thank you so much for all your feedback! I think I'll need some time to look through everything 😅

@stnolting
Copy link
Owner

More are "passing" but the F/Zfinx riscof-arch test include sub-normals and the Sail model doesn't replace sub-normals with 0.0 as we do.

I found this article discussing GCC's -ffast-math switch. Maybe this can be used to avoid subnormal numbers within an application. However, when the floating point numbers are hardcoded then this flag might have no effect.

@mikaelsky
Copy link
Collaborator Author

Interesting, maybe. We've been experimenting with -ffast-math to help optimize down the size of the code base. Did hit a few snags with exception handling, as in the softfloat library went of the rails when a div(0) error occurred.
Sub-normals in general are an interesting discussion :) Some DSP folks LOVE them and other HATE them.

@stnolting
Copy link
Owner

stnolting commented Feb 4, 2024

Some DSP folks LOVE them and other HATE them.

I'm on the latter team, but that's just me. 😅

@mikaelsky
Copy link
Collaborator Author

The issues identified in this issue have been patched.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants