Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarification on Zfinx behavior for subnormal operands and results needed #728

Closed
jstraus59 opened this issue Nov 16, 2023 · 17 comments
Closed
Assignees
Labels
bug Something isn't working HW hardware-related risc-v compliance Modification to comply with official RISC-V specs.

Comments

@jstraus59
Copy link

jstraus59 commented Nov 16, 2023

The NEORV32 documentation Zfinx extension documentation states:

Subnormal numbers (exponent = 0) are flushed to zero setting them to +/- 0 before entering the FPU’s processing core. If a computational instruction (like fmul.s) generates a subnormal result, the result is also flushed to zero during normalization.

Would it be possible to be add more detail on subnormal flushing, specifically:

  • How are the fflag bits set when subnormal inputs are seen, or when subnormal results are flushed to zero. A precise specification for each instruction class (Computational, Conversion, Compare and Clasify) would be helpful.
  • For input flushing, it simply states flushed...before entering the FPU’s processing core, but exactly which instruction classes are implemented in the FPU core, and thus input flushing applies to them, is not clear. Obviously, this applies to the Computational instruction class, but an explicit specification of whether this also applies to the Conversion, Compare and Clasify instruction classes would be helpful.

Thanks!

Jim Straus
Imperas

@mikaelsky
Copy link
Collaborator

@jstraus59 from my reading of the code the first step in the FPU is to determine the "class" of a given float to extract +/- inf, NAN etc. before passing them further into the FPU. During this intake process if an number with exp 0 is seen the mantissa just gets set to 0, resulting in any subnormal being converted to a 0. E.g a subnormal A + subnormal B gets converted to 0.0 + 0.0.
As this a pre-processing step it happens disregarding instruction class. I'll do a bit more digging but thats the gist of it.

Currently the fflag bits are .. not functional in general which include subnormals. This is on my agenda to fix, so assume that fflags will be functional going forward.

@jstraus59
Copy link
Author

@mikaelsky Terrific - thanks for the info. I will assume that inputs to the Conversion and Computation instructions have their subnormal inputs flushed to 0.

I still need a couple details clarified, please (sorry but I have never done RTL design - just modeling of processor architectures - so it would be difficult for me to figure this out directly from the RTL myself):

  1. I am wondering still about the FCLASS instruction, which simply reports the type of floating-point value that is in a register. One of the things it detects is whether the value is a subnormal, so it would not really make sense for subnormal inputs to be flushed for the FCLASS instruction, but if FCLASS is implemented within the FPU it sounds like they would. Could you please confirm?

  2. I am assuming that subnormal inputs and results are actually flushed to +/-0, not just +0. Could you please confirm that, also?

@mikaelsky
Copy link
Collaborator

@jstraus59 good question. Currently deep into fixing an issue with misaligned branches being out of spec.
From a quick scan FCLASS currently never reports a denorm/subnormal class:
op_is_denorm_v := '0'; -- FIXME / TODO -- op_e_all_zero_v and (not op_m_all_zero_v); -- subnormal

Correct the on the input path Rs1 and Rs2 gets converted to the equivalent of a +/- 0.0. Basically of EXP == 0 set MANT = 0;
op_data(0)(22 downto 00) <= (others => '0') when (rs1_i(30 downto 23) = "00000000") else rs1_i(22 downto 0); -- flush mantissa to zero if subnormal

For the output side a subnormal result gets converted to +/- 0.0 as the result sign gets preserved and the EXP and MANT both get set to 0.
elsif (ctrl.flags(fp_exc_uf_c) = '1') or -- underflow
(sreg.zero = '1') or (ctrl.class(fp_class_neg_denorm_c) = '1') or (ctrl.class(fp_class_pos_denorm_c) = '1') then -- denormalized (flush-to-zero)
ctrl.res_exp <= fp_single_pos_zero_c(30 downto 23); -- keep original sign
ctrl.res_man <= fp_single_pos_zero_c(22 downto 00);

@jstraus59
Copy link
Author

Ok - that answers my questions about everything but the fflags behavior.

I will leave the issue open as a reminder to document the precise fflags behavior w.r.t. subnormals once that implementation is finished.

Thanks @mikaelsky!

@stnolting
Copy link
Owner

Hey everyone! Sorry for the late reply... So thanks @mikaelsky for jumping in! 😉

Right now all input operands are being flushed-to-zero if they are subnormal before entering the actual FPU core. Hence, @mikaelsky is right saying that this flush-to-zero is applied to all instruction classes.

However, I think you are right that this does not make much sense for the fclass instruction. I'll fix that.

I just saw there is a bug in the wiring of the exception flags output. I'll also fix that and I will also add some test cases to the floating-point tests to cover some simple conditions that should raise FPU exceptions.

@stnolting stnolting self-assigned this Nov 20, 2023
@stnolting stnolting added bug Something isn't working risc-v compliance Modification to comply with official RISC-V specs. HW hardware-related labels Nov 20, 2023
@mikaelsky
Copy link
Collaborator

@stnolting working on a misalign branch issue. Basically the core is subtly out of spec. Thought I had an "easy" fix, but JALR is not behaving. Basically any misalignment is supposed to be captured by the offending branch and trapped without updating the link register nor fetching from the misaligned PC.

I also have a set of MTVAL spec updates from various traps to help with spec compliance. Unsure when it makes sense to drop them in, kinda waiting for the 1.9.1 release before trying to apply updates.

On top I've added an RVVI trace port to help validate against an ISA like Imperas. This is where we've found the above mentioned spec issues.

@stnolting
Copy link
Owner

Oh, sorry for overlooking that earlier comment... 🙈

I think you are right. If the CPU encounters a misaligned instruction fetch the link register will still be written (for link instructions) and the pipeline's front end will still try to read instruction data from that unaligned address... I will have a closer look at this. Thanks for finding this bug!

@stnolting
Copy link
Owner

stnolting commented Nov 21, 2023

I have opened #734 to fix this misalignment-issue.

@mikaelsky

On top I've added an RVVI trace port to help validate against an ISA like Imperas. This is where we've found the above mentioned spec issues.

I'm curious... Can you say something more about that verification setup?
Please let us know if there are any more spec-compatibility issues 🙈😅

@mikaelsky
Copy link
Collaborator

@stnolting we are "working over the core" so to speak using an system from Imperas, https://www.imperas.com/imperas-riscv-solutions.

Basically we are doing what I've always done when validating processor cores. We compare the internal states of the processor after every instruction versus a golden reference model from Imperas.
This means when any instruction is "retired" we check:

  • Are the correct processor registers updated, and are we not missing anything.
  • Is the PC correct
  • Is the opcode correct
  • Is the next PC correct
  • Have any CSRs been updated and if so are they the right ones.
  • Have any memories been read/written from and is the address and data correct.

We do this for every piece of code we run, like the entire riscv-arch test suite.

To do these compares we yank out a lot of information from the core internals and provide it to the compare system. This is what the RVVI trace port is for.
In my local branch I have a parameter that allows us to enable/disable the functionality, as it costs some logic for signal delays n such.

I have a few other custom parameters we use like: regfile_reset_en to turn on reset for the regfile and boot_address_en to allow us to feed an external boot address to the core to support stuff like application restore functions.

@LarryImperas
Copy link

@mikaelsky Thanks for the mention and the link to Imperas. Actually a more direct link to the ImperasDV processor verification tools is here: https://www.imperas.com/imperasdv.

@stnolting
Copy link
Owner

@mikaelsky

How do you "export" all this data from the core? Do you use something like the RISC-V Formal Interface (RVFI)? Or did you actually implement a full-scale trace port? I know there is a RISC-V proposal/extension for that, but I am not sure in what state this is...

regfile_reset_en

We have something similar now: #736 😉

and boot_address_en to allow us to feed an external boot address to the core to support stuff like application restore functions.

We are discussing something like this, too (-> #720).

@mikaelsky
Copy link
Collaborator

We implemented the RVFI trace port. I did explore a full blown trace port but ended up punting on that for this particular iteration.
When my github is up to tip of trunk I can add it in as a separate pull/push request. I do have a generic to disable the whole thing as it adds logic to retime the signals.

I noticed #736. Appreciated :)

For the boot address I have a generic/parameter to enable/disable the feature. Default boot is still 0xFFFF_0000. There are some additional feature changes here and there to enhance some features like support clock gating for sleep mode.

I also have a few more cases for mtval in the trap controller to be more aligned to the spec. Granted the spec is "annoying" in that it says "if mtval is not 0 then it has by XYZ".

          -- NORMAL trap entry: write mcause, mepc and mtval - no update when in debug-mode! --
          -- --------------------------------------------------------------------
          if (CPU_EXTENSION_RISCV_Sdext = false) or ((trap_ctrl.cause(5) = '0') and (debug_ctrl.running = '0')) then
            -- trap cause --
            csr.mcause <= trap_ctrl.cause(trap_ctrl.cause'left) & trap_ctrl.cause(4 downto 0); -- type & identifier
            -- trap PC --
            csr.mepc <= trap_ctrl.epc(XLEN-1 downto 1) & '0';
            -- trap value --
            case trap_ctrl.cause is
              when trap_brk_c | trap_iaf_c =>
                csr.mtval <= trap_ctrl.epc(XLEN-1 downto 1) & '0';
              when trap_lma_c | trap_laf_c | trap_sma_c | trap_saf_c => -- misaligned load/store address or load/store access error
                csr.mtval <= mar_i; -- faulting data access address
              when trap_iil_c => -- illegal instruction
                csr.mtval <= execute_engine.ir; -- faulting instruction word
              when trap_ima_c => -- instruction misaligned
                -- Set mtval to be equal to the misaligned address
                csr.mtval <= execute_engine.pc_illegal_branch;
              when others => -- everything else including all interrupts
                csr.mtval <= (others => '0');
            end case;
            -- update privilege level and interrupt enable stack --
            csr.privilege    <= priv_mode_m_c; -- execute trap in machine mode
            csr.mstatus_mie  <= '0'; -- disable interrupts
            csr.mstatus_mpie <= csr.mstatus_mie; -- backup previous mie state
            csr.mstatus_mpp  <= csr.privilege; -- backup previous privilege mode
          end if;

There is also a discussion to be had around mstatus_mpp reset value. Right now its reset to 0 which is USER mode independent of whether USER mode is enabled or not. In principle it should reset to priv_mode_m_c as we come out of reset in machine mode.

For now I have this little add-on, but would probable just change it to default to '1' or machine mode out of reset.

      if (CPU_EXTENSION_RISCV_U = false) then
        csr.mstatus_mpp   <= '1';
      else
        csr.mstatus_mpp   <= '0';
      end if;

@stnolting
Copy link
Owner

stnolting commented Dec 5, 2023

We implemented the RVFI trace port. I did explore a full blown trace port but ended up punting on that for this particular iteration.
When my github is up to tip of trunk I can add it in as a separate pull/push request. I do have a generic to disable the whole thing as it adds logic to retime the signals.

Sounds good! I'm curious what that trace port will look like.

I also have a few more cases for mtval in the trap controller to be more aligned to the spec. Granted the spec is "annoying" in that it says "if mtval is not 0 then it has by XYZ".

Yeah... the RISC-V priv. spec. is quite annoying there... 😅
I tried to support just a minimum set of features to keep the hardware requirements low (while still passing the RISC-V architecture tests).

There is also a discussion to be had around mstatus_mpp reset value. Right now its reset to 0 which is USER mode independent of whether USER mode is enabled or not. In principle it should reset to priv_mode_m_c as we come out of reset in machine mode.

Good point! I will fix that.

edit -> #745 😉

@jstraus59
Copy link
Author

jstraus59 commented Jan 23, 2024

Reviewing the comments about the original subject (subnormal behavior), there are still a couple issues that I don't think have been explicitly answered:

  1. @mikaelsky wrote "Currently the fflag bits are .. not functional in general which include subnormals. This is on my agenda to fix, so assume that fflags will be functional going forward."
    Still need to know the precise behavior of fflags w.r.t. denormal flushing. I don't think this is defined as part of any spec (AIUI the IEEE spec does not address subnormal flushing at all.)
  2. A precise definition of which instructions "enter the actual FPU core" is needed. Currently I am assuming the following instructions are executed in the FPU core, and thus perform denormal flushing of inputs and results:
    fadd.s fsub.s fmul.s fmin.s fmax.s feq.s flt.s fle.s fcvt.w.s fcvt.wu.s fcvt.s.w fcvt.s.wu
    And these, that simply manipulate bits in a floating point value, do not:
    fsgnj.s fsgnjn.s fsgnjx.s fclass.s
    Please advise if this is accurate.
  3. A comment indicated that the fclass instruction has a FIXME note for the denormal flag:
    op_is_denorm_v := '0'; -- FIXME / TODO -- op_e_all_zero_v and (not op_m_all_zero_v); -- subnormal
    Can I assume that is going to be fixed?

Thanks

@mikaelsky
Copy link
Collaborator

  1. The fflag bits are currently not correctly implemented and need debugging. This is an ongoing effort. Assume the fflags are correct and functional, as that is the intended state.
  2. Not sure why its important whether fsgnj.s, fsgnjn.s, fsgnjx.s and fclass.s enter the FPU core or not? Practically for the neorv32 implementation ALL Zfinx instructions pass into the FPU module. This allows the RTL to be more easily removed when Zfinx isn't used. This is an important implementation feature for the neorv32 core. For the modelling effort it can assumed that these instructions work as defined in the Zfinx extension. If it easier for the modelling effort to "assume they do not enter the FPU" assume away. As long as they will flag an illegal instruction when the Zfinx extension is disabled.
  3. This will be fixed yes.

@mikaelsky
Copy link
Collaborator

The questions seems to have been answered and the model updated to mach the core behavior. I will close this issue as its been resolved on our end.
Thank you.

@stnolting
Copy link
Owner

@mikaelsky thank you very much for all your work on this! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working HW hardware-related risc-v compliance Modification to comply with official RISC-V specs.
Projects
None yet
Development

No branches or pull requests

4 participants