[CPU] area and timing optimization; closing further illegal instruction holes #293

Merged (15 commits, merged on Apr 5, 2022)
CHANGELOG.md (1 change: 1 addition & 0 deletions)
@@ -33,6 +33,7 @@ The version number is globally defined by the `hw_version_c` constant in the mai

| Date (*dd.mm.yyyy*) | Version | Comment |
|:----------:|:-------:|:--------|
| 04.04.2022 | 1.6.9.7 | **major CPU logic optimization**: reduced area costs and shortened critical path (higher f_max!); :bug: fixed rare bug in RTE (if C-extension is not implemented); :lock: closed further illegal instruction encoding holes; [PR #293](https://github.com/stnolting/neorv32/pull/293) |
| 01.04.2022 | 1.6.9.6 | rework **CPU front-end**: instruction issue engine; much cleaner code, slightly less HW required; [PR #292](https://github.com/stnolting/neorv32/pull/292) |
| 29.03.2022 | 1.6.9.5 | minor clock generator edits: reset **clock generator** explicitly if not being used by _any_ peripheral/IO device |
| 19.03.2022 | 1.6.9.4 | :test_tube: change usage of VHDL `*_reduce_f` functions for signals that might affect gate-level simulations; [PR #290](https://github.com/stnolting/neorv32/pull/290) |
docs/datasheet/cpu.adoc (31 changes: 16 additions & 15 deletions)
@@ -837,21 +837,22 @@ configurations are presented in <<_cpu_performance>>.
| ALU | `C` | `c.addi4spn` `c.nop` `c.addi` `c.li` `c.addi16sp` `c.lui` `c.andi` `c.sub` `c.xor` `c.or` `c.and` `c.add` `c.mv` | 2
| ALU | `I/E` | `slli` `srli` `srai` `sll` `srl` `sra` | 3 + SAfootnote:[Shift amount.]/4 + SA%4; FAST_SHIFTfootnote:[Barrel shift when `FAST_SHIFT_EN` is enabled.]: 4; TINY_SHIFTfootnote:[Serial shift when `TINY_SHIFT_EN` is enabled.]: 2..32
| ALU | `C` | `c.srli` `c.srai` `c.slli` | 3 + SAfootnote:[Shift amount (0..31).]; FAST_SHIFTfootnote:[Barrel shifter when `FAST_SHIFT_EN` is enabled.]:
| Branches | `I/E` | `beq` `bne` `blt` `bge` `bltu` `bgeu` | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Branches | `C` | `c.beqz` `c.bnez` | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3
| Jumps / Calls | `I/E` | `jal` `jalr` | 4 + ML
| Jumps / Calls | `C` | `c.jal` `c.j` `c.jr` `c.jalr` | 4 + ML
| Memory access | `I/E` | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 4 + ML
| Memory access | `C` | `c.lw` `c.sw` `c.lwsp` `c.swsp` | 4 + ML
| Memory access | `A` | `lr.w` `sc.w` | 4 + ML
| Multiplication | `M` | `mul` `mulh` `mulhsu` `mulhu` | 2+32+2; FAST_MULfootnote:[DSP-based multiplication; enabled via `FAST_MUL_EN`.]: 4
| Division | `M` | `div` `divu` `rem` `remu` | 2+32+2
| CSR access | `Zicsr` | `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci` | 3
| System | `I/E` | `fence` | 3
| Branches | `I/E` | `beq` `bne` `blt` `bge` `bltu` `bgeu` | Taken: 5 + (ML-1)footnote:[Memory latency.]; Not taken: 3
| Branches | `C` | `c.beqz` `c.bnez` | Taken: 5 + (ML-1); Not taken: 3
| Jumps / Calls | `I/E` | `jal` `jalr` | 5 + (ML-1)
| Jumps / Calls | `C` | `c.jal` `c.j` `c.jr` `c.jalr` | 5 + (ML-1)
| Memory access | `I/E` | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 5 + (ML-2)
| Memory access | `C` | `c.lw` `c.sw` `c.lwsp` `c.swsp` | 5 + (ML-2)
| Memory access | `A` | `lr.w` `sc.w` | 5 + (ML-2)
| MulDiv | `M` | `mul` `mulh` `mulhsu` `mulhu` | 2+32+2; FAST_MULfootnote:[DSP-based multiplication; enabled via `FAST_MUL_EN`.]: 4
| MulDiv | `M` | `div` `divu` `rem` `remu` | 2+32+2
| System | `Zicsr` | `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci` | 3
| System | `Zicsr` | `ecall` `ebreak` | 3
| System | `Zicsr`+`C` | `c.break` | 3
| System | `Zicsr` | `mret` `wfi` | 6
| System | `Zifencei` | `fence.i` | 3 + ML
| System | `Zicsr`+`C` | `c.break` | 3
| System | `Zicsr` | `wfi` | 3
| System | `Zicsr` | `mret` `dret` | 5
| Fence | `I/E` | `fence` | 4 + ML
| Fence | `Zifencei` | `fence.i` | 4 + ML
| Floating-point - arithmetic | `Zfinx` | `fadd.s` | 110
| Floating-point - arithmetic | `Zfinx` | `fsub.s` | 112
| Floating-point - arithmetic | `Zfinx` | `fmul.s` | 22
@@ -869,7 +870,7 @@ configurations are presented in <<_cpu_performance>>.
| Bit-manipulation - carry-less multiply | `B(Zbc)` | `clmul` `clmulh` `clmulr` | 3 + 32
| Custom instructions (CFU) | `Zxcfu` | - | min. 4
| | | |
| _Illegal instructions_ | `Zicsr` | - | 2
| _Illegal instructions_ | `Zicsr` | - | min. 2
|=======================
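The updated rows of the timing table express latency in terms of the memory latency ML. These are the rows with `(ML-1)`, `(ML-2)`, `wfi` = 3, `mret`/`dret` = 5, and `fence`/`fence.i` = 4 + ML. The sketch below evaluates those formulas for a concrete ML, together with the default (non-FAST, non-TINY) shift formula `3 + SA/4 + SA%4`. It is a plain Python restatement of the table, not NEORV32 code. The function names are hypothetical, and treating `SA/4` as integer division is an assumption.

```python
def cycle_counts(ml: int) -> dict:
    """Cycle counts taken from the updated rows of the timing table.

    ml: memory latency (ML) in clock cycles. The formulas are copied
    verbatim from the table; no range check is done here.
    """
    return {
        "branch_taken": 5 + (ml - 1),
        "branch_not_taken": 3,
        "jump_call": 5 + (ml - 1),
        "load_store": 5 + (ml - 2),
        "muldiv_serial": 2 + 32 + 2,
        "mul_fast": 4,  # FAST_MUL_EN (DSP-based multiplication)
        "csr_access": 3,
        "wfi": 3,
        "mret_dret": 5,
        "fence": 4 + ml,
        "fence_i": 4 + ml,
    }


def shift_cycles(sa: int) -> int:
    """I/E shift without FAST_SHIFT_EN / TINY_SHIFT_EN: 3 + SA/4 + SA%4.

    Assumption: SA/4 is integer division.
    """
    if not 0 <= sa <= 31:
        raise ValueError("shift amount must be in 0..31")
    return 3 + sa // 4 + sa % 4
```

For example, with ML = 2 a taken branch costs 6 cycles and a load or store costs 5. A shift by 13 costs 3 + 3 + 1 = 7 cycles.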

[NOTE]
rtl/core/neorv32_cpu_bus.vhd (156 changes: 74 additions & 82 deletions)
@@ -115,14 +115,13 @@ architecture neorv32_cpu_bus_rtl of neorv32_cpu_bus is
constant pmp_cfg_x_c : natural := 2; -- execute permit
constant pmp_cfg_al_c : natural := 3; -- mode bit low
constant pmp_cfg_ah_c : natural := 4; -- mode bit high
--
constant pmp_cfg_l_c : natural := 7; -- locked entry

-- PMP minimal granularity --
constant pmp_lsb_c : natural := index_size_f(PMP_MIN_GRANULARITY);

-- data interface registers --
signal mar, mdo, mdi : std_ulogic_vector(data_width_c-1 downto 0);
-- data memory address register --
signal mar : std_ulogic_vector(data_width_c-1 downto 0);

-- data access --
signal d_bus_wdata : std_ulogic_vector(data_width_c-1 downto 0); -- write data
@@ -197,86 +196,84 @@ begin
mem_do_reg: process(rstn_i, clk_i)
begin
if (rstn_i = '0') then
mdo <= (others => def_rst_val_c);
d_bus_wdata <= (others => def_rst_val_c);
d_bus_ben <= (others => def_rst_val_c);
elsif rising_edge(clk_i) then
if (ctrl_i(ctrl_bus_mo_we_c) = '1') then
mdo <= wdata_i; -- memory data output register (MDO)
-- byte enable and data alignment --
case ctrl_i(ctrl_bus_size_msb_c downto ctrl_bus_size_lsb_c) is -- data size
when "00" => -- byte
d_bus_wdata(07 downto 00) <= wdata_i(7 downto 0);
d_bus_wdata(15 downto 08) <= wdata_i(7 downto 0);
d_bus_wdata(23 downto 16) <= wdata_i(7 downto 0);
d_bus_wdata(31 downto 24) <= wdata_i(7 downto 0);
case addr_i(1 downto 0) is
when "00" => d_bus_ben <= "0001";
when "01" => d_bus_ben <= "0010";
when "10" => d_bus_ben <= "0100";
when others => d_bus_ben <= "1000";
end case;
when "01" => -- half-word
d_bus_wdata(31 downto 16) <= wdata_i(15 downto 0);
d_bus_wdata(15 downto 00) <= wdata_i(15 downto 0);
if (addr_i(1) = '0') then
d_bus_ben <= "0011"; -- low half-word
else
d_bus_ben <= "1100"; -- high half-word
end if;
when others => -- word
d_bus_wdata <= wdata_i;
d_bus_ben <= "1111"; -- full word
end case;
end if;
end if;
end process mem_do_reg;
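The reworked `mem_do_reg` process computes the write-data replication and byte-enable mask from `wdata_i` and `addr_i` *before* the register. As a result, `d_bus_wdata` and `d_bus_ben` drive the bus straight from flip-flops. Previously the combinational `write_align` process sat behind the `mdo`/`mar` registers. The Python sketch below is a behavioral reference of that lane selection, not part of the NEORV32 sources. The name `store_align` and the integer size encoding (0 = byte, 1 = half-word, anything else = word, like the VHDL `others` branch) are assumptions for illustration.

```python
from typing import Tuple


def store_align(wdata: int, addr: int, size: int) -> Tuple[int, int]:
    """Behavioral model of the store-side alignment in mem_do_reg.

    Returns (bus_wdata, byte_enable); byte_enable bit i set means byte
    lane i (bits 8*i+7 .. 8*i) is written.
    size: 0 = byte, 1 = half-word, anything else = word.
    """
    wdata &= 0xFFFF_FFFF
    if size == 0:
        # replicate the low byte into all four lanes, enable only the addressed lane
        bus_wdata = (wdata & 0xFF) * 0x0101_0101
        ben = 1 << (addr & 0b11)
    elif size == 1:
        # replicate the low half-word into both halves, select the half via address bit 1
        bus_wdata = (wdata & 0xFFFF) * 0x0001_0001
        ben = 0b1100 if (addr >> 1) & 1 else 0b0011
    else:
        # full word: pass through unchanged, all lanes enabled
        bus_wdata = wdata
        ben = 0b1111
    return bus_wdata, ben
```

A byte store of `0x12345678` to an address ending in `11` puts `0x78` on every lane but enables only lane 3 (`0b1000`).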

-- byte enable and output data alignment --
write_align: process(mar, mdo, ctrl_i)
begin
case ctrl_i(ctrl_bus_size_msb_c downto ctrl_bus_size_lsb_c) is -- data size
when "00" => -- byte
d_bus_wdata(07 downto 00) <= mdo(7 downto 0);
d_bus_wdata(15 downto 08) <= mdo(7 downto 0);
d_bus_wdata(23 downto 16) <= mdo(7 downto 0);
d_bus_wdata(31 downto 24) <= mdo(7 downto 0);
case mar(1 downto 0) is
when "00" => d_bus_ben <= "0001";
when "01" => d_bus_ben <= "0010";
when "10" => d_bus_ben <= "0100";
when others => d_bus_ben <= "1000";
end case;
when "01" => -- half-word
d_bus_wdata(31 downto 16) <= mdo(15 downto 0);
d_bus_wdata(15 downto 00) <= mdo(15 downto 0);
if (mar(1) = '0') then
d_bus_ben <= "0011"; -- low half-word
else
d_bus_ben <= "1100"; -- high half-word
end if;
when others => -- word
d_bus_wdata <= mdo;
d_bus_ben <= "1111"; -- full word
end case;
end process write_align;


-- Data Interface: Read Data --------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
mem_di_reg: process(rstn_i, clk_i)
read_align: process(rstn_i, clk_i)
variable shifted_data_v : std_ulogic_vector(31 downto 0);
begin
if (rstn_i = '0') then
mdi <= (others => def_rst_val_c);
rdata_align <= (others => def_rst_val_c);
elsif rising_edge(clk_i) then
if (ctrl_i(ctrl_bus_mi_we_c) = '1') then
mdi <= d_bus_rdata; -- memory data input register (MDI)
end if;
-- input data alignment and sign extension --
case ctrl_i(ctrl_bus_size_msb_c downto ctrl_bus_size_lsb_c) is
when "00" => -- byte
case mar(1 downto 0) is
when "00" => -- byte 0
rdata_align(07 downto 00) <= d_bus_rdata(07 downto 00);
rdata_align(31 downto 08) <= (others => ((not ctrl_i(ctrl_bus_unsigned_c)) and d_bus_rdata(07))); -- sign extension
when "01" => -- byte 1
rdata_align(07 downto 00) <= d_bus_rdata(15 downto 08);
rdata_align(31 downto 08) <= (others => ((not ctrl_i(ctrl_bus_unsigned_c)) and d_bus_rdata(15))); -- sign extension
when "10" => -- byte 2
rdata_align(07 downto 00) <= d_bus_rdata(23 downto 16);
rdata_align(31 downto 08) <= (others => ((not ctrl_i(ctrl_bus_unsigned_c)) and d_bus_rdata(23))); -- sign extension
when others => -- byte 3
rdata_align(07 downto 00) <= d_bus_rdata(31 downto 24);
rdata_align(31 downto 08) <= (others => ((not ctrl_i(ctrl_bus_unsigned_c)) and d_bus_rdata(31))); -- sign extension
end case;
when "01" => -- half-word
if (mar(1) = '0') then
rdata_align(15 downto 00) <= d_bus_rdata(15 downto 00); -- low half-word
rdata_align(31 downto 16) <= (others => ((not ctrl_i(ctrl_bus_unsigned_c)) and d_bus_rdata(15))); -- sign extension
else
rdata_align(15 downto 00) <= d_bus_rdata(31 downto 16); -- high half-word
rdata_align(31 downto 16) <= (others => ((not ctrl_i(ctrl_bus_unsigned_c)) and d_bus_rdata(31))); -- sign extension
end if;
when others => -- word
rdata_align <= d_bus_rdata; -- full word
end case;
end if;
end process mem_di_reg;

-- input data alignment and sign extension --
read_align: process(mdi, mar, ctrl_i)
variable shifted_data_v : std_ulogic_vector(31 downto 0);
begin
-- align input word --
case mar(1 downto 0) is
when "00" => shifted_data_v := mdi(31 downto 00);
when "01" => shifted_data_v := x"00" & mdi(31 downto 08);
when "10" => shifted_data_v := x"0000" & mdi(31 downto 16);
when others => shifted_data_v := x"000000" & mdi(31 downto 24);
end case;
-- actual data size and sign-extension --
case ctrl_i(ctrl_bus_size_msb_c downto ctrl_bus_size_lsb_c) is
when "00" => -- byte
rdata_align(31 downto 08) <= (others => ((not ctrl_i(ctrl_bus_unsigned_c)) and shifted_data_v(7))); -- sign extension
rdata_align(07 downto 00) <= shifted_data_v(07 downto 00);
when "01" => -- half-word
rdata_align(31 downto 16) <= (others => ((not ctrl_i(ctrl_bus_unsigned_c)) and shifted_data_v(15))); -- sign extension
rdata_align(15 downto 00) <= shifted_data_v(15 downto 00); -- half-word
when others => -- word
rdata_align <= shifted_data_v; -- full word
end case;
end process read_align;
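Both the removed shift-then-extend `read_align` and the new nested-`case` version implement the same load path. Each selects the addressed byte or half-word from `d_bus_rdata`, then sign- or zero-extends it according to `ctrl_bus_unsigned_c`. The new version writes the aligned result directly into the `rdata_align` register instead of first capturing the raw word in `mdi`. A minimal Python model of that selection and extension follows. It is a reference sketch only. The name `load_align` is hypothetical, and the size encoding (0 = byte, 1 = half-word, anything else = word) is an assumption.

```python
def load_align(bus_rdata: int, addr: int, size: int, unsigned: bool) -> int:
    """Behavioral model of load-side alignment and sign/zero extension.

    size: 0 = byte, 1 = half-word, anything else = word.
    unsigned: True for lbu/lhu (zero-extend), False for lb/lh (sign-extend).
    """
    bus_rdata &= 0xFFFF_FFFF
    if size == 0:
        width = 8
        value = (bus_rdata >> (8 * (addr & 0b11))) & 0xFF  # byte lane addr[1:0]
    elif size == 1:
        width = 16
        value = (bus_rdata >> (16 * ((addr >> 1) & 1))) & 0xFFFF  # half selected by addr[1]
    else:
        return bus_rdata  # full word, no extension needed
    sign_bit = (value >> (width - 1)) & 1
    if sign_bit and not unsigned:
        # mirrors "(not unsigned) and msb": fill the upper bits with ones
        value |= (0xFFFF_FFFF << width) & 0xFFFF_FFFF
    return value
```

With `d_bus_rdata = 0x80FF7F01`, an `lb` from an address ending in `10` returns `0xFFFFFFFF`, while `lbu` from the same address returns `0x000000FF`.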

-- insert exclusive lock status for SC operations only --
rdata_o <= exclusive_lock_status when (CPU_EXTENSION_RISCV_A = true) and (ctrl_i(ctrl_bus_ch_lock_c) = '1') else rdata_align;


-- Data Access Arbiter (controlled by pipeline BACK-end) ----------------------------------
-- Data Interface: Arbiter (controlled by pipeline back-end) ------------------------------
-- -------------------------------------------------------------------------------------------
data_access_arbiter: process(rstn_i, clk_i)
begin
@@ -325,24 +322,19 @@ begin
d_bus_rdata <= d_bus_rdata_i;

-- check data access address alignment --
misaligned_d_check: process(mar, ctrl_i)
misaligned_d_check: process(rstn_i, clk_i)
begin
case ctrl_i(ctrl_bus_size_msb_c downto ctrl_bus_size_lsb_c) is -- data size
when "00" => -- byte
d_misaligned <= '0';
when "01" => -- half-word
if (mar(0) /= '0') then
d_misaligned <= '1';
else
d_misaligned <= '0';
end if;
when others => -- word
if (mar(1 downto 0) /= "00") then
d_misaligned <= '1';
else
d_misaligned <= '0';
end if;
end case;
if (rstn_i = '0') then
d_misaligned <= def_rst_val_c;
elsif rising_edge(clk_i) then
if (ctrl_i(ctrl_bus_mo_we_c) = '1') then
case ctrl_i(ctrl_bus_size_msb_c downto ctrl_bus_size_lsb_c) is -- data size
when "00" => d_misaligned <= '0'; -- byte
when "01" => d_misaligned <= addr_i(0); -- half-word
when others => d_misaligned <= addr_i(1) or addr_i(0); -- word
end case;
end if;
end if;
end process misaligned_d_check;
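The new `misaligned_d_check` no longer decodes `mar` combinationally. Instead, it evaluates `addr_i` and latches the result into `d_misaligned` whenever `ctrl_bus_mo_we_c` is set. The alignment rule itself is unchanged:

- bytes are never misaligned;
- half-words need address bit 0 clear;
- words need both low address bits clear (`addr_i(1) or addr_i(0)`).

The Python restatement below uses the hypothetical helper `is_misaligned` and assumes the size encoding 0 = byte, 1 = half-word, anything else = word.

```python
def is_misaligned(addr: int, size: int) -> bool:
    """Data access alignment rule; size: 0 = byte, 1 = half-word, other = word."""
    if size == 0:
        return False              # bytes may sit at any address
    if size == 1:
        return bool(addr & 0b01)  # half-word: addr[0] must be 0
    return bool(addr & 0b11)      # word: addr[1:0] must be "00"
```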

-- additional register stage for control signals if using PMP_NUM_REGIONS > pmp_num_regions_critical_c --
@@ -390,7 +382,7 @@ begin
d_bus_lock_o <= exclusive_lock;


-- Instruction Fetch Arbiter (controlled by pipeline FRONT-end) ---------------------------
-- Instruction Interface: Arbiter (controlled by pipeline front-end) ----------------------
-- -------------------------------------------------------------------------------------------
ifetch_arbiter: process(rstn_i, clk_i)
begin