CPU logic optimization #204

Merged
merged 7 commits into from
Nov 14, 2021
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -26,6 +26,7 @@ defined by the `hw_version_c` constant in the main VHDL package file [`rtl/core/

| Date (*dd.mm.yyyy*) | Version | Comment |
|:----------:|:-------:|:--------|
| 14.11.2021 | 1.6.3.7 | major control unit and ALU logic optimizations, reduced hardware footprint; closed further illegal instruction encoding holes (system environment instructions, ALU and ALU-immediate instructions, FENCE instructions); [PR #204](https://github.com/stnolting/neorv32/pull/204) |
| 10.11.2021 | 1.6.3.6 | optimized BUSKEEPER: removed redundant logic - bus keeper now also shows an external interface access timeout (if implemented) as "timeout error"; removed _BUSKEEPER_ERR_SRC_ status flag; :warning: added `err_o` (fault access operation) to the custom functions subsystem (CFS) |
| 09.11.2021 | 1.6.3.5 | :warning: reworked IRQ trigger logic of SPI, TWI, UART0, UART1, NEOLED and SLINK; FIRQs now only trigger **once** when the programmed interrupt condition is met instead of triggering **all the time** (see [PR #202](https://github.com/stnolting/neorv32/pull/202)) |
| 06.11.2021 | 1.6.3.4 | :bug: fixed bug in **WISHBONE** interface: _pipelined_ Wishbone mode did not clear STB after first transfer cycle |
21 changes: 12 additions & 9 deletions docs/datasheet/cpu.adoc
@@ -611,25 +611,28 @@ code (see `sw/example/floating_point_test`).

The CSR access instructions as well as the exception and interrupt system (= the privileged architecture)
are implemented when the `CPU_EXTENSION_RISCV_Zicsr` configuration generic is _true_.

[IMPORTANT]
If the `Zicsr` extension is disabled, the CPU does not provide any _privileged architecture_ features at all!
To provide the full set of privileged functions required to run more complex tasks like
operating systems and to allow a secure execution environment, the `Zicsr` extension should always be enabled.

In this case the following instructions are available:

* CSR access: `csrrw`, `csrrs`, `csrrc`, `csrrwi`, `csrrsi`, `csrrci`
* environment: `mret`, `wfi`

[WARNING]
If the `Zicsr` extension is disabled the CPU does not provide any _privileged architecture_ features at all!
In order to provide the full set of functions and to allow a secure execution
environment the `Zicsr` extension should always be enabled.
[NOTE]
If `rd=x0` for the `csrrw[i]` instructions there will be no actual read access to the corresponding CSR.
However, access privileges are still enforced so these instruction variants _do_ cause side-effects
(the RISC-V spec. states that these combinations "_shall_ not cause any side-effects").

[NOTE]
The "wait for interrupt instruction" `wfi` works like a sleep command. When executed, the CPU is
The "wait for interrupt instruction" `wfi` acts like a sleep command. When executed, the CPU is
halted until a valid interrupt request occurs. To wake up again, the corresponding interrupt source has to
be enabled via the `mie` CSR and the global interrupt enable flag in `mstatus` has to be set.

[NOTE]
The `wfi` instruction may also be executed in user-mode without causing an exception as <<_mstatus>> bit
`TW` (timeout wait) is hardwired to zero.

`TW` (timeout wait) is _hardwired_ to zero.
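
The wake-up condition described in the `wfi` note can be sketched as a small Python predicate. This is only an illustrative model of the documented behavior, not the NEORV32 implementation: the function name `wfi_wakes_up` and the use of an `mip`-style pending bit vector are assumptions, while the bit positions of `mstatus.MIE` (bit 3) and the machine timer interrupt (bit 7) follow the RISC-V privileged specification.

```python
MSTATUS_MIE = 1 << 3  # global machine interrupt enable flag in mstatus
MIP_MTIP = 1 << 7     # machine timer interrupt bit (same position in mie and mip)


def wfi_wakes_up(mstatus: int, mie: int, mip: int) -> bool:
    """Return True if a pending interrupt would end the sleep state.

    Assumed model of the note above: the source must be enabled in mie
    AND the global enable flag in mstatus must be set.
    """
    globally_enabled = bool(mstatus & MSTATUS_MIE)
    enabled_and_pending = (mie & mip) != 0
    return globally_enabled and enabled_and_pending
```

With this model, a pending timer interrupt does not wake the CPU unless both `mie` bit 7 and the global flag in `mstatus` are set.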



5 changes: 4 additions & 1 deletion rtl/core/neorv32_cpu.vhd
@@ -156,6 +156,7 @@ architecture neorv32_cpu_rtl of neorv32_cpu is
signal be_store : std_ulogic; -- bus error on store data access
signal fetch_pc : std_ulogic_vector(data_width_c-1 downto 0); -- pc for instruction fetch
signal curr_pc : std_ulogic_vector(data_width_c-1 downto 0); -- current pc (for current executed instruction)
signal next_pc : std_ulogic_vector(data_width_c-1 downto 0); -- next pc (for next executed instruction)
signal fpu_flags : std_ulogic_vector(4 downto 0); -- FPU exception flags

-- pmp interface --
@@ -285,6 +286,7 @@ begin
imm_o => imm, -- immediate
fetch_pc_o => fetch_pc, -- PC for instruction fetch
curr_pc_o => curr_pc, -- current PC (corresponding to current instruction)
next_pc_o => next_pc, -- next PC (corresponding to next instruction)
csr_rdata_o => csr_rdata, -- CSR read data
-- FPU interface --
fpu_flags_i => fpu_flags, -- exception flags
@@ -355,7 +357,8 @@ begin
-- data input --
rs1_i => rs1, -- rf source 1
rs2_i => rs2, -- rf source 2
pc2_i => curr_pc, -- delayed PC
pc_i => curr_pc, -- current PC
pc2_i => next_pc, -- next PC
imm_i => imm, -- immediate
csr_i => csr_rdata, -- CSR read data
-- data output --
91 changes: 41 additions & 50 deletions rtl/core/neorv32_cpu_alu.vhd
@@ -60,7 +60,8 @@ entity neorv32_cpu_alu is
-- data input --
rs1_i : in std_ulogic_vector(data_width_c-1 downto 0); -- rf source 1
rs2_i : in std_ulogic_vector(data_width_c-1 downto 0); -- rf source 2
pc2_i : in std_ulogic_vector(data_width_c-1 downto 0); -- delayed PC
pc_i : in std_ulogic_vector(data_width_c-1 downto 0); -- current PC
pc2_i : in std_ulogic_vector(data_width_c-1 downto 0); -- next PC
imm_i : in std_ulogic_vector(data_width_c-1 downto 0); -- immediate
csr_i : in std_ulogic_vector(data_width_c-1 downto 0); -- CSR read data
-- data output --
@@ -85,10 +86,8 @@ architecture neorv32_cpu_cpu_rtl of neorv32_cpu_alu is

-- results --
signal addsub_res : std_ulogic_vector(data_width_c downto 0);
--
signal alu_res : std_ulogic_vector(data_width_c-1 downto 0);
signal cp_res : std_ulogic_vector(data_width_c-1 downto 0);
signal arith_res : std_ulogic_vector(data_width_c-1 downto 0);
signal logic_res : std_ulogic_vector(data_width_c-1 downto 0);

-- co-processor arbiter and interface --
type cp_ctrl_t is record
@@ -119,7 +118,7 @@ begin

-- ALU Input Operand Mux ------------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
opa <= pc2_i when (ctrl_i(ctrl_alu_opa_mux_c) = '1') else rs1_i; -- operand a (first ALU input operand), only required for arithmetic ops
opa <= pc_i when (ctrl_i(ctrl_alu_opa_mux_c) = '1') else rs1_i; -- operand a (first ALU input operand), only required for arithmetic ops
opb <= imm_i when (ctrl_i(ctrl_alu_opb_mux_c) = '1') else rs2_i; -- operand b (second ALU input operand)


@@ -136,31 +135,55 @@ begin
op_a_v := (opa(opa'left) and (not ctrl_i(ctrl_alu_unsigned_c))) & opa;
op_b_v := (opb(opb'left) and (not ctrl_i(ctrl_alu_unsigned_c))) & opb;
-- add/sub(slt) select --
if (ctrl_i(ctrl_alu_addsub_c) = '1') then -- subtraction
if (ctrl_i(ctrl_alu_op0_c) = '1') then -- subtraction
op_y_v := not op_b_v;
cin_v(0) := '1';
else -- addition
op_y_v := op_b_v;
cin_v(0) := '0';
end if;
-- adder core (result + carry/borrow) --
-- adder core --
addsub_res <= std_ulogic_vector(unsigned(op_a_v) + unsigned(op_y_v) + unsigned(cin_v(0 downto 0)));
end process binary_arithmetic_core;

-- direct output of address result --
-- direct output of adder result --
add_o <= addsub_res(data_width_c-1 downto 0);

-- ALU arithmetic logic core --
arithmetic_core: process(ctrl_i, addsub_res)

-- ALU Operation Select -------------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
alu_core: process(ctrl_i, addsub_res, rs1_i, opb)
begin
if (ctrl_i(ctrl_alu_arith_c) = alu_arith_cmd_addsub_c) then -- ADD/SUB
arith_res <= addsub_res(data_width_c-1 downto 0);
else -- SLT
arith_res <= (others => '0');
arith_res(0) <= addsub_res(addsub_res'left); -- => carry/borrow
end if;
end process arithmetic_core;
case ctrl_i(ctrl_alu_op2_c downto ctrl_alu_op0_c) is
when alu_op_add_c => alu_res <= addsub_res(data_width_c-1 downto 0); -- (default)
when alu_op_sub_c => alu_res <= addsub_res(data_width_c-1 downto 0);
-- when alu_op_mova_c => alu_res <= rs1_i; -- FIXME
when alu_op_slt_c => alu_res <= (others => '0'); alu_res(0) <= addsub_res(addsub_res'left); -- => carry/borrow
when alu_op_movb_c => alu_res <= opb;
when alu_op_xor_c => alu_res <= rs1_i xor opb; -- only rs1 required for logic ops (opa would also contain pc)
when alu_op_or_c => alu_res <= rs1_i or opb;
when alu_op_and_c => alu_res <= rs1_i and opb;
when others => alu_res <= addsub_res(data_width_c-1 downto 0);
end case;
end process alu_core;

-- ALU Function Select --------------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
alu_function_mux: process(ctrl_i, alu_res, pc2_i, csr_i, cp_res)
begin
case ctrl_i(ctrl_alu_func1_c downto ctrl_alu_func0_c) is
when alu_func_core_c => res_o <= alu_res; -- (default)
when alu_func_nxpc_c => res_o <= pc2_i;
when alu_func_csrr_c => res_o <= csr_i;
when alu_func_copro_c => res_o <= cp_res;
when others => res_o <= alu_res; -- undefined
end case;
end process alu_function_mux;
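
The merged data path above (adder core, operation select, function select) can be mirrored by a small Python reference model, which may help when checking the new single-stage `alu_core` against expected results. This is a sketch under stated assumptions, not the project's code: the names `AluOp`, `AluFunc`, `addsub`, `alu_core` and `alu_result` are hypothetical, and the numeric encodings are placeholders. The diff only shows that bit 0 of the operation select (`ctrl_alu_op0_c`) chooses subtraction; the real constants live in the VHDL package, which is not part of this hunk.

```python
from enum import IntEnum

XLEN = 32
MASK = (1 << XLEN) - 1        # keeps a value within XLEN bits
WIDE = (1 << (XLEN + 1)) - 1  # keeps a value within XLEN+1 bits


class AluOp(IntEnum):
    # Assumed 3-bit encodings; only "bit 0 set => subtract" is taken from the diff.
    ADD = 0b000
    SUB = 0b001
    SLT = 0b011
    MOVB = 0b100
    XOR = 0b101
    OR = 0b110
    AND = 0b111


class AluFunc(IntEnum):
    # Assumed 2-bit encodings for the function select.
    CORE = 0b00
    NXPC = 0b01
    CSRR = 0b10
    COPRO = 0b11


def addsub(opa, opb, subtract, unsigned):
    """(XLEN+1)-bit adder core; the extra MSB is the sign extension or zero."""
    def extend(x):
        msb = (x >> (XLEN - 1)) & 1
        return ((0 if unsigned else msb) << XLEN) | (x & MASK)

    a, b = extend(opa), extend(opb)
    if subtract:  # a - b == a + not(b) + 1
        return (a + (~b & WIDE) + 1) & WIDE
    return (a + b) & WIDE


def alu_core(op, rs1, opa, opb, unsigned=False):
    """Operation select; logic ops use rs1 directly because opa may carry the PC."""
    res = addsub(opa, opb, subtract=bool(op & 1), unsigned=unsigned)
    if op == AluOp.SLT:
        return (res >> XLEN) & 1  # carry/borrow bit becomes the result
    if op == AluOp.MOVB:
        return opb & MASK
    if op == AluOp.XOR:
        return (rs1 ^ opb) & MASK
    if op == AluOp.OR:
        return (rs1 | opb) & MASK
    if op == AluOp.AND:
        return rs1 & opb & MASK
    return res & MASK  # ADD, SUB and the default branch


def alu_result(func, alu_res, next_pc, csr, cp_res):
    """Function select: picks which unit drives the ALU result output."""
    return {
        AluFunc.CORE: alu_res,
        AluFunc.NXPC: next_pc,
        AluFunc.CSRR: csr,
        AluFunc.COPRO: cp_res,
    }.get(func, alu_res)
```

For example, `alu_core(AluOp.SLT, 0, 0xFFFFFFFF, 1)` treats the first operand as -1 and yields 1, whereas the same call with `unsigned=True` yields 0. Both results come from the extra MSB of the widened adder, which is why no separate comparator is needed.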


-- **************************************************************************************************************************
-- Co-Processors
-- **************************************************************************************************************************

-- Co-Processor Arbiter -------------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
@@ -193,7 +216,7 @@ begin
end process cp_arbiter;

-- is co-processor operation? --
cp_ctrl.cmd <= '1' when (ctrl_i(ctrl_alu_func1_c downto ctrl_alu_func0_c) = alu_func_cmd_copro_c) else '0';
cp_ctrl.cmd <= '1' when (ctrl_i(ctrl_alu_func1_c downto ctrl_alu_func0_c) = alu_func_copro_c) else '0';
cp_ctrl.start <= '1' when (cp_ctrl.cmd = '1') and (cp_ctrl.cmd_ff = '0') else '0';

  -- co-processor select / start trigger --
@@ -209,38 +232,6 @@ begin
cp_res <= cp_result(0) or cp_result(1) or cp_result(2) or cp_result(3);
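
The `cp_ctrl.start` expression above is a rising-edge detector on the co-processor command. A minimal Python sketch of that idea follows; it assumes that `cmd_ff` is simply the previous-cycle copy of `cmd`, which the collapsed arbiter process would have to confirm, and the class name `StartTrigger` is hypothetical.

```python
class StartTrigger:
    """One-cycle start pulse on the rising edge of a command signal."""

    def __init__(self):
        # Previous-cycle copy of cmd (assumed register behavior).
        self.cmd_ff = False

    def clock(self, cmd: bool) -> bool:
        """Advance one clock cycle and return the start pulse for this cycle."""
        start = cmd and not self.cmd_ff
        self.cmd_ff = cmd
        return start
```

Feeding the sequence `False, True, True, False, True` through `clock` produces a pulse only in the cycles where the command goes from low to high, so a multi-cycle co-processor operation is started once rather than retriggered every cycle.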


-- ALU Logic Core -------------------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
alu_logic_core: process(ctrl_i, rs1_i, opb)
begin
case ctrl_i(ctrl_alu_logic1_c downto ctrl_alu_logic0_c) is
when alu_logic_cmd_movb_c => logic_res <= opb; -- (default)
when alu_logic_cmd_xor_c => logic_res <= rs1_i xor opb; -- only rs1 required for logic ops (opa would also contain pc)
when alu_logic_cmd_or_c => logic_res <= rs1_i or opb;
when alu_logic_cmd_and_c => logic_res <= rs1_i and opb;
when others => logic_res <= opb; -- undefined
end case;
end process alu_logic_core;


-- ALU Function Select --------------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
alu_function_mux: process(ctrl_i, arith_res, logic_res, csr_i, cp_res)
begin
case ctrl_i(ctrl_alu_func1_c downto ctrl_alu_func0_c) is
when alu_func_cmd_arith_c => res_o <= arith_res; -- (default)
when alu_func_cmd_logic_c => res_o <= logic_res;
when alu_func_cmd_csrr_c => res_o <= csr_i;
when alu_func_cmd_copro_c => res_o <= cp_res;
when others => res_o <= arith_res; -- undefined
end case;
end process alu_function_mux;


-- **************************************************************************************************************************
-- Co-Processors
-- **************************************************************************************************************************

-- Co-Processor 0: Shifter (CPU Core ISA) --------------------------------------------------
-- -------------------------------------------------------------------------------------------
neorv32_cpu_cp_shifter_inst: neorv32_cpu_cp_shifter