[rtl] Cleanups and Optimizations #660

Merged: 12 commits, Jul 29, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -32,6 +32,7 @@ mimpid = 0x01040312 -> Version 01.04.03.12 -> v1.4.3.12

| Date (*dd.mm.yyyy*) | Version | Comment |
|:-------------------:|:-------:|:--------|
+ | 29.07.2023 | 1.8.7.4 | RTL cleanup and optimizations (less synthesis warnings, less resource requirements); [#660](https://github.com/stnolting/neorv32/pull/660) |
| 28.07.2023 | 1.8.7.3 | :warning: reworked **SYSINFO** module; clean-up address space layout; clean-up assertion notes; [#659](https://github.com/stnolting/neorv32/pull/659) |
| 27.07.2023 | 1.8.7.2 | :bug: make sure that IMEM/DMEM size is always a power of two; [#658](https://github.com/stnolting/neorv32/pull/658) |
| 27.07.2023 | 1.8.7.1 | :warning: remove `CUSTOM_ID` generic; cleanup and re-layout `NEORV32_SYSINFO.SOC` bits; (:bug:) fix gateway's generics (`positive` -> `natural` as these generics are allowed to be zero); [#657](https://github.com/stnolting/neorv32/pull/657) |
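
The hunk context above quotes the CHANGELOG's version encoding (`mimpid = 0x01040312 -> Version 01.04.03.12 -> v1.4.3.12`). As a rough sketch of that decoding in C, assuming each byte of the word holds two BCD digits as in the quoted example: `mimpid_field` and `mimpid_format` are hypothetical helper names for illustration and are not taken from the NEORV32 software library.

```c
#include <stdint.h>
#include <stdio.h>

/* Return version field `index` (0 = most significant byte, 3 = least).
 * Assumption: every byte stores two BCD digits, e.g. 0x12 means 12. */
unsigned mimpid_field(uint32_t mimpid, int index) {
  uint32_t byte = (mimpid >> (24 - 8 * index)) & 0xFFu;
  return (unsigned)((byte >> 4) * 10u + (byte & 0xFu));
}

/* Write "vA.B.C.D" into buf; returns the snprintf result
 * (number of characters the full string needs, excluding the terminator). */
int mimpid_format(uint32_t mimpid, char *buf, size_t len) {
  return snprintf(buf, len, "v%u.%u.%u.%u",
                  mimpid_field(mimpid, 0), mimpid_field(mimpid, 1),
                  mimpid_field(mimpid, 2), mimpid_field(mimpid, 3));
}
```

Under the same assumption, the version added by this PR (1.8.7.4) would correspond to the word `0x01080704`.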
4 changes: 2 additions & 2 deletions docs/datasheet/cpu.adoc
@@ -145,7 +145,8 @@ The control unit is split into a "front-end" and a "back-end".

The front-end is responsible for fetching instructions in chunks of 32-bits. This can be a single aligned 32-bit instruction,
two aligned 16-bit instructions or a mixture of those. The instructions including control and exception information are stored
- to a FIFO queue - the instruction prefetch buffer (IPB). The depth of this FIFO can be configured by the `CPU_IPB_ENTRIES` top generic.
+ to a FIFO queue - the instruction prefetch buffer (IPB). This FIFO has a depth of two entries by default but can be customized
+ via the `ipb_depth_c` VHDL package constant.

The FIFO allows the front-end to do "speculative" instruction fetches, as it keeps fetching the next consecutive instruction
all the time. This also allows to decouple front-end (instruction fetch) and back-end (instruction execution) so both modules
@@ -695,7 +696,6 @@ Auto-increment of the HPMs can be deactivated individually via the <<_mcountinhi
This is a sub-extension of the <<_m_isa_extension>> ISA extension. It implements only the multiplication operations
of the `M` extensions and is intended for size-constrained setups that require hardware-based
integer multiplications but not hardware-based divisions, which will be computed entirely in software.
- This extension requires only ~50% of the hardware utilization of the "full" `M` extension.


==== `Zxcfu` ISA Extension
23 changes: 14 additions & 9 deletions docs/datasheet/soc.adoc
@@ -221,7 +221,6 @@ The generic type "`suv(x:y)`" is an abbreviation for "`std_ulogic_vector(x downt
4+^| **CPU Tuning Options**
| `FAST_MUL_EN` | boolean | false | Implement fast (but large) full-parallel multipliers (trying to infer DSP blocks).
| `FAST_SHIFT_EN` | boolean | false | Implement fast (but large) full-parallel barrel shifters.
- | `CPU_IPB_ENTRIES` | natural | 1 | Number of entries in the CPU's instruction prefetch buffer.
4+^| **Physical Memory Protection (<<_pmp_isa_extension>>)**
| `PMP_NUM_REGIONS` | natural | 0 | Number of implemented PMP regions (0..16).
| `PMP_MIN_GRANULARITY` | natural | 4 | Minimal region granularity in bytes. Has to be a power of two, min 4.
@@ -459,23 +458,29 @@ A pending FIRQ has to be explicitly cleared by writing zero to the according <<_

As a 32-bit architecture the NEORV32 can access a 4GB physical address space. By default, this address space is
split into six main regions. Each region provides specific _physical memory attributes_ ("PMAs") that define
- the access capabilities.
+ the access capabilities (`rwxac`; `r` = read permission, `w` = execute permission, `x` - execute permission,

Review comment (Collaborator): a typo slipped in here: execute -> write

Reply (Owner, Author): Oh, my fault 😅 Thanks for finding this!

+ `a` = atomic access support, `c` = cached CPU access).

.NEORV32 Processor Address Space (Default Configuration)
image::address_space.png[900]

.Main Address Regions
[cols="<1,^4,^2,<7"]
[options="header",grid="rows"]
|=======================
- | # | Region | PMAs | Description
- | 1 | Internal IMEM address space | `rwx` | For instructions (=code) and constants; mapped to the internal <<_instruction_memory_imem>>.
- | 2 | Internal DMEM address space | `rwx` | For application runtime data (heap, stack, etc.); mapped to the internal <<_data_memory_dmem>>).
- | 3 | Memory-mapped XIP flash | `r-x` | Memory-mapped access to the <<_execute_in_place_module_xip>> SPI flash.
- | 4 | Bootloader address space | `r-x` | Read-only memory for the internal <<_bootloader_rom_bootrom>> containing the default <<_bootloader>>.
- | 5 | IO/peripheral address space | `rwx` | Processor-internal peripherals / IO devices.
- | 6 | The "**void**" | `rwx` | Unmapped address space. All accesses to this region(s) are redirected to the <<_processor_external_memory_interface_wishbone>> (if implemented).
+ | # | Region | PMAs | Description
+ | 1 | Internal IMEM address space | `rwxac` | For instructions (=code) and constants; mapped to the internal <<_instruction_memory_imem>>.
+ | 2 | Internal DMEM address space | `rwxac` | For application runtime data (heap, stack, etc.); mapped to the internal <<_data_memory_dmem>>).
+ | 3 | Memory-mapped XIP flash | `r-xac` | Memory-mapped access to the <<_execute_in_place_module_xip>> SPI flash.
+ | 4 | Bootloader address space | `r-xa-` | Read-only memory for the internal <<_bootloader_rom_bootrom>> containing the default <<_bootloader>>.
+ | 5 | IO/peripheral address space | `rwxa-` | Processor-internal peripherals / IO devices.
+ | 6 | The "**void**" | `rwxac` | Unmapped address space. All accesses to this region(s) are redirected to the <<_processor_external_memory_interface_wishbone>> (if implemented).
|=======================

+ .Custom PMAs
+ [NOTE]
+ Physical memory attributes can be customized (constrained) using the CPU's <<_pmp_isa_extension>>.

The CPU can access all of the 32-bit address space from the instruction fetch interface and also from the data access
interface. Both interfaces can be equipped with optional caches (<<_processor_internal_data_cache_dcache>> and
<<_processor_internal_instruction_cache_icache>>). The two CPU interfaces are multiplexed by a simple bus switch into
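
The five-character PMA strings introduced in this hunk (`rwxac`, with `-` marking an absent attribute) can be read mechanically. The following C sketch parses such a string into a bit mask. The flag values and the `pma_parse` name are hypothetical and chosen only for illustration; they do not correspond to any NEORV32 register layout.

```c
#include <stddef.h>

/* Hypothetical flag bits for illustration only (not NEORV32 register bits). */
enum pma_flag {
  PMA_READ    = 1 << 0, /* r */
  PMA_WRITE   = 1 << 1, /* w */
  PMA_EXECUTE = 1 << 2, /* x */
  PMA_ATOMIC  = 1 << 3, /* a */
  PMA_CACHED  = 1 << 4  /* c */
};

/* Parse a five-character attribute string in fixed "rwxac" order.
 * Each position holds either its letter or '-'.
 * Returns the flag mask, or -1 if the string is malformed. */
int pma_parse(const char *s) {
  static const char order[5] = {'r', 'w', 'x', 'a', 'c'};
  int flags = 0;
  if (s == NULL) {
    return -1;
  }
  for (int i = 0; i < 5; i++) {
    if (s[i] == order[i]) {
      flags |= 1 << i;
    } else if (s[i] != '-') {
      return -1; /* wrong letter, or string shorter than five characters */
    }
  }
  return (s[5] == '\0') ? flags : -1;
}
```

With this sketch, the bootloader region's `r-xa-` yields read, execute and atomic access without the cached flag, while the three-character strings of the pre-PR table (for example `rwx`) are rejected as malformed.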
2 changes: 2 additions & 0 deletions docs/datasheet/software.adoc
@@ -265,6 +265,8 @@ The following default compiler flags are used for compiling an application. Thes
| `-lgcc` | Make sure we have no unresolved references to internal GCC library subroutines.
| `-mno-fdiv` | Use built-in software functions for floating-point divisions and square roots (since the according instructions are not supported yet).
| `-g` | Include debugging information/symbols in ELF.
+ | `-mstrict-align` | Unaligned memory accesses cannot be resolved by the hardware and require emulation.
+ | `-mbranch-cost=...` | Branches cost a lot cycles on a multi-cycle architecture.
|=======================

:sectnums:
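
The `-mstrict-align` row added above states that the hardware cannot resolve unaligned accesses. The flag keeps the compiler from emitting such accesses on its own, but application code that casts a misaligned byte pointer to `uint32_t *` can still cause one. A minimal C sketch of a byte-wise alternative follows; `load_u32_le` is a hypothetical helper name, and little-endian byte order is assumed.

```c
#include <stdint.h>

/* Read a 32-bit little-endian value from an arbitrarily aligned address.
 * Only byte loads are used, so no unaligned word access is performed. */
uint32_t load_u32_le(const uint8_t *p) {
  return (uint32_t)p[0]
       | ((uint32_t)p[1] << 8)
       | ((uint32_t)p[2] << 16)
       | ((uint32_t)p[3] << 24);
}
```

This costs four byte loads plus shifts instead of one word load, which is the kind of overhead the changelog's "emulation" remark refers to.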
Binary file modified docs/figures/address_space.png
3 changes: 0 additions & 3 deletions docs/userguide/application_specific_configuration.adoc
@@ -23,8 +23,6 @@ multiplications, `FAST_SHIFT_EN => true` use a fast barrel shifter for shift ope
* Implement the instruction cache: `ICACHE_EN => true`
* Use as many _internal_ memory as possible to reduce memory access latency: `MEM_INT_IMEM_EN => true` and
`MEM_INT_DMEM_EN => true`, maximize `MEM_INT_IMEM_SIZE` and `MEM_INT_DMEM_SIZE`
- * Increase the CPU's instruction prefetch buffer size: if **no** instruction cache is implemented `CPU_IPB_ENTRIES` should be
- quite large
* _To be continued..._


@@ -53,7 +51,6 @@ also reduces program code size by approximately 30%.
* If not explicitly used/required, exclude the CPU standard counters `[m]instret[h]`
(number of instruction) and `[m]cycle[h]` (number of cycles) from synthesis by disabling the `Zicntr` ISA extension
(note, this is not RISC-V compliant).
- * Reduce the CPU's prefetch buffer size (`CPU_IPB_ENTRIES`) to its minimum (=1).
* Map CPU shift operations to a small and iterative shifter unit (`FAST_SHIFT_EN => false`).
* If you have unused DSP block available, you can map multiplication operations to those slices instead of
using LUTs to implement the multiplier (`FAST_MUL_EN => true`).
26 changes: 7 additions & 19 deletions rtl/core/neorv32_cpu.vhd
@@ -66,7 +66,6 @@ entity neorv32_cpu is
-- Extension Options --
FAST_MUL_EN : boolean; -- use DSPs for M extension's multiplier
FAST_SHIFT_EN : boolean; -- use barrel shifter for shift operations
- CPU_IPB_ENTRIES : natural; -- entries in instruction prefetch buffer, has to be a power of 2, min 1
-- Physical Memory Protection (PMP) --
PMP_NUM_REGIONS : natural; -- number of regions (0..16)
PMP_MIN_GRANULARITY : natural; -- minimal region granularity in bytes, has to be a power of 2, min 4 bytes
@@ -99,14 +98,10 @@ end neorv32_cpu;

architecture neorv32_cpu_rtl of neorv32_cpu is

- -- local constants: additional register file read ports --
+ -- auto-configuration --
constant regfile_rs3_en_c : boolean := CPU_EXTENSION_RISCV_Zxcfu or CPU_EXTENSION_RISCV_Zfinx; -- 3rd register file read port (rs3)
constant regfile_rs4_en_c : boolean := CPU_EXTENSION_RISCV_Zxcfu; -- 4th register file read port (rs4)

- -- local constant: instruction prefetch buffer depth --
- constant ipb_override_c : boolean := (CPU_EXTENSION_RISCV_C = true) and (CPU_IPB_ENTRIES < 2); -- override IPB size: set to 2?
- constant ipb_depth_c : natural := cond_sel_natural_f(ipb_override_c, 2, CPU_IPB_ENTRIES);

-- local signals --
signal ctrl : ctrl_bus_t; -- main control bus
signal imm : std_ulogic_vector(XLEN-1 downto 0); -- immediate
@@ -120,7 +115,7 @@ architecture neorv32_cpu_rtl of neorv32_cpu is
signal mem_rdata : std_ulogic_vector(XLEN-1 downto 0); -- memory read data
signal cp_done : std_ulogic; -- ALU co-processor operation done
signal alu_exc : std_ulogic; -- ALU exception
- signal bus_d_wait : std_ulogic; -- wait for current bus data access
+ signal bus_d_wait : std_ulogic; -- wait for current data bus access
signal csr_rdata : std_ulogic_vector(XLEN-1 downto 0); -- csr read data
signal mar : std_ulogic_vector(XLEN-1 downto 0); -- memory address register
signal ma_load : std_ulogic; -- misaligned load data address
@@ -143,7 +138,7 @@ begin
-- -------------------------------------------------------------------------------------------
-- say hello --
assert false report
- "The NEORV32 RISC-V Processor (Version 0x" & to_hstring32_f(hw_version_c) & ") - github.com/stnolting/neorv32" severity note;
+ "The NEORV32 RISC-V Processor Version 0x" & to_hstring32_f(hw_version_c) & " - github.com/stnolting/neorv32" severity note;

-- CPU ISA configuration --
assert false report
@@ -175,12 +170,6 @@ begin
assert not (CPU_BOOT_ADDR(1 downto 0) /= "00") report
"NEORV32 CPU CONFIG ERROR! <CPU_BOOT_ADDR> has to be 32-bit aligned." severity error;

- -- Instruction prefetch buffer --
- assert not (is_power_of_two_f(CPU_IPB_ENTRIES) = false) report
- "NEORV32 CPU CONFIG ERROR! Number of entries in instruction prefetch buffer <CPU_IPB_ENTRIES> has to be a power of two." severity error;
- assert not (ipb_override_c = true) report
- "NEORV32 CPU CONFIG WARNING! Overriding <CPU_IPB_ENTRIES> configuration (setting =2) because C ISA extension is enabled." severity warning;

-- PMP --
assert not (PMP_NUM_REGIONS > 16) report
"NEORV32 CPU CONFIG ERROR! Number of PMP regions <PMP_NUM_REGIONS> out of valid range (0..16)." severity error;
@@ -233,7 +222,6 @@ begin
-- Tuning Options --
FAST_MUL_EN => FAST_MUL_EN, -- use DSPs for M extension's multiplier
FAST_SHIFT_EN => FAST_SHIFT_EN, -- use barrel shifter for shift operations
- CPU_IPB_ENTRIES => ipb_depth_c, -- entries is instruction prefetch buffer, has to be a power of 2, min 1
-- Physical memory protection (PMP) --
PMP_NUM_REGIONS => PMP_NUM_REGIONS, -- number of regions (0..16)
PMP_MIN_GRANULARITY => PMP_MIN_GRANULARITY, -- minimal region granularity in bytes, has to be a power of 2, min 4 bytes
@@ -323,10 +311,10 @@ begin
csr_i => csr_rdata, -- CSR read data
pc2_i => next_pc, -- next PC
-- data output --
- rs1_o => rs1, -- operand 1
- rs2_o => rs2, -- operand 2
- rs3_o => rs3, -- operand 3
- rs4_o => rs4 -- operand 4
+ rs1_o => rs1, -- rs1
+ rs2_o => rs2, -- rs2
+ rs3_o => rs3, -- rs3
+ rs4_o => rs4 -- rs4
);

