✨ Add optional external bus interface cache (XCACHE) #849

Merged
merged 12 commits into from
Mar 14, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -30,6 +30,7 @@ mimpid = 0x01040312 -> Version 01.04.03.12 -> v1.4.3.12

| Date | Version | Comment | Link |
|:----:|:-------:|:--------|:----:|
| 14.03.2024 | 1.9.6.5 | :sparkles: add optional external bus interface cache (XCACHE) | [#849](https://github.com/stnolting/neorv32/pull/849) |
| 12.03.2024 | 1.9.6.4 | :warning: :warning: rename external bus/memory interface and according generics ("WISHBONE/MEM_EXT" -> "XBUS"); also rename bus interface ports (`wb_* -> xbus_*`) | [#846](https://github.com/stnolting/neorv32/pull/846) |
| 11.03.2024 | 1.9.6.3 | :warning: remove Wishbone tag signal; minor rtl edits and optimizations | [#845](https://github.com/stnolting/neorv32/pull/845) |
| 10.03.2024 | 1.9.6.2 | minor rtl clean-ups, optimizations and fixes | [#843](https://github.com/stnolting/neorv32/pull/843) |
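
The hunk header above gives the mapping from the 32-bit version word to the version string (`0x01040312` -> `v1.4.3.12`). A minimal Python sketch of that decoding follows, assuming each byte holds two BCD digits as the example suggests. The function name `decode_hw_version` is hypothetical and not part of the project.

```python
def decode_hw_version(mimpid: int) -> str:
    """Decode a 32-bit version word into a dotted version string.

    Assumption: every byte carries two BCD digits, so 0x12 reads as "12".
    """
    if not 0 <= mimpid <= 0xFFFFFFFF:
        raise ValueError("version word must fit into 32 bits")
    fields = []
    for shift in (24, 16, 8, 0):
        byte = (mimpid >> shift) & 0xFF
        # Read the two hex digits as decimal digits; int() drops the leading zero.
        # A non-BCD byte such as 0x1A raises ValueError here.
        fields.append(str(int(f"{byte:02x}")))
    return "v" + ".".join(fields)
```

With this reading, the `hw_version_c` value `x"01090605"` introduced by this pull request corresponds to the changelog entry 1.9.6.5.
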
4 changes: 2 additions & 2 deletions README.md
@@ -149,7 +149,7 @@ allows booting application code via UART or from external SPI flash

* standard serial interfaces
([UART](https://stnolting.github.io/neorv32/#_primary_universal_asynchronous_receiver_and_transmitter_uart0),
[SPI](https://stnolting.github.io/neorv32/#_serial_peripheral_interface_controller_spi) (host),
[SPI](https://stnolting.github.io/neorv32/#_serial_peripheral_interface_controller_spi) (SPI host),
[SDI](https://stnolting.github.io/neorv32/#_serial_data_interface_controller_sdi) (SPI device),
[TWI/I²C](https://stnolting.github.io/neorv32/#_two_wire_serial_interface_controller_twi),
[ONEWIRE/1-Wire](https://stnolting.github.io/neorv32/#_one_wire_serial_interface_controller_onewire))
@@ -160,7 +160,7 @@ allows booting application code via UART or from external SPI flash
**SoC Connectivity**

* 32-bit external bus interface - Wishbone b4 compatible
([XBUS](https://stnolting.github.io/neorv32/#_processor_external_bus_interface_xbus));
([XBUS](https://stnolting.github.io/neorv32/#_processor_external_bus_interface_xbus)) with optional cache (XCACHE);
[wrappers](https://github.com/stnolting/neorv32/blob/main/rtl/system_integration) for AXI4-Lite and Avalon-MM host interfaces
* stream link interface with independent RX and TX channels - AXI4-Stream compatible
([SLINK](https://stnolting.github.io/neorv32/#_stream_link_interface_slink))
3 changes: 2 additions & 1 deletion docs/datasheet/overview.adoc
@@ -210,6 +210,7 @@ All core VHDL files from the list below have to be assigned to a **new library**
├neorv32_cfs.vhd - Custom functions subsystem
├neorv32_crc.vhd - Cyclic redundancy check unit
├neorv32_cache.vhd - Generic cache module
├neorv32_dcache.vhd - Processor-internal data cache
├neorv32_debug_dm.vhd - on-chip debugger: debug module
├neorv32_debug_dtm.vhd - on-chip debugger: debug transfer module
@@ -231,7 +232,7 @@ All core VHDL files from the list below have to be assigned to a **new library**
├neorv32_twi.vhd - Two wire serial interface controller
├neorv32_uart.vhd - Universal async. receiver/transmitter
├neorv32_wdt.vhd - Watchdog timer
neorv32_wishbone.vhd - External (Wishbone) bus interface
neorv32_xbus.vhd - External (Wishbone) bus interface gateways
├neorv32_xip.vhd - Execute in place module
├neorv32_xirq.vhd - External interrupt controller
10 changes: 6 additions & 4 deletions docs/datasheet/soc.adoc
@@ -20,7 +20,7 @@ image::neorv32_processor.png[align=center]
**Key Features**

* _optional_ processor-internal data and instruction memories (<<_data_memory_dmem,**DMEM**>>/<<_instruction_memory_imem,**IMEM**>>)
* _optional_ caches (<<_processor_internal_instruction_cache_icache,**iCACHE**>>/<<_processor_internal_data_cache_dcache,**dCACHE**>>)
* _optional_ caches (<<_processor_internal_instruction_cache_icache,**iCACHE**>>, <<_processor_internal_data_cache_dcache,**dCACHE**>>, <<_execute_in_place_module_xip,**xipCACHE**>>, <<_processor_external_bus_interface_xbus,**xCACHE**>>)
* _optional_ internal bootloader (<<_bootloader_rom_bootrom,**BOOTROM**>>) with UART console & SPI flash boot option
* _optional_ machine system timer (<<_machine_system_timer_mtime,**MTIME**>>), RISC-V-compatible
* _optional_ two independent universal asynchronous receivers and transmitters (<<_primary_universal_asynchronous_receiver_and_transmitter_uart0,**UART0**>>,
@@ -242,12 +242,12 @@ The generic type "`suv(x:y)`" is an abbreviation for "`std_ulogic_vector(x downt
| `MEM_INT_DMEM_SIZE` | natural | 8*1024 | Size in bytes of the processor-internal data memory (use a power of 2).
4+^| **<<_processor_internal_instruction_cache_icache>>**
| `ICACHE_EN` | boolean | false | Implement the instruction cache.
| `ICACHE_NUM_BLOCKS` | natural | 4 | Number of blocks ("pages" or "lines"). Has to be a power of two.
| `ICACHE_NUM_BLOCKS` | natural | 4 | Number of blocks ("lines"). Has to be a power of two.
| `ICACHE_BLOCK_SIZE` | natural | 64 | Size in bytes of each block. Has to be a power of two.
| `ICACHE_ASSOCIATIVITY` | natural | 1 | Associativity (number of sets). Allowed configurations: `1` = 1 set, direct mapped; `2` = 2-way set-associative.
4+^| **<<_processor_internal_data_cache_dcache>>**
| `DCACHE_EN` | boolean | false | Implement the data cache.
| `DCACHE_NUM_BLOCKS` | natural | 4 | Number of blocks ("pages" or "lines"). Has to be a power of two.
| `DCACHE_NUM_BLOCKS` | natural | 4 | Number of blocks ("lines"). Has to be a power of two.
| `DCACHE_BLOCK_SIZE` | natural | 64 | Size in bytes of each block. Has to be a power of two.
4+^| **<<_processor_external_bus_interface_xbus>> (Wishbone b4 protocol)**
| `XBUS_EN` | boolean | false | Implement the external bus interface.
@@ -256,6 +256,9 @@ The generic type "`suv(x:y)`" is an abbreviation for "`std_ulogic_vector(x downt
| `XBUS_BIG_ENDIAN` | boolean | false | Use big-endian byte order for the external bus interface.
| `XBUS_ASYNC_RX` | boolean | false | Disable input registers when true.
| `XBUS_ASYNC_TX` | boolean | false | Disable output registers when true.
| `XBUS_CACHE_EN` | boolean | false | Implement the external bus cache.
| `XBUS_CACHE_NUM_BLOCKS` | natural | 64 | Number of blocks ("lines"). Has to be a power of two.
| `XBUS_CACHE_BLOCK_SIZE` | natural | 32 | Size in bytes of each block. Has to be a power of two.
4+^| **<<_execute_in_place_module_xip>>**
| `XIP_EN` | boolean | false | Implement the execute in-place module.
| `XIP_CACHE_EN` | boolean | false | Implement XIP cache.
@@ -558,7 +561,6 @@ Accesses that are delegated to the external bus interface have a different maxim
explicit specific processor generic. See section <<_processor_external_bus_interface_xbus>> for more information.



:sectnums:
==== Reservation Set Controller

82 changes: 60 additions & 22 deletions docs/datasheet/soc_xbus.adoc
@@ -5,30 +5,35 @@
[cols="<3,<3,<4"]
[frame="topbot",grid="none"]
|=======================
| Hardware source file(s): | neorv32_xbus.vhd |
| Software driver file(s): | none | _implicitly used_
| Top entity port: | `xbus_adr_o` | address output (32-bit)
| | `xbus_dat_i` | data input (32-bit)
| | `xbus_dat_o` | data output (32-bit)
| | `xbus_we_o` | write enable (1-bit)
| | `xbus_sel_o` | byte enable (4-bit)
| | `xbus_stb_o` | strobe (1-bit)
| | `xbus_cyc_o` | valid cycle (1-bit)
| | `xbus_ack_i` | acknowledge (1-bit)
| | `xbus_err_i` | bus error (1-bit)
| Configuration generics: | `XBUS_EN` | enable external bus interface when `true`
| | `XBUS_TIMEOUT` | number of clock cycles after which an unacknowledged external bus access will auto-terminate (0 = disabled)
| | `XBUS_PIPE_MODE` | when `false` (default): classic/standard Wishbone protocol; when `true`: pipelined Wishbone protocol
| | `XBUS_BIG_ENDIAN` | byte-order (Endianness) of external bus interface; `true`=BIG, `false`=little (default)
| | `XBUS_ASYNC_RX` | use registered RX path when `false` (default); use async/direct RX path when `true`
| | `XBUS_ASYNC_TX` | use registered TX path when `false` (default); use async/direct TX path when `true`
| Hardware source file(s): | neorv32_xbus.vhd | External bus gateway
| | neorv32_cache.vhd | External bus cache instance
| Software driver file(s): | none | _implicitly used_
| Top entity port(s): | `xbus_adr_o` | address output (32-bit)
| | `xbus_dat_i` | data input (32-bit)
| | `xbus_dat_o` | data output (32-bit)
| | `xbus_we_o` | write enable (1-bit)
| | `xbus_sel_o` | byte enable (4-bit)
| | `xbus_stb_o` | strobe (1-bit)
| | `xbus_cyc_o` | valid cycle (1-bit)
| | `xbus_ack_i` | acknowledge (1-bit)
| | `xbus_err_i` | bus error (1-bit)
| Configuration generics: | `XBUS_EN` | enable external bus interface when `true`
| | `XBUS_TIMEOUT` | number of clock cycles after which an unacknowledged external bus access will auto-terminate (0 = disabled)
| | `XBUS_PIPE_MODE` | when `false` (default): classic/standard Wishbone protocol; when `true`: pipelined Wishbone protocol
| | `XBUS_BIG_ENDIAN` | byte-order (Endianness) of external bus interface; `true`=BIG, `false`=little (default)
| | `XBUS_ASYNC_RX` | use registered RX path when `false` (default); use async/direct RX path when `true`
| | `XBUS_ASYNC_TX` | use registered TX path when `false` (default); use async/direct TX path when `true`
| | `XBUS_CACHE_EN` | implement the external bus cache
| | `XBUS_CACHE_NUM_BLOCKS` | number of blocks ("lines"), has to be a power of two.
| | `XBUS_CACHE_BLOCK_SIZE` | size in bytes of each block, has to be a power of two.
| CPU interrupts: | none |
|=======================


The external bus interface provides a Wishbone b4-compatible on-chip bus interface that is
implemented if the `XBUS_EN` generic is `true`. This bus interface can be used to attach external memories,
custom hardware accelerators, additional peripheral devices or all other kinds of IP blocks.
An optional cache module ("XCACHE") can be enabled to reduce memory access latency.

The external interface is **not** mapped to a specific address space. Instead, all CPU memory accesses that
do not target a specific (and actually implemented) processor-internal address region (hence, accessing the "void";
@@ -95,18 +100,51 @@ SYSINFO module (see section <<_system_configuration_information_memory_sysinfo>>

**Access Latency**

By default, the XBUS gateway introduces two additional latency cycles: processor-outgoing (`*_o`) and
By default, the XBUS gateway introduces two additional latency cycles since processor-outgoing (`*_o`) and
processor-incoming (`*_i`) signals are fully registered. Thus, any access from the CPU to a processor-external device
via Wishbone requires 2 additional clock cycles. This can ease timing closure when using large (combinatorial) Wishbone
interconnection networks.
via the XBUS interface requires 2 additional clock cycles. This can ease timing closure when using large (combinatorial)
processor-external interconnection networks.

Optionally, the latency of the XBUS gateway can be reduced by removing the input and output register stages.
Optionally, the latency of the XBUS gateway can be reduced by removing the input and/or output register stages.
Enabling the `XBUS_ASYNC_RX` option will remove the input register stage; enabling the `XBUS_ASYNC_TX` option will
remove the output register stages. Each enabled option reduces access latency by 1 cycle.
remove the output register stages. Note that using those "async" options might impact timing closure.
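
The latency rule described above can be sketched in a few lines of Python. This is only an illustration under the assumption that each register stage contributes exactly one clock cycle, so that disabling a stage via its "async" option saves one cycle. The function name `xbus_extra_latency` is hypothetical.

```python
def xbus_extra_latency(async_rx: bool = False, async_tx: bool = False) -> int:
    """Return the additional XBUS gateway latency in clock cycles.

    Assumption: each register stage adds exactly one cycle.
    """
    cycles = 0
    if not async_rx:
        cycles += 1  # input (RX) register stage is present
    if not async_tx:
        cycles += 1  # output (TX) register stage is present
    return cycles
```

With both generics left at their default of `false`, the sketch yields the two additional cycles stated in the text.
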

.Output Gating
[NOTE]
All outgoing Wishbone signals use a "gating mechanism" so they only change if there is an actual XBUS transaction in
progress. This can reduce dynamic switching activity in the external bus system and also simplify simulation-based
inspection of the Wishbone transactions. Note that this output gating is only available if the output register buffer is not
disabled (`XBUS_ASYNC_TX` = `false`).


**External Bus Cache (X-CACHE)**

[source,asciiart]
---------------------------------------
Simplified cache architecture ("->" = direction of access requests):

                    Direct Access       +----------+
           /|-------------------------->| Register |------------------------->|\
          | |                           +----------+                          | |
Core ---->| |                                                                 | |----> XBUS
          | |    +--------------+     +--------------+     +-------------+    | |
           \|--->| Host Arbiter |---->| Cache Memory |<----| Bus Arbiter |--->|/
                 +--------------+     +--------------+     +-------------+
---------------------------------------

The XBUS interface provides an optional cache module that can be used to buffer and improve processor-external accesses.
The cache uses a direct-mapped architecture that implements "write-allocate" and "write-back" strategies.

The **write-allocate** strategy will fetch the entire referenced block from main memory when encountering
a cache write-miss. The **write-back** strategy will gather all writes locally inside the cache until the corresponding
cache block is about to be replaced. In this case, the entire modified cache block is written back to main memory.
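
The two strategies can be illustrated with a small behavioral model. The following Python sketch is not derived from the RTL: the class `DirectMappedCache`, its byte-wise interface and the `fills`/`writebacks` counters are hypothetical and only show write-allocate and write-back behavior for a direct-mapped cache.

```python
class DirectMappedCache:
    """Behavioral model: direct-mapped, write-allocate, write-back."""

    def __init__(self, memory: bytearray, num_blocks: int = 64, block_size: int = 32):
        self.mem = memory                  # backing "main memory"
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.lines = [None] * num_blocks   # one entry per cache block
        self.fills = 0                     # blocks fetched from memory
        self.writebacks = 0                # dirty blocks written back

    def _lookup(self, addr: int):
        block_no = addr // self.block_size
        index = block_no % self.num_blocks
        tag = block_no // self.num_blocks
        line = self.lines[index]
        if line is None or line["tag"] != tag:
            # Miss: write back the old block first if it was modified.
            if line is not None and line["dirty"]:
                old = (line["tag"] * self.num_blocks + index) * self.block_size
                self.mem[old:old + self.block_size] = line["data"]
                self.writebacks += 1
            # Fetch the entire referenced block (also done on write misses).
            base = block_no * self.block_size
            data = bytearray(self.mem[base:base + self.block_size])
            line = {"tag": tag, "data": data, "dirty": False}
            self.lines[index] = line
            self.fills += 1
        return line, addr % self.block_size

    def read_byte(self, addr: int) -> int:
        line, offset = self._lookup(addr)
        return line["data"][offset]

    def write_byte(self, addr: int, value: int) -> None:
        line, offset = self._lookup(addr)
        line["data"][offset] = value & 0xFF
        line["dirty"] = True               # memory is updated on eviction only
```

In this model a write miss first fills the block, and main memory only sees the new data once a conflicting access evicts the dirty block.
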

The x-cache is enabled via the `XBUS_CACHE_EN` generic. The total size of the cache is split into the number of cache lines
or cache blocks (`XBUS_CACHE_NUM_BLOCKS` generic) and the line or block size in bytes (`XBUS_CACHE_BLOCK_SIZE` generic).
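
As a rough illustration of how the two generics determine the cache size, the following Python sketch computes the total capacity and splits a 32-bit address into tag, index and offset. The field split is the textbook layout for a direct-mapped cache and is an assumption here, not taken from the RTL. `XCacheGeometry` and `split` are hypothetical names.

```python
from dataclasses import dataclass


def _log2_exact(value: int, name: str) -> int:
    """Return log2(value); reject values that are not a power of two."""
    if value <= 0 or value & (value - 1):
        raise ValueError(f"{name} has to be a power of two")
    return value.bit_length() - 1


@dataclass(frozen=True)
class XCacheGeometry:
    num_blocks: int = 64   # default of XBUS_CACHE_NUM_BLOCKS
    block_size: int = 32   # default of XBUS_CACHE_BLOCK_SIZE (bytes)

    def __post_init__(self) -> None:
        # Both generics have to be powers of two.
        _log2_exact(self.num_blocks, "num_blocks")
        _log2_exact(self.block_size, "block_size")

    @property
    def total_bytes(self) -> int:
        return self.num_blocks * self.block_size

    def split(self, addr: int) -> tuple[int, int, int]:
        """Split a 32-bit byte address into (tag, index, offset)."""
        offset_bits = _log2_exact(self.block_size, "block_size")
        index_bits = _log2_exact(self.num_blocks, "num_blocks")
        offset = addr & (self.block_size - 1)
        index = (addr >> offset_bits) & (self.num_blocks - 1)
        tag = (addr & 0xFFFFFFFF) >> (offset_bits + index_bits)
        return tag, index, offset
```

With the default generics (64 blocks of 32 bytes) the sketch reports a total cache size of 2048 bytes.
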

The x-cache also provides "direct accesses" that bypass the cache. For example, this can be used to access processor-external
memory-mapped IO. All accesses that target the address range from `0xF0000000` to `0xFFFFFFFF` will always bypass the cache
(see section <<_address_space>>). Furthermore, load-reserved and store-conditional <<_atomic_accesses>> will also always bypass the
cache **regardless of the accessed address**.
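
The bypass rule can be written as a single predicate. This is a minimal Python sketch based only on the two conditions named above. The function name `bypasses_xcache` and the `atomic` flag are hypothetical.

```python
UNCACHED_BASE = 0xF0000000  # start of the always-uncached range named in the text


def bypasses_xcache(addr: int, atomic: bool = False) -> bool:
    """Return True if an access takes the direct (uncached) path."""
    # Atomic (LR/SC) accesses bypass the cache regardless of the address.
    if atomic:
        return True
    # Everything from 0xF0000000 up to 0xFFFFFFFF bypasses the cache.
    return (addr & 0xFFFFFFFF) >= UNCACHED_BASE
```
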


Binary file modified docs/figures/neorv32_bus.png
5 changes: 4 additions & 1 deletion rtl/core/neorv32_package.vhd
@@ -53,7 +53,7 @@ package neorv32_package is

-- Architecture Constants -----------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
constant hw_version_c : std_ulogic_vector(31 downto 0) := x"01090604"; -- hardware version
constant hw_version_c : std_ulogic_vector(31 downto 0) := x"01090605"; -- hardware version
constant archid_c : natural := 19; -- official RISC-V architecture ID
constant XLEN : natural := 32; -- native data path width

@@ -791,6 +791,9 @@ package neorv32_package is
XBUS_BIG_ENDIAN : boolean := false;
XBUS_ASYNC_RX : boolean := false;
XBUS_ASYNC_TX : boolean := false;
XBUS_CACHE_EN : boolean := false;
XBUS_CACHE_NUM_BLOCKS : natural := 64;
XBUS_CACHE_BLOCK_SIZE : natural := 32;
-- Execute in-place module (XIP) --
XIP_EN : boolean := false;
XIP_CACHE_EN : boolean := false;