Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rework cache system #853

Merged
merged 13 commits into from
Mar 17, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ mimpid = 0x01040312 -> Version 01.04.03.12 -> v1.4.3.12

| Date | Version | Comment | Link |
|:----:|:-------:|:--------|:----:|
| 16.03.2024 | 1.9.6.8 | rework cache system: L1 + L2 caches, all based on the generic cache component | [#853](https://github.com/stnolting/neorv32/pull/853) |
| 16.03.2024 | 1.9.6.7 | cache optimizations: add read-only option, add option to disable direct/uncached accesses | [#851](https://github.com/stnolting/neorv32/pull/851) |
| 15.03.2024 | 1.9.6.6 | :warning: clean-up configuration generics (remove XBUS endianness configuration; refine JEDED/VENDORID configuration); rearrange SYSINFO.SOC bits | [#850](https://github.com/stnolting/neorv32/pull/850) |
| 14.03.2024 | 1.9.6.5 | :sparkles: add optional external bus interface cache (XCACHE) | [#]846(https://github.com/stnolting/neorv32/pull/849) |
Expand Down
2 changes: 0 additions & 2 deletions docs/datasheet/overview.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -211,14 +211,12 @@ All core VHDL files from the list below have to be assigned to a **new library**
├neorv32_cfs.vhd - Custom functions subsystem
├neorv32_crc.vhd - Cyclic redundancy check unit
├neorv32_cache.vhd - Generic cache module
├neorv32_dcache.vhd - Processor-internal data cache
├neorv32_debug_dm.vhd - on-chip debugger: debug module
├neorv32_debug_dtm.vhd - on-chip debugger: debug transfer module
├neorv32_dma.vhd - Direct memory access controller
├neorv32_dmem.entity.vhd - Processor-internal data memory (entity-only!)
├neorv32_gpio.vhd - General purpose input/output port unit
├neorv32_gptmr.vhd - General purpose 32-bit timer
├neorv32_icache.vhd - Processor-internal instruction cache
├neorv32_intercon.vhd - SoC bus infrastructure
├neorv32_mtime.vhd - Machine system timer
├neorv32_neoled.vhd - NeoPixel (TM) compatible smart LED interface
Expand Down
1 change: 0 additions & 1 deletion docs/datasheet/soc.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -244,7 +244,6 @@ The generic type "`suv(x:y)`" is an abbreviation for "`std_ulogic_vector(x downt
| `ICACHE_EN` | boolean | false | Implement the instruction cache.
| `ICACHE_NUM_BLOCKS` | natural | 4 | Number of blocks ("lines") Has to be a power of two.
| `ICACHE_BLOCK_SIZE` | natural | 64 | Size in bytes of each block. Has to be a power of two.
| `ICACHE_ASSOCIATIVITY` | natural | 1 | Associativity (number of sets). Allowed configurations: `1` = 1 set, direct mapped; `2` = 2-way set-associative.
4+^| **<<_processor_internal_data_cache_dcache>>**
| `DCACHE_EN` | boolean | false | Implement the data cache.
| `DCACHE_NUM_BLOCKS` | natural | 4 | Number of blocks ("lines"). Has to be a power of two.
Expand Down
29 changes: 11 additions & 18 deletions docs/datasheet/soc_dcache.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -5,40 +5,33 @@
[cols="<3,<3,<4"]
[frame="topbot",grid="none"]
|=======================
| Hardware source file(s): | neorv32_dcache.vhd |
| Hardware source file(s): | neorv32_cache.vhd | Generic cache module
| Software driver file(s): | none | _implicitly used_
| Top entity port: | none |
| Top entity port: | none |
| Configuration generics: | `DCACHE_EN` | implement processor-internal data cache when `true`
| | `DCACHE_NUM_BLOCKS` | number of cache blocks (pages/lines)
| | `DCACHE_BLOCK_SIZE` | size of a cache block in bytes
| CPU interrupts: | none |
| CPU interrupts: | none |
|=======================

The processor features an optional data cache to improve performance when using memories with high
access latencies. The cache is directly connected to the CPU's data access interface and provides
full-transparent buffering.

The cache is implemented if the `DCACHE_EN` generic is `true`. The size of the cache memory is defined via the
`DCACHE_BLOCK_SIZE` (the size of a single cache block/page/line in bytes; has to be a power of two and greater than or
equal to 4 bytes) and `DCACHE_NUM_BLOCKS` (the total amount of cache blocks; has to be a power of two and greater than or
equal to 1) generics. The data cache provides only a single set, hence it is direct-mapped.


**Cached/Uncached Accesses**
access latencies. The cache is connected directly to the CPU's data access interface and provides
full-transparent accesses. The cache is direct-mapped and uses "write-allocate" and "write-back" strategies.

.Cached/Uncached Accesses
[NOTE]
The data cache provides direct accesses (= uncached) to memory in order to access memory-mapped IO (like the
processor-internal IO/peripheral modules). All accesses that target the address range from `0xF0000000` to `0xFFFFFFFF`
will not be cached at all (see section <<_address_space>>).

.Caching Internal Memories
[NOTE]
The data cache is intended to accelerate data access to **processor-external** memories
(via the external bus interface or via the XIP module). The cache(s) should not be implemented
when using only processor-internal data and instruction memories.
The data cache is intended to accelerate data access to **processor-external** memories.
The CPU cache(s) should not be implemented when using only processor-internal data and instruction memories.

.Manual Cache Clear/Reload
.Manual Cache Flush/Clear/Reload
[NOTE]
By executing the `fence(.i)` instruction the cache is cleared and a reload from main memory is triggered.
By executing the `fence(.i)` instruction the cache is flushed, cleared and a reload from main memory is triggered.

.Retrieve Cache Configuration from Software
[TIP]
Expand Down
36 changes: 13 additions & 23 deletions docs/datasheet/soc_icache.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -5,39 +5,29 @@
[cols="<3,<3,<4"]
[frame="topbot",grid="none"]
|=======================
| Hardware source file(s): | neorv32_icache.vhd |
| Software driver file(s): | none | _implicitly used_
| Top entity port: | none |
| Configuration generics: | `ICACHE_EN` | implement processor-internal instruction cache when `true`
| | `ICACHE_NUM_BLOCKS` | number of cache blocks (pages/lines)
| | `ICACHE_BLOCK_SIZE` | size of a cache block in bytes
| | `ICACHE_ASSOCIATIVITY` | associativity / number of sets
| CPU interrupts: | none |
| Hardware source file(s): | neorv32_cache.vhd | Generic cache module
| Software driver file(s): | none | _implicitly used_
| Top entity port: | none |
| Configuration generics: | `ICACHE_EN` | implement processor-internal instruction cache when `true`
| | `ICACHE_NUM_BLOCKS` | number of cache blocks (pages/lines)
| | `ICACHE_BLOCK_SIZE` | size of a cache block in bytes
| CPU interrupts: | none |
|=======================

The processor features an optional instruction cache to improve performance when using memories with high
access latencies. The cache is directly connected to the CPU's instruction fetch interface and provides
full-transparent buffering of instruction fetch accesses to the **entire address space**.

The cache is implemented if the `ICACHE_EN` generic is `true`. The size of the cache memory is defined via
`ICACHE_BLOCK_SIZE` (the size of a single cache block/page/line in bytes; has to be a power of two and greater than or
equal to 4 bytes), `ICACHE_NUM_BLOCKS` (the total amount of cache blocks; has to be a power of two and greater than or
equal to 1) and the actual cache associativity `ICACHE_ASSOCIATIVITY` (number of sets; 1 = direct-mapped, 2 = 2-way
set-associative) generics. If the cache associativity is greater than one the LRU replacement policy (least recently
used) is used.


**Cached/Uncached Accesses**
access latencies. The cache is connected directly to the CPU's instruction fetch interface and provides
full-transparent accesses. The cache is direct-mapped and read-only.

.Cached/Uncached Accesses
[NOTE]
The data cache provides direct accesses (= uncached) to memory in order to access memory-mapped IO (like the
processor-internal IO/peripheral modules). All accesses that target the address range from `0xF0000000` to `0xFFFFFFFF`
will not be cached at all (see section <<_address_space>>).

.Caching Internal Memories
[NOTE]
The instruction cache is intended to accelerate instruction fetches from **processor-external** memories
(via the external bus interface or via the XIP module). The cache(s) should not be implemented
when using only processor-internal data and instruction memories.
The data cache is intended to accelerate data access to **processor-external** memories.
The CPU cache(s) should not be implemented when using only processor-internal data and instruction memories.

.Manual Cache Clear/Reload
[NOTE]
Expand Down
19 changes: 10 additions & 9 deletions docs/datasheet/soc_sysinfo.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -70,7 +70,8 @@ Bit fields in this register are set to all-zero if the according cache is not im
| `7` | `SYSINFO_SOC_CLOCK_GATING` | set if CPU clock gating is implemented (via top's `CLOCK_GATING_EN` generic)
| `8` | `SYSINFO_SOC_XBUS_CACHE` | set if external bus interface cache is implemented (via top's `XBUS_CACHE_EN` generic)
| `9` | `SYSINFO_SOC_XIP` | set if XIP module is implemented (via top's `XIP_EN` generic)
| `13:10` | - | _reserved_, read as zero
| `10` | `SYSINFO_SOC_XIP_CACHE` | set if XIP cache is implemented (via top's `XIP_CACHE_EN` generic)
| `13:11` | - | _reserved_, read as zero
| `14` | `SYSINFO_SOC_IO_DMA` | set if direct memory access controller is implemented (via top's `IO_DMA_EN` generic)
| `15` | `SYSINFO_SOC_IO_GPIO` | set if GPIO is implemented (via top's `IO_GPIO_EN` generic)
| `16` | `SYSINFO_SOC_IO_MTIME` | set if MTIME is implemented (via top's `IO_MTIME_EN` generic)
Expand Down Expand Up @@ -102,12 +103,12 @@ Bit fields in this register are set to all-zero if the according cache is not im
[options="header",grid="all"]
|=======================
| Bit | Name [C] | Function
| `3:0` | `SYSINFO_CACHE_IC_BLOCK_SIZE_3 : SYSINFO_CACHE_IC_BLOCK_SIZE_0` | _log2_(i-cache block size in bytes), via top's `ICACHE_BLOCK_SIZE` generic
| `7:4` | `SYSINFO_CACHE_IC_NUM_BLOCKS_3 : SYSINFO_CACHE_IC_NUM_BLOCKS_0` | _log2_(i-cache number of cache blocks), via top's `ICACHE_NUM_BLOCKS` generic
| `11:9` | `SYSINFO_CACHE_IC_ASSOCIATIVITY_3 : SYSINFO_CACHE_IC_ASSOCIATIVITY_0` | _log2_(i-cache associativity), via top's `ICACHE_ASSOCIATIVITY` generic
| `15:12` | `SYSINFO_CACHE_IC_REPLACEMENT_3 : SYSINFO_CACHE_IC_REPLACEMENT_0` | i-cache replacement policy (`0001` = LRU if associativity > 0)
| `19:16` | `SYSINFO_CACHE_DC_BLOCK_SIZE_3 : SYSINFO_CACHE_DC_BLOCK_SIZE_0` | _log2_(d-cache block size in bytes), via top's `DCACHE_BLOCK_SIZE` generic
| `23:20` | `SYSINFO_CACHE_DC_NUM_BLOCKS_3 : SYSINFO_CACHE_DC_NUM_BLOCKS_0` | _log2_(d-cache number of cache blocks), via top's `DCACHE_NUM_BLOCKS` generic
| `27:24` | `SYSINFO_CACHE_DC_ASSOCIATIVITY_3 : SYSINFO_CACHE_DC_ASSOCIATIVITY_0` | always zero
| `31:28` | `SYSINFO_CACHE_DC_REPLACEMENT_3 : SYSINFO_CACHE_DC_REPLACEMENT_0` | always zero
| `3:0` | `SYSINFO_CACHE_INST_BLOCK_SIZE_3 : SYSINFO_CACHE_INST_BLOCK_SIZE_0` | _log2_(i-cache block size in bytes), via top's `ICACHE_BLOCK_SIZE` generic
| `7:4` | `SYSINFO_CACHE_INST_NUM_BLOCKS_3 : SYSINFO_CACHE_INST_NUM_BLOCKS_0` | _log2_(i-cache number of cache blocks), via top's `ICACHE_NUM_BLOCKS` generic
| `11:8` | `SYSINFO_CACHE_DATA_BLOCK_SIZE_3 : SYSINFO_CACHE_DATA_BLOCK_SIZE_0` | _log2_(d-cache block size in bytes), via top's `DCACHE_BLOCK_SIZE` generic
| `15:12` | `SYSINFO_CACHE_DATA_NUM_BLOCKS_3 : SYSINFO_CACHE_DATA_NUM_BLOCKS_0` | _log2_(d-cache number of cache blocks), via top's `DCACHE_NUM_BLOCKS` generic
| `19:16` | `SYSINFO_CACHE_XIP_BLOCK_SIZE_3 : SYSINFO_CACHE_XIP_BLOCK_SIZE_0` | _log2_(xip-cache block size in bytes), via top's `XIP_CACHE_BLOCK_SIZE` generic
| `23:20` | `SYSINFO_CACHE_XIP_NUM_BLOCKS_3 : SYSINFO_CACHE_XIP_NUM_BLOCKS_0` | _log2_(xip-cache number of cache blocks), via top's `XIP_CACHE_NUM_BLOCKS` generic
| `27:24` | `SYSINFO_CACHE_XBUS_BLOCK_SIZE_3 : SYSINFO_CACHE_XBUS_BLOCK_SIZE_0` | _log2_(xbus-cache block size in bytes), via top's `XBUS_CACHE_BLOCK_SIZE` generic
| `31:28` | `SYSINFO_CACHE_XBUS_NUM_BLOCKS_3 : SYSINFO_CACHE_XBUS_NUM_BLOCKS_0` | _log2_(xbus-cache number of cache blocks), via top's `XBUS_CACHE_NUM_BLOCKS` generic
|=======================
2 changes: 1 addition & 1 deletion docs/datasheet/soc_xbus.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
[frame="topbot",grid="none"]
|=======================
| Hardware source file(s): | neorv32_xbus.vhd | External bus gateway
| | neorv32_cache.vhd | External bus cache instance
| | neorv32_cache.vhd | Generic cache module
| Software driver file(s): | none | _implicitly used_
| Top entity port(s): | `xbus_adr_o` | address output (32-bit)
| | `xbus_dat_i` | data input (32-bit)
Expand Down
24 changes: 12 additions & 12 deletions docs/datasheet/soc_xip.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -5,18 +5,19 @@
[cols="<3,<3,<4"]
[frame="topbot",grid="none"]
|=======================
| Hardware source file(s): | neorv32_xip.vhd |
| Software driver file(s): | neorv32_xip.c |
| | neorv32_xip.h |
| Top entity port: | `xip_csn_o` | 1-bit chip select, low-active
| | `xip_clk_o` | 1-bit serial clock output
| | `xip_dat_i` | 1-bit serial data input
| | `xip_dat_o` | 1-bit serial data output
| Hardware source file(s): | neorv32_xip.vhd | XIP module
| | neorv32_cache.vhd | Generic cache module
| Software driver file(s): | neorv32_xip.c |
| | neorv32_xip.h |
| Top entity port: | `xip_csn_o` | 1-bit chip select, low-active
| | `xip_clk_o` | 1-bit serial clock output
| | `xip_dat_i` | 1-bit serial data input
| | `xip_dat_o` | 1-bit serial data output
| Configuration generics: | `XIP_EN` | implement XIP module when `true`
| | `XIP_CACHE_EN` | implement XIP cache when `true`
| | `XIP_CACHE_NUM_BLOCKS` | number of blocks in XIP cache; has to be a power of two
| | `XIP_CACHE_BLOCK_SIZE` | number of bytes per XIP cache block; has to be a power of two, min 4
| CPU interrupts: | none |
| CPU interrupts: | none |
|=======================


Expand Down Expand Up @@ -190,7 +191,7 @@ The XIP cache is cleared when the XIP module is disabled (`XIP_CTRL_EN = 0`), wh
[options="header",grid="all"]
|=======================
| Address | Name [C] | Bit(s), Name [C] | R/W | Function
.15+<| `0xffffff40` .15+<| `CTRL` <|`0` `XIP_CTRL_EN` ^| r/w <| XIP module enable
.14+<| `0xffffff40` .14+<| `CTRL` <|`0` `XIP_CTRL_EN` ^| r/w <| XIP module enable
<|`3:1` `XIP_CTRL_PRSC2 : XIP_CTRL_PRSC0` ^| r/w <| 3-bit SPI clock prescaler select
<|`4` `XIP_CTRL_CPOL` ^| r/w <| SPI clock polarity
<|`5` `XIP_CTRL_CPHA` ^| r/w <| SPI clock phase
Expand All @@ -200,9 +201,8 @@ The XIP cache is cleared when the XIP module is disabled (`XIP_CTRL_EN = 0`), wh
<|`20:13` `XIP_CTRL_RD_CMD_MSB : XIP_CTRL_RD_CMD_LSB` ^| r/w <| Flash read command
<|`21` `XIP_CTRL_SPI_CSEN` ^| r/w <| Allow SPI chip-select to be actually asserted when set
<|`22` `XIP_CTRL_HIGHSPEED` ^| r/w <| enable SPI high-speed mode (ignoring `XIP_CTRL_PRSCx`)
<|`23:26` `XIP_CTRL_CDIV3 : XIP_CTRL_CDIV0` ^| r/- <| 4-bit clock divider for fine-tuning
<|`27:28` - ^| r/- <| _reserved_, read as zero
<|`29` `XIP_CTRL_BURST_EN` ^| r/- <| XIP burst mode enabled (if XIP cache is implemented)
<|`26:23` `XIP_CTRL_CDIV3 : XIP_CTRL_CDIV0` ^| r/- <| 4-bit clock divider for fine-tuning
<|`29:27` - ^| r/- <| _reserved_, read as zero
<|`30` `XIP_CTRL_PHY_BUSY` ^| r/- <| SPI PHY busy when set
<|`31` `XIP_CTRL_XIP_BUSY` ^| r/- <| XIP access in progress when set
| `0xffffff44` | _reserved_ |`31:0` | r/- | _reserved_, read as zero
Expand Down
Binary file modified docs/figures/neorv32_bus.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading