✨ add optional XIP cache #799

Merged: 12 commits, Feb 9, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -30,6 +30,7 @@ mimpid = 0x01040312 -> Version 01.04.03.12 -> v1.4.3.12

| Date | Version | Comment | Link |
|:----:|:-------:|:--------|:----:|
| 09.02.2024 | 1.9.4.6 | :sparkles: add configurable XIP cache | [#799](https://github.com/stnolting/neorv32/pull/799) |
| 09.02.2024 | 1.9.4.5 | :bug: close further illegal compressed instruction encoding loopholes | [#797](https://github.com/stnolting/neorv32/pull/797) |
| 04.02.2024 | 1.9.4.4 | :bug: fix minor bug: CPU instruction bus privilege signal did not remain stable during the entire request | [#792](https://github.com/stnolting/neorv32/pull/792) |
| 03.02.2024 | 1.9.4.3 | :bug: fix minor bug: CPU instruction bus privilege signal was hardwired to "user-mode" | [#790](https://github.com/stnolting/neorv32/pull/790) |
8 changes: 6 additions & 2 deletions docs/datasheet/soc.adoc
@@ -249,7 +249,7 @@ The generic type "`suv(x:y)`" is an abbreviation for "`std_ulogic_vector(x downt
| `ICACHE_ASSOCIATIVITY` | natural | 1 | Associativity (number of sets). Allowed configurations: `1` = 1 set, direct mapped; `2` = 2-way set-associative.
4+^| **<<_processor_internal_data_cache_dcache>>**
| `DCACHE_EN` | boolean | false | Implement the data cache.
| `DCACHE_NUM_BLOCKS` | natural | 4 | Number of blocks ("pages" or "lines") Has to be a power of two.
| `DCACHE_NUM_BLOCKS` | natural | 4 | Number of blocks ("pages" or "lines"). Has to be a power of two.
| `DCACHE_BLOCK_SIZE` | natural | 64 | Size in bytes of each block. Has to be a power of two.
4+^| **<<_processor_external_memory_interface_wishbone>>**
| `MEM_EXT_EN` | boolean | false | Implement the external bus interface.
@@ -258,6 +258,11 @@ The generic type "`suv(x:y)`" is an abbreviation for "`std_ulogic_vector(x downt
| `MEM_EXT_BIG_ENDIAN` | boolean | false | Use BIG endian data order interface for external bus.
| `MEM_EXT_ASYNC_RX` | boolean | false | Disable input registers when true.
| `MEM_EXT_ASYNC_TX` | boolean | false | Disable output registers when true.
4+^| **<<_execute_in_place_module_xip>>**
| `XIP_EN` | boolean | false | Implement the execute in-place module.
| `XIP_CACHE_EN` | boolean | false | Implement XIP cache.
| `XIP_CACHE_NUM_BLOCKS` | natural | 8 | Number of blocks in XIP cache. Has to be a power of two.
| `XIP_CACHE_BLOCK_SIZE` | natural | 256 | Number of bytes per XIP cache block. Has to be a power of two, min 4.
4+^| **<<_external_interrupt_controller_xirq>>**
| `XIRQ_NUM_CH` | natural | 0 | Number of channels of the external interrupt controller. Valid values are 0..32.
| `XIRQ_TRIGGER_TYPE` | suv(31:0) | 0xFFFFFFFF | Trigger type (one bit per channel): `0` = level-triggered, `1` = edge-triggered.
@@ -287,7 +292,6 @@ The generic type "`suv(x:y)`" is an abbreviation for "`std_ulogic_vector(x downt
| `IO_NEOLED_EN` | boolean | false | Implement the <<_smart_led_interface_neoled>>.
| `IO_NEOLED_TX_FIFO` | natural | 1 | TX FIFO depth of the <<_smart_led_interface_neoled>>. Has to be a power of two, min 1, max 32768.
| `IO_GPTMR_EN` | boolean | false | Implement the <<_general_purpose_timer_gptmr>>.
| `IO_XIP_EN` | boolean | false | Implement the <<_execute_in_place_module_xip>>.
| `IO_ONEWIRE_EN` | boolean | false | Implement the <<_one_wire_serial_interface_controller_onewire>>.
| `IO_DMA_EN` | boolean | false | Implement the <<_direct_memory_access_controller_dma>>.
| `IO_SLINK_EN` | boolean | false | Implement the <<_stream_link_interface_slink>>.
2 changes: 1 addition & 1 deletion docs/datasheet/soc_sysinfo.adoc
@@ -86,7 +86,7 @@ Bit fields in this register are set to all-zero if the according cache is not im
| `26` | `SYSINFO_SOC_IO_NEOLED` | set if the NEOLED is implemented (via top's `IO_NEOLED_EN` generic)
| `27` | `SYSINFO_SOC_IO_XIRQ` | set if the XIRQ is implemented (via top's `XIRQ_NUM_CH` generic)
| `28` | `SYSINFO_SOC_IO_GPTMR` | set if the GPTMR is implemented (via top's `IO_GPTMR_EN` generic)
| `29` | `SYSINFO_SOC_IO_XIP` | set if the XIP module is implemented (via top's `IO_XIP_EN` generic)
| `29` | `SYSINFO_SOC_XIP` | set if the XIP module is implemented (via top's `XIP_EN` generic)
| `30` | `SYSINFO_SOC_IO_ONEWIRE` | set if the ONEWIRE interface is implemented (via top's `IO_ONEWIRE_EN` generic)
| `31` | `SYSINFO_SOC_OCD` | set if on-chip debugger is implemented (via top's `ON_CHIP_DEBUGGER_EN` generic)
|=======================
65 changes: 31 additions & 34 deletions docs/datasheet/soc_xip.adoc
@@ -12,7 +12,10 @@
| | `xip_clk_o` | 1-bit serial clock output
| | `xip_dat_i` | 1-bit serial data input
| | `xip_dat_o` | 1-bit serial data output
| Configuration generics: | `IO_XIP_EN` | implement XIP module when `true`
| Configuration generics: | `XIP_EN` | implement XIP module when `true`
| | `XIP_CACHE_EN` | implement XIP cache when `true`
| | `XIP_CACHE_NUM_BLOCKS` | number of blocks in XIP cache; has to be a power of two
| | `XIP_CACHE_BLOCK_SIZE` | number of bytes per XIP cache block; has to be a power of two, min 4
| CPU interrupts: | none |
|=======================

@@ -22,17 +25,16 @@
The execute in-place (XIP) module allows to execute code (and read constant data) directly from an external SPI flash memory.
The standard serial peripheral interface (SPI) is used as transfer protocol.

From the CPU side, the modules provides two different interfaces: one for transparently accessing the XIP flash and another
From the CPU side, the module provides two independent interfaces: one for transparently accessing the XIP flash and another
one for accessing the module's control and status registers. The first interface provides a _transparent_
gateway to the SPI flash, so the CPU can directly fetch and execute instructions (and/or read constant _data_).
Note that this interface is read-only. Any write access will raise a bus error exception.
The second interface is mapped to the processor's IO space and allows data accesses to the XIP module's
configuration registers.
Note that this interface is read-only. Any write access will raise a bus error exception. The second interface is
mapped to the processor's IO space and allows accesses to the XIP module's configuration registers.

.XIP Address Mapping
[NOTE]
When XIP mode is enabled the flash is mapped to fixed address space region starting at address
`0xE0000000` (see section <<_address_space>>).
`0xE0000000` (see section <<_address_space>>) supporting a maximum flash size of 256MB.
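
The mapping in the note above can be illustrated with a small host-side sketch. It assumes that the first byte of the flash appears at the base address `0xE0000000` and that the window spans exactly 256MB; the helper name `xip_flash_offset` is hypothetical and not part of the NEORV32 software framework.

```python
# Sketch only: constants taken from the note above, helper name is hypothetical.
XIP_BASE = 0xE0000000          # start of the XIP address region
XIP_SIZE = 256 * 1024 * 1024   # maximum flash size of 256MB


def xip_flash_offset(cpu_addr: int) -> int:
    """Translate a CPU address inside the XIP window into a flash byte offset.

    Assumption: flash offset 0 is mapped to XIP_BASE.
    """
    if not (XIP_BASE <= cpu_addr < XIP_BASE + XIP_SIZE):
        raise ValueError(f"address 0x{cpu_addr:08X} is outside the XIP window")
    return cpu_addr - XIP_BASE


if __name__ == "__main__":
    print(hex(xip_flash_offset(0xE0001234)))
```

Under these assumptions the window covers `0xE0000000` to `0xEFFFFFFF`, so the flash offset is simply the lower 28 bits of the CPU address.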

.XIP Example Program
[TIP]
@@ -44,13 +46,15 @@ an external SPI flash to run a program from it.

The XIP module accesses external flash using the standard SPI protocol. The module always sends data MSB-first and
provides all of the standard four clock modes (0..3), which are configured via the `XIP_CTRL_CPOL` (clock polarity)
and `XIP_CTRL_CPHA` (clock phase) control register bits, respectively.
and `XIP_CTRL_CPHA` (clock phase) control register bits, respectively. The flash's "read command", which initiates
a read access, is defined by the `XIP_CTRL_RD_CMD` control register bits. For most SPI flash memories this is `0x03`
for _normal_ SPI mode.

The SPI clock frequency (`xip_clk_o`) is programmed by the 3-bit `XIP_CTRL_PRSCx` clock prescaler for a coarse clock
selection and a 4-bit clock divider `XPI_CTRL_CDIVx` for a fine clock configuration.
The SPI clock (`xip_clk_o`) frequency is programmed by the 3-bit `XIP_CTRL_PRSCx` clock prescaler for a coarse clock
selection and a 4-bit clock divider `XIP_CTRL_CDIVx` for a fine clock selection.
The following clock prescalers (`XIP_CTRL_PRSCx`) are available:

.XIP prescaler configuration
.XIP clock prescaler configuration
[cols="<4,^1,^1,^1,^1,^1,^1,^1,^1"]
[options="header",grid="rows"]
|=======================
@@ -78,16 +82,6 @@ _**f~SPI~**_ = _f~main~[Hz]_ / (2 * 1 * (1 + `XPI_CTRL_CDIVx`))
Hence, the maximum SPI clock when in high-speed mode is f~main~ / 2.


.High-Speed SPI mode
[TIP]
The module provides a "high-speed" SPI mode. In this mode the clock prescaler configuration (`XIP_CTRL_PRSCx`) is ignored
and the SPI clock operates at f~main~ / 2 (half of the processor's main clock). High speed SPI mode is enabled by setting
the control register's `XIP_CTRL_HIGHSPEED` bit.

The flash's "read command", which initiates a read access, is defined by the `XIP_CTRL_RD_CMD` control register bits.
For most SPI flash memories this is `0x03` for normal SPI mode.


**Direct SPI Access**

The XIP module allows to initiate _direct_ SPI transactions. This feature can be used to configure the attached SPI
@@ -167,16 +161,19 @@ It is highly recommended to enable the <<_processor_internal_instruction_cache_i
of the SPI access latency.


**XIP Burst Mode**
**XIP Cache**

Since every single instruction fetch request from the CPU is translated into serial SPI transmissions, the access latency is
very high, resulting in low throughput. To improve performance, the XIP module provides an optional cache that
buffers recently accessed data. The cache is implemented as a simple direct-mapped, read-only cache with a configurable
layout:

By default, every XIP access to the flash transmits the read command and the word-aligned address before reading four consecutive
data bytes. Obviously, this introduces a certain transmission overhead. To reduces this overhead, the XIP mode allows to utilize
the flash's _incrmental read_ function, which will return consecutive bytes when continuing to send clock cycles after a read command.
Hence, the XIP module provides an optional "burst mode" to accelerate consecutive read accesses.
* `XIP_CACHE_EN`: when set to `true`, the XIP cache is implemented
* `XIP_CACHE_NUM_BLOCKS`: defines the number of cache blocks (or lines)
* `XIP_CACHE_BLOCK_SIZE`: defines the size in bytes of each cache block

The XIP burst mode is enabled by setting the `XIP_CTRL_BURST_EN` bit in the module's control register. The burst mode only affects
the actual XIP mode and _not_ the direct SPI mode. Hence, it should be enabled right before enabling XIP mode only.
By using the XIP burst mode flash read accesses can be accelerated by up to 50%.
When the cache is implemented, the XIP module operates in **burst mode** utilizing the flash's _incremental read_ capabilities.
Thus, several bytes (= `XIP_CACHE_BLOCK_SIZE`) are read consecutively from the flash using a single read command.
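
To make the cache layout more concrete, here is a minimal model of how a direct-mapped cache typically splits an address, using the default generics (8 blocks of 256 bytes). This is the textbook scheme and only an assumption about the hardware: the actual tag/index partitioning of the XIP cache is not described here, and the function names are hypothetical.

```python
# Sketch only: textbook direct-mapped address split, not the actual RTL.
def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def split_address(flash_offset: int, num_blocks: int = 8,
                  block_size: int = 256) -> tuple[int, int, int]:
    """Split a flash byte offset into (tag, block index, byte offset)."""
    if not is_power_of_two(num_blocks):
        raise ValueError("XIP_CACHE_NUM_BLOCKS has to be a power of two")
    if not is_power_of_two(block_size) or block_size < 4:
        raise ValueError("XIP_CACHE_BLOCK_SIZE has to be a power of two, min 4")
    byte_offset = flash_offset % block_size    # position inside the block
    block_number = flash_offset // block_size  # consecutive block number
    index = block_number % num_blocks          # cache block that is used
    tag = block_number // num_blocks           # identifies the cached region
    return tag, index, byte_offset


def block_base(flash_offset: int, block_size: int = 256) -> int:
    """First byte of the block that one burst read would fetch on a miss."""
    return flash_offset - (flash_offset % block_size)


if __name__ == "__main__":
    print(split_address(0x1234), hex(block_base(0x1234)))
```

In this model, two offsets that are `XIP_CACHE_NUM_BLOCKS * XIP_CACHE_BLOCK_SIZE` bytes apart share the same block index and therefore evict each other, which is the usual trade-off of a direct-mapped layout.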


**Register Map**
@@ -195,12 +192,12 @@ By using the XIP burst mode flash read accesses can be accelerated by up to 50%.
<|`12:11` `XIP_CTRL_XIP_ABYTES_MSB : XIP_CTRL_XIP_ABYTES_LSB` ^| r/w <| Number of address bytes for XIP flash (minus 1)
<|`20:13` `XIP_CTRL_RD_CMD_MSB : XIP_CTRL_RD_CMD_LSB` ^| r/w <| Flash read command
<|`21` `XIP_CTRL_SPI_CSEN` ^| r/w <| Allow SPI chip-select to be actually asserted when set
<|`22` `XIP_CTRL_HIGHSPEED` ^| r/w <| enable SPI high-speed mode (ignoring _XIP_CTRL_PRSC_)
<|`23` `XIP_CTRL_BURST_EN` ^| r/w <| Enable XIP burst mode
<|`24:27` `XIP_CTRL_CDIV3 : XIP_CTRL_CDIV0` ^| r/- <| 4-bit clock divider for fine-tuning
<|`29:28` - ^| r/- <| _reserved_, read as zero
<|`30` `XIP_CTRL_PHY_BUSY` ^| r/- <| SPI PHY busy when set
<|`31` `XIP_CTRL_XIP_BUSY` ^| r/- <| XIP access in progress when set
<|`22` `XIP_CTRL_HIGHSPEED` ^| r/w <| enable SPI high-speed mode (ignoring `XIP_CTRL_PRSCx`)
<|`26:23` `XIP_CTRL_CDIV3 : XIP_CTRL_CDIV0` ^| r/- <| 4-bit clock divider for fine-tuning
<|`28:27` - ^| r/- <| _reserved_, read as zero
<|`29` `XIP_CTRL_BURST_EN` ^| r/- <| XIP burst mode enabled (if XIP cache is implemented)
<|`30` `XIP_CTRL_PHY_BUSY` ^| r/- <| SPI PHY busy when set
<|`31` `XIP_CTRL_XIP_BUSY` ^| r/- <| XIP access in progress when set
| `0xffffff44` | _reserved_ |`31:0` | r/- | _reserved_, read as zero
| `0xffffff48` | `DATA_LO` |`31:0` | r/w | Direct SPI access - data register low
| `0xffffff4C` | `DATA_HI` |`31:0` | -/w | Direct SPI access - data register high; write access triggers SPI transfer