Skip to content

Commit

Permalink
[docs] update XBUS section
Browse files Browse the repository at this point in the history
  • Loading branch information
stnolting committed Apr 15, 2024
1 parent 61519bf commit 891aed6
Showing 1 changed file with 68 additions and 98 deletions.
166 changes: 68 additions & 98 deletions docs/datasheet/soc_xbus.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -5,63 +5,57 @@
[cols="<3,<3,<4"]
[frame="topbot",grid="none"]
|=======================
| Hardware source file(s): | neorv32_xbus.vhd | External bus gateway
| | neorv32_cache.vhd | Generic cache module
| Software driver file(s): | none | _implicitly used_
| Top entity port(s): | `xbus_adr_o` | address output (32-bit)
| | `xbus_dat_i` | data input (32-bit)
| | `xbus_dat_o` | data output (32-bit)
| | `xbus_we_o` | write enable (1-bit)
| | `xbus_sel_o` | byte enable (4-bit)
| | `xbus_stb_o` | strobe (1-bit)
| | `xbus_cyc_o` | valid cycle (1-bit)
| | `xbus_ack_i` | acknowledge (1-bit)
| | `xbus_err_i` | bus error (1-bit)
| Configuration generics: | `XBUS_EN` | enable external bus interface when `true`
| | `XBUS_TIMEOUT` | number of clock cycles after which an unacknowledged external bus access will auto-terminate (0 = disabled)
| | `XBUS_PIPE_MODE` | when `false` (default): classic/standard Wishbone protocol; when `true`: pipelined Wishbone protocol
| | `XBUS_ASYNC_RX` | use registered RX path when `false` (default); use async/direct RX path when `true`
| | `XBUS_ASYNC_TX` | use registered TX path when `false` (default); use async/direct TX path when `true`
| | `XBUS_CACHE_EN` | implement the external bus cache
| | `XBUS_CACHE_NUM_BLOCKS` | number of blocks ("lines"), has to be a power of two.
| | `XBUS_CACHE_BLOCK_SIZE` | size in bytes of each block, has to be a power of two.
| CPU interrupts: | none |
| Hardware source files: | neorv32_xbus.vhd | External bus gateway
| | neorv32_cache.vhd | Generic cache module
| Software driver files: | none | _implicitly used_
| Top entity ports: | `xbus_adr_o` | address output (32-bit)
| | `xbus_dat_i` | data input (32-bit)
| | `xbus_dat_o` | data output (32-bit)
| | `xbus_we_o` | write enable (1-bit)
| | `xbus_sel_o` | byte enable (4-bit)
| | `xbus_stb_o` | bus strobe (1-bit)
| | `xbus_cyc_o` | valid cycle (1-bit)
| | `xbus_ack_i` | acknowledge (1-bit)
| | `xbus_err_i` | bus error (1-bit)
| Configuration generics: | `XBUS_EN` | enable external bus interface when `true`
| | `XBUS_TIMEOUT` | number of clock cycles after which an unacknowledged external bus access will auto-terminate (0 = disabled)
| | `XBUS_REGSTAGE_EN` | implement XBUS register stages
| | `XBUS_CACHE_EN` | implement the external bus cache
| | `XBUS_CACHE_NUM_BLOCKS` | number of blocks ("lines"), has to be a power of two.
| | `XBUS_CACHE_BLOCK_SIZE` | size in bytes of each block, has to be a power of two.
| CPU interrupts: | none |
|=======================


The external bus interface provides a Wishbone b4-compatible on-chip bus interface that is
implemented if the `XBUS_EN` generic is `true`. This bus interface can be used to attach external memories,
custom hardware accelerators, additional peripheral devices or all other kinds of IP blocks.
**Overview**

The external bus interface provides a **Wishbone b4**-compatible on-chip bus interface that is
implemented if the `XBUS_EN` generic is `true`. This bus interface can be used to attach processor-external
modules like memories, custom hardware accelerators or additional peripheral devices.
An optional cache module ("XCACHE") can be enabled to improve memory access latency.

.Address Mapping
[IMPORTANT]
The external interface is **not** mapped to a specific address space. Instead, all CPU memory accesses that
do not target a specific (and actually implemented) processor-internal address region (hence, accessing the "void";
see section <<_address_space>>) are redirected to the external bus interface.
see section <<_address_space>>) are **redirected** to the external bus interface.


**Wishbone Bus Protocol**

The external bus interface either uses the **standard** (also called "classic") Wishbone protocol (default) or
**pipelined** Wishbone protocol. The protocol to be used is configured via the `XBUS_PIPE_MODE` generic:
The external bus interface complies to the **pipelined Wishbone b4** protocol. Even though this protocol
was explicitly designed to support pipelined transfers, only a single transfer will be "in fly" at once.
Hence, just two types of bus transactions are generated by the XBUS controller (see images below).

* If `XBUS_PIPE_MODE` is `false`, all bus control signals including `xbus_stb_o` are active and remain stable until the
transfer is acknowledged/terminated.
* If `XBUS_PIPE_MODE` is `true`, all bus control except `xbus_stb_o` are active and remain until the transfer is
acknowledged/terminated. In this case, `xbus_stb_o` is asserted only during the very first bus clock cycle.
.XBUS/Wishbone Write Transaction
image::xbus_write.png[700]

.Exemplary Wishbone bus accesses using "classic" and "pipelined" protocol
[cols="^2,^2"]
[grid="none"]
|=======================
a| image::wishbone_classic_read.png[700,300]
a| image::wishbone_pipelined_write.png[700,300]
| **Classic** Wishbone read access | **Pipelined** Wishbone write access
|=======================
.XBUS/Wishbone Read Transaction
image::xbus_read.png[700]

[WARNING]
If the Wishbone interface is configured to operate in classic/standard mode (`XBUS_PIPE_MODE` = false) a
**sync** RX path (`XBUS_ASYNC_RX` = false) is required for the inter-cycle pause. If `XBUS_ASYNC_RX` is
enabled while `XBUS_PIPE_MODE` is disabled the module will automatically disable the asynchronous RX option.
.Endianness
[NOTE]
Just like the processor itself the XBUS interface uses **little-endian** byte order.

.Wishbone Specs.
[TIP]
Expand All @@ -70,73 +64,49 @@ can be found in the data sheet "Wishbone B4 - WISHBONE System-on-Chip (SoC) Inte
Architecture for Portable IP Cores". A copy of this document can be found in the `docs` folder of this
project.

.Endianness
[NOTE]
Just like the processor itself the XBUS interface uses **little-endian** byte order.


**Bus Access**

The NEORV32 XBUS interface does not support burst transfers yet, so there is always just a single transfer "in fly".
Hence, the Wishbone `STALL` signal is not implemented. An accessed Wishbone device does not have to respond immediately to a bus
request by sending an ACK. Instead, there is a _time window_ where the device has to acknowledge the transfer. This time window
is configured by the `XBUS_TIMEOUT` generic that defines the maximum time (in clock cycles) a bus access can be pending
before it is automatically terminated with an error condition. If `XBUS_TIMEOUT` is set to zero, the timeout is disabled
and a bus access can take an arbitrary number of cycles to complete (this is not recommended!).
An accessed XBUS/Wishbone device does not have to respond immediately to a bus request by sending an `ACK`.
Instead, there is a **time window** where the device has to acknowledge the transfer. This time window
is configured by the `XBUS_TIMEOUT` generic and it defines the maximum time (in clock cycles) a bus access can
be pending before it is automatically terminated raising an bus fault exception. If `XBUS_TIMEOUT` is set to zero,
the timeout is disabled and a bus access can take an arbitrary number of cycles to complete. Note that this is not
recommended as a missing ACK will permanently stall the entire processor!

When `XBUS_TIMEOUT` is greater than zero, the Wishbone gateway starts an internal countdown whenever the CPU
accesses an address via the external memory interface. If the accessed device does not acknowledge (via `xbus_ack_i`)
or terminate (via `xbus_err_i`) the transfer within `XBUS_TIMEOUT` clock cycles, the bus access is automatically canceled
setting `xbus_cyc_o` low again and a CPU load/store/instruction fetch bus access fault exception is raised.
Furthermore, an accesses XBUS/Wishbone device can signal an error condition at any time by setting the `ERR` signal
high for one cycle. This will also terminate the current bus transaction before raising a CPU bus fault exception.


**Access Latency**

By default, the XBUS gateway introduces two additional latency cycles since processor-outgoing (`*_o`) and
processor-incoming (`*_i`) signals are fully registered. Thus, any access from the CPU to a processor-external devices
via the XBUS interface requires 2 additional clock cycles. This can ease timing closure when using large (combinatorial)
processor-external interconnection networks.

Optionally, the latency of the XBUS gateway can be reduced by removing the input and/or output register stages.
Enabling the `XBUS_ASYNC_RX` option will remove the input register stage; enabling `XBUS_ASYNC_TX` option will
remove the output register stages. Note that using those "async" options might impact timing closure.

.Output Gating
[NOTE]
All outgoing Wishbone signals use a "gating mechanism" so they only change if there is a actual XBUS transaction being in
progress. This can reduce dynamic switching activity in the external bus system and also simplifies simulation-based
inspection of the Wishbone transactions. Note that this output gating is only available if the output register buffer is not
disabled (`XBUS_ASYNC_TX` = `false`).
.Register Stage
[TIP]
An optional register stage can be added to the XBUS gateway to break up the critical path easing timing closure.
When `XBUS_REGSTAGE_EN` is _true_ all outgoing and incoming XBUS signals are registered increasing access latency
by two cycles. Furthermore, all outgoing signals (like the address) will be kept stable if there is no bus access
being initiated.


**External Bus Cache (X-CACHE)**

The XBUS interface provides an optional internal cache that can be used to buffer processor-external accesses.
The x-cache is enabled via the `XBUS_CACHE_EN` generic. The total size of the cache is split into the number of
cache lines or cache blocks (`XBUS_CACHE_NUM_BLOCKS` generic) and the line or block size in bytes
(`XBUS_CACHE_BLOCK_SIZE` generic).

.Simplified X-Cache Architecture
[source,asciiart]
---------------------------------------
Simplified cache architecture ("->" = direction of access requests):
Direct Access +----------+
/|-------------------------->| Register |------------------------->|\
| | +----------+ | |
Core ---->| | | |----> XBUS
| | +--------------+ +--------------+ +-------------+ | |
\|--->| Host Arbiter |---->| Cache Memory |<----| Bus Arbiter |--->|/
+--------------+ +--------------+ +-------------+
Direct Access +----------+
/|------------------------->| Register |------------------------>|\
| | +----------+ | |
Core --->| | | |---> XBUS
| | +--------------+ +--------------+ +-------------+ | |
\|--->| Host Arbiter |--->| Cache Memory |<---| Bus Arbiter |--->|/
+--------------+ +--------------+ +-------------+
---------------------------------------

The XBUS interface provides an optional cache module that can be used to buffer and improve processor-external accesses.
The cache uses a direct-mapped architecture that implements "write-allocate" and "write-back" strategies.

The **write-allocate** strategy will fetch the entire referenced block from main memory when encountering
a cache write-miss. The **write-back** strategy will gather all writes locally inside the cache until the according
cache block is about to be replaced. In this case, the entire modified cache block is written back to main memory.

The x-cache is enabled via the `XBUS_CACHE_EN` generic. The total size of the cache is split into the number of cache lines
or cache blocks (`XBUS_CACHE_NUM_BLOCKS` generic) and the line or block size in bytes (`XBUS_CACHE_BLOCK_SIZE` generic).

The x-cache also provides "direct accesses" that bypass the cache. For example, this can be used to access processor-external
memory-mapped IO. All accesses that target the address range from `0xF0000000` to `0xFFFFFFFF` will always bypass the cache
(see section <<_address_space>>). Furthermore, load-reservate and store conditional <<_atomic_accesses>> will also always bypass the
cache **regardless of the accessed address**.


The x-cache also provides "direct accesses" that bypass the cache. For example, this can be used to access
processor-external memory-mapped IO. All accesses that target the address range from `0xF0000000` to `0xFFFFFFFF`
will always bypass the cache (see section <<_address_space>>). Furthermore, load-reservate and store conditional
<<_atomic_accesses>> will also always bypass the cache **regardless of the accessed address**.

0 comments on commit 891aed6

Please sign in to comment.