forked from foss-xtensa/xrp
-
Notifications
You must be signed in to change notification settings - Fork 0
zhengstake/xrp
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Xtensa Remote Processing. ======================== - Introduction - Terminology - Implementations - Configuration - Operation - Code structure - Building kernel driver - Building XRP libraries - Building XRP examples Xtensa Remote Processing (XRP) is a master-slave communication interface for linux-based systems containing Xtensa processors. It allows linux userspace tasks to send messages to the firmware running on Xtensa processors. Message structure details are not defined by XRP, but it is assumed that the message is a small structure that describes requested action accompanied by a vector of buffers with data or storage for that action. Terminology. =========== XRP API uses the following terms: - device: single Xtensa DSP core. Devices are numbered sequentially from 0 to N - 1. There's an API call that opens device by its number. - namespace: an independent command handler on the DSP side identified by an UUID. Multiple namespaces may be active on the DSP side simultaneously. Namespace may be registered and deregistered dynamically. - queue: a communication stream between an application and a namespace on a device. Commands queued into one queue are processed in order and there's no ordering between commands in different queues. - buffer: a descriptor for a contiguous piece of memory. A buffer must be mapped before its memory is accessible. A buffer must be unmapped before it may be passed to a device. A buffer may be allocated in the device specific memory or in the generic system memory. A buffer in the device specific memory is the most effective way to pass bulk data to/from the Xtensa DSP, it is physically contiguous and data transfer does not require copying. Buffer in the generic system memory may be physically noncontiguous and tranferring data may require copying to/from the intermediate bounce buffer. - buffer group: a vector of buffers and associated allowed access flags. Buffers are numbered sequentially from 0 to N - 1. A buffer may be added to the end of a group. A buffer descriptor may be acquired from the buffer group by index. A buffer may be added to a buffer group with the access flags allowing reading, writing or both. These access flags limit the ways in which the buffer descriptor acquired from the buffer group may be mapped. - message: a unit of communication between the XRP API users. A message may be sent by the master to a queue synchronously or asynchronously. It is processed synchronously by the Xtensa DSP and the reply is delivered back to master. If a message was sent synchronously the sender is unblocked at that point. If a message was sent asynchronously the sender gets an event object that can be used to wait for the message processing completion. A message takes a small read-only region of memory with a message description data, a small write-only region of memory where the result description data is written and a buffer group with buffers for the bulk data being processed. Devices, queues, buffers, buffer groups and events are reference-counted. For each object type there's a couple of functions that increment and decrement object's reference counter. An object is freed when its reference counter drops to zero. XRP implementations. =================== There are two implementations of the XRP interface: native and fast simulation. Native implementation runs on a system where CPU cores running host linux code and Xtensa DSP cores are connected to shared memory. It may be a hardware or a simulated system. It consists of the following three parts: native linux implementation of the XRP interface, linux kernel driver and DSP implementation of the XRP interface. - native linux XRP implementation is a library code that runs in linux userspace processes on linux CPU cores. It communicates with devices managed by the linux kernel driver through the file-based interface exposed by the driver; - linux kernel driver consists of the generic XRP implementation and an optional hardware-specific XRP driver. Generic XRP driver does the following: - loads firmware into the DSP cores, performs initial synchronization and manages request queue; - manages dedicated physical memory of the Xtensa DSP cores; - maps and unmaps userspace buffers associated with requests making them physically contiguous. Hardware-specific XRP driver does the following: - manages Xtensa DSP cores: stalls/unstalls, resets, selects reset vector, configures shared memory/used IRQs, clocks, power, other relevant details to get cores going; - receives IRQs from DSPs and forwards them to the generic XRP driver; - communicates hardware information to the hardware-specific part of the DSP XRP library. - Xtensa DSP side XRP implementation is a library code that runs as a part of firmware on Xtensa DSP implementing DSP subset of the XRP API. It receives requests, invokes handler and sends replies back to host. ,--------. | System | | `------------------------------------------------------------. | ,-----------. ,-------.-------. | | | Host CPUS | | DSP 0 | DSP 1 | ... | | | `----------------------. | `-------------. | | | ,--------------. | | ,-----------------. | | | | | User process | ... | | | DSP payload | | | | | | `---------------. | | `-----------------' | | | | | ,--------------------------. | | | ^ | | | | | | | User application code | | | | | v | | | | | `--------------------------' | | | ,-----------------. | | | | | | ^ | | | | DSP XRP library | | | | | | v | | | | `-----------------' | | | | | ,--------------------------. | | | ^ ^ | | | | | | | Native Linux XRP library | | | | | | v |<~. | | | | `--------------------------' | | | | ,-----------. |<~| | | | `------------ | ----- ^ -------' | | | |HW-specific|<~~~| | | | | | | | | |XRP library|<~~~| | | | ,--------. | | | | | `-----------' | | | | | | Kernel | | | | `---- | ------ | -----' | | | | | `--- v ----- | -------. | v | | | | | | ,--------------------------. | | ,-----------. | | | | | | | XRP linux kernel driver |<----->| Shared | | | | | | | `--------------------------' | | | memory | | | | | | | | ^ | | `-----------' | | | | | | v | | | v | | | | | ,--------------------------. | | ,-----------. | | | | | | HW-specific XRP driver |<~~~~~~~~~~~~~~~| Host MMIO | | | | | | `--------------------------' | | `-----------' | | | | `---------------- | -----------' | ,--------------------. | | | | | | | DSP 0 MMIO |~~~| | | | `----------------->| DSP 1 MMIO |~~~| | | | | | ... |~~~' | | `----------------------------------' `--------------------' | | | `---------------------------------------------------------------------' Fast simulation runs linux-side code on a linux-based system and communicates with unmodified Xtensa DSP code running in a simulator on the same linux-based system. It consists of the following two parts: fast simulation implementation of the XRP interface and the DSP implementation of the XRP interface. - fast simulation XRP implementation is a library code that runs in linux userspace process and communicates through the shared memory with the Xtensa DSP side XRP code running in simulator; - Xtensa DSP side XRP implementation is the same both in native and in fast simulation cases. ,--------------. ,-------------------------. | User process | | XTSC simulation process | | `--------------------. | `----. | | | ,-------.-------. | | | | | DSP 0 | DSP 1 | ... | | | | | `-------------. | | ,-----------------------------. | | | ,-----------------. | | | | User application code | | | | | DSP payload | | | | `-----------------------------' | | | `-----------------' | | | | ^ | | | ^ | | | | v | | | | | v | | | ,-----------------------------. | | | ,-----------------. | | | | Fast simulation XRP library | | | | | DSP XRP library | | | | `-----------------------------' | | | `-----------------' | | | ^ | | | ^ ^ | | | | | | | | | | v |<~. | | | | | | | ,-----------. |<~| | | | | | | | |HW-specific|<~~~| | | | | | | | |XRP library|<~~~| | | | | | | | `-----------' | | | | | | | `---- | --------------' | | | v | | v | | | ,------------------------------------------------------------. | | | | Shared memory | | | | `------------------------------------------------------------' | | | | | | | | `-----------------------------------' | v | | | ,---------------------. | | | | Helper LUA script | | | | `---------------------' | | | | | | | v | | | ,---------------------. | | | | DSP 0 MMIO |~~| | | | DSP 1 MMIO |~~| | | | ... |~~' | | `---------------------' | | | `------------------------------' A mixed variant of native and fast simulation system running linux-side code in a separate simulator is possible. It uses native linux XRP library and generic XRP kernel driver, but fast simulation DSP payload. This way it is possible to simulate heterogeneous setups, like Aarch64-based host + Xtensa DSPs. ,--------------------------------. ,-------------------------. | Host system simulation process | | XTSC simulation process | | `-----. | `----. | ,-----------. | | ,-------.-------. | | | Host CPUS | | | | DSP 0 | DSP 1 | ... | | | `----------------------. | | | `-------------. | | | ,--------------. | | | | ,-----------------. | | | | | User process | ... | | | | | DSP payload | | | | | | `---------------. | | | | `-----------------' | | | | | ,--------------------------. | | | | | ^ | | | | | | | User application code | | | | | | | v | | | | | `--------------------------' | | | | | ,-----------------. | | | | | | ^ | | | | | | DSP XRP library | | | | | | v | | | | | | `-----------------' | | | | | ,--------------------------. | | | | | ^ ^ | | | | | | | Native Linux XRP library | | | | | | | | v |<~. | | | | `--------------------------' | | | | | | ,-----------. |<~| | | | `------------ | ----- ^ -------' | | | | | |HW-specific|<~~~| | | | | | | | | | | |XRP library|<~~~| | | | ,--------. | | | | | | | `-----------' | | | | | | Kernel | | | | | | `---- | --------------' | | | | | `--- v ----- | -------. | | | | | | | | | ,--------------------------. | | | | | | | | | | | XRP linux kernel driver | | | | | | | | | | | `--------------------------' | | | | | | | | | `---------------- ^ -----------' | | | | | | | `------------------ | -------------' | | | | | | v | | v | | | ,----------------------------------------------------------------. | | | | Shared memory | | | | `----------------------------------------------------------------' | | | | | | | | `--------------------------------------' | v | | | ,---------------------. | | | | Helper LUA script | | | | `---------------------' | | | | | | | v | | | ,---------------------. | | | | DSP 0 MMIO |~~| | | | DSP 1 MMIO |~~| | | | ... |~~' | | `---------------------' | | | `------------------------------' Kernel driver requirements. ========================== The linux kernel driver is tested with a number of linux kernel versions from 3.18 to 4.11 and is known to work on xtensa, 32- and 64-bit ARM and x86 linux. The driver makes the following assumptions: - DSP memory where firmware must be loaded (possibly including DRAM and IRAM) is writable from the linux host. Host may access that memory at physical addresses different from their physical addresses used by the DSP (i.e. multiple DSPs may have identical configurations, and thus IRAM and DRAM at the same addresses, but the host need to see each IRAM and DRAM at physical addresses that don't overlap in order to be able to load firmware to each DSP). - the driver provides functions to manage contiguous memory allocation for the DSP. These functions manage physical memory pool configured for the DSP. The DSP must have access to all of this physical memory, but not necessarily at the same physical address as the host; - the generic XRP driver is sufficient for shared-memory based operation with host and DSPs working in polling mode. If there are hardware-specific steps needed to enable DSPs, or DSPs or host need to work in IRQ mode then an additional hardware-specific XRP driver is needed for that system. There's an example of such additional driver in the package, called xrp_hw_simple, which may be used as a reference/template. It makes the following assumptions: - there exists reset and runstall memory-mapped registers for each DSP core, DSP doesn't need any additional setup to run (i.e. no clock setup, no power management, default reset vector); - there may exist memory-mapped register for sending IRQ to DSP core; if it exists then in addition to always available polling mode the DSP firmware may run in IRQ mode; - there may exist memory-mapped register for sending IRQ from DSP to the host; if it exists then in addition to always available polling mode the linux driver may run in IRQ mode. Configuration. ============= Number and configuration of Xtensa DSPs, details of MMIO registers that control them, location and amount of the shared memory all may change. The following pieces of the code capture the details of the configuration and need to be adjusted whenever it is changed: Native mode: - the device tree. Linux kernel need to have a device tree node for each Xtensa DSP managed by the XRP linux kernel driver. Please see xrp-kernel/cdns,xrp*.txt for the device tree binding information. - DSP firmware bits of the XRP don't have any hardcoded addresses, but the firmware image depends on the linker map. Fast simulation mode: - the device tree. Although there's no linux kernel and kernel driver involved, the configuration is maintained in the same format as in native mode. But in addition to a device node for each managed Xtensa DSP there must be a device node for all segments of memory shared with Xtensa simulators. Please see xrp-linux-sim/cdns,sim-shmem.txt for the device tree binding information. - DSP firmware bits of the XRP built for fast simulation have the base address of the communication area in the shared memory built into it. See how DSP_COMM_BASE is used. Operation. ========= DSP payload implements the main function. To start using XRP it must open the XRP device. After that it may register handlers for one or more namespaces and start processing incoming XRP messages with xrp_device_dispatch function. It may call xrp_device_poll to check for incoming messages and xrp_hw_wait_device_irq to go into waiti state when there's no more work to do. The very first message sequence that host XRP sends to the DSP is for synchronization. During synchronization process host communicates whether host and DSP use IRQ and IRQ numbers/MMIO register locations. Then host and DSP exchange IRQs in directions where IRQs are used. Namespace handlers registered by the DSP are the functions with the following prototype: enum xrp_status xrp_command_handler(void *handler_context, const void *in_data, size_t in_data_size, void *out_data, size_t out_data_size, struct xrp_buffer_group *buffer_group); It is called from the xrp_device_dispatch function whenever a command for the corresponding namespace is received. The function takes command description from the in_data buffer and associated buffer group from the buffer_group. It writes command processing results to the out_data buffer and to the relevant buffers of the buffer group. The function may not retain pointers to any buffers passed to it for later use and must release all references to buffers and buffer group it could have made before returning. Return value shall reflect whether the command was recognized and processed or not. IOW XRP_STATUS_SUCCESS should be returned in status if out_data is filled in, otherwise there should be XRP_STATUS_FAILURE. This return value is then returned from the xrp_device_dispatch. In case it was XRP_STATUS_FAILURE the host also gets XRP_STATUS_FAILURE for the command. handler_context is a pointer that was passed to the namespace handler registration function. Linux application that uses XRP starts with opening XRP devices with xrp_open_device function. XRP devices may have different capabilities and run different firmware, there's nothing in the XRP API about it. Linux application may need to open XRP device and communicate with it to find its capabilities. XRP device is identified by an integer index. After opening the device the application need to create a command queue with xrp_create_queue function. Once a queue is created an application can send commands through this queue. Commands may be sent synchronously with xrp_run_command_sync function or asynchronously with xrp_enqueue_command function. xrp_run_command_sync returns when the command execution is complete. xrp_enqueue_command returns immediately, regardless of the completion of the command that it enqueued. It may return a pointer to the event that becomes signaled when the command is complete. An application may call xrp_wait function to wait for the event. Completion of a command means completion of all preceding commands in the same command queue. A command may operate on data buffers, however it may not reference them by address as the address spaces of the application and DSP payload may be different. Instead each contiguous buffer is represented as an xrp_buffer object, and a group of buffers related to one command is represented as an xrp_buffer_group object. Individual buffers are referenced by index in the buffer group. xrp_buffer object can be created with the xrp_create_buffer function. It may be assigned an ordinary userspace memory or the device specific shared memory may be allocated for it. The difference is that the ordinary userspace memory may not be the most effective type of memory for communication with the DSP, because depending on its location and physical contiguity the driver may need to create a bounce buffer that will be presented to the DSP and copy data to/from there. Device specific shared memory is guaranteed to be directly accessible to the DSP firmware code and not require use of intermediate bounce buffers. A kernel bounce buffer preserves up to 12 bits of the original buffer's address alignment, that is the number of contiguous least significant bits of the bounce buffer address that are equal to zero is not less than that of the original buffer address or 12, whichever is smaller. If that number is less than 4 then 4 least significant bits of the original address are preserved. Here's an example table of possible bounce buffer addresses for various original buffer addresses: original bounce comment 0xxxxxxxx1 0xxxxxxxx1 (4 LS bits are preserved) 0xxxxxxxxc 0xxxxxxxxc (4 LS bits are preserved) 0xxxxxxx20 0xxxxxxx00 (at least 5 LS bits are 0) 0xxxxxx100 0xxxxxx900 (at least 8 LS bits are 0) 0xxxxxx800 0xxxxxx800 (at least 11 LS bits are 0) In order to access memory of the xrp_buffer object the buffer or part of it must be mapped with the xrp_map_buffer function. A buffer may be mapped for read only or read/write access. The host specifies allowed access when it adds the buffer to the buffer group, the DSP will not be able to use the buffer in a way not enabled by the host. When buffer mapping is no longer needed it may be unmapped with the xrp_unmap_bufer function. Each mapping of the buffer must be unmapped individually. A buffer is only allowed to cross host/DSP boundary when it doesn't have any mappings. In native mode XRP linux kernel driver manages DSP devices. When the driver gets loaded or device binding is added through sysfs the driver's probe function is called for each matching device node. It halts and resets the DSP, loads the firmware image and sets up the communication area address in shared memory. It then releases the DSP from reset and tries to synchronize with the firmware through the communication area. When successfull it sends selected IRQ parameters to the firmware and tests that chosen IRQs actually work. In case of success a new character device is created for that DSP. Linux userspace XRP library may then open that device and interact with it by ioctl and mmap system calls. In fast simulation mode XTSC simulator loads firmware images for all simulated DSP cores. Each firmware image has its communication area address preconfigured (see DSP_COMM_BASE in this document). Shared physical memory simulated by the XTSC is implemented as shared memory of the host platform. In case of linux it is backed by files in /dev/shm. Fast simulation mode XRP user library uses device tree blob linked with it to find names of the shared memory files and addresses of the DSP communication areas. It synchronizes with all DSP nodes configured in the device tree, the synchronization protocol is the same as in native mode. It is possible to use IRQ for communication from host to the DSP. A LUA script running in the XTSC polls hardcoded shared memory locations designated as software IRQ registers and initiates internal write to the real MMIO IRQ register when such location is written to. It is not possible to use IRQ for communication from the DSP to the host. Code structure. ============== +-dts/ -- device tree compiler/flat device tree library. May be | used with xrp-linux-sim when the host version of libfdt | is missing or too old. +-xrp-dsp/ -- Xtensa DSP side XRP implementation. +-xrp-example/ -- linux and DSP side example code for both native and fast | simulation modes. +-xrp-kernel/ -- linux kernel driver for the native XRP implementation. | `-test/ -- standalone tests for the driver. +-xrp-linux-native/ -- linux implementation of the XRP interface for the native | mode. +-xrp-linux-sim/ -- linux implementation of the XRP interface for the fast | simulation mode. `-xrp_api.h -- XRP API definition. Building kernel driver. ====================== Kernel driver is built using typical kernel module build sequence. Provide the following environment variables/make parameters: - ARCH: target linux architecture; - KSRC: points to the configured kernel tree; - CROSS_COMPILE: path and prefix name of the cross compiler. Makefile supports targets "modules", "clean" and all other standard kernel targets. Example building for xtensa linux: $ ARCH=xtensa CROSS_COMPILE=xtensa-dc233c-elf- \ KSRC=`pwd`/../kernel/xrp-sim-dc233c make -C xrp-kernel modules The result is kernel object xrp-kernel/xrp.ko loadable into the kernel it was built against. To build both xrp.ko and xrp_hw_simple.ko pass CONFIG_XRP_HW_SIMPLE=m to the make: $ ARCH=xtensa CROSS_COMPILE=xtensa-dc233c-elf- \ KSRC=`pwd`/../kernel/xrp-sim-dc233c \ make -C xrp-kernel CONFIG_XRP_HW_SIMPLE=m modules Building XRP libraries. ====================== Userspace and firmware code is built using standard autoconf configure + make sequence. Generated makefile supports all standard targets, e.g. 'make install' would install generated library and header file into selected directory structure. One build directory may be configured to build code either for the host or for the DSP, but not for both. This behavior is controlled by the configure switch '--enable-dsp'. Switches '--enable-native' (on by default) and '--enable-sim' (off by default) enable building of native variand of library/example or fast simulation variant respectively. - Xtensa DSP code needs an xcc-based toolchain. Provide the following environment variable to the configure or to make: - DSP_CORE: name of the Xtensa DSP core to build for. It must be the name of the build installed for the used xcc-based toolchain. Without any other options this will produce DSP XRP library xrp-dsp/libxrp-dsp.a $ ./configure --host=xtensa-elf --enable-dsp CC=xt-xcc $ make - linux side code needs a toolchain for the target linux. Without additional switches the following will produce linux XRP library for native mode, xrp-linux-native/libxrp-linux-native.a: $ ./configure $ make The following will produce linux XRP library for fast simulation mode, xrp-linux-sim/libxrp-linux-sim.a: $ ./configure --enable-sim --disable-native $ make Building XRP examples. ===================== An example application may be built in addition to the XRP library. The switch '--enable-sim' configures for building fast simulation library or example. Not specifying it or specifying '--disable-sim' configures for building native library or example. - adding switch '--enable-example' to the DSP build configuration will produce example firmware for native mode. Provide the following environment variable to the configure or make: - DSP_LSP: path to the LSP to be used. $ ./configure --host=xtensa-elf --enable-dsp --enable-example \ DSP_CORE=visionp6cnn_ao_exls \ DSP_LSP=MW-MP/P6_0/xtensa-elf/lib/sim-stacklocal \ CC=xt-xcc $ make The result is xrp-example/xrp-dsp-nat. In order to be loaded as a firmware it needs to be put to the linux file system under /lib/modules/firmware and its name need to be specified in the "firmware" parameter inside the xrp device node in the device tree. - adding switches '--enable-example' '--enable-sim' to the DSP build configuration will produce example firmware for fast simulation mode. Provide the following environment variable to the configure or make: - DSP_COMM_BASE: base address of the DSP communication area. $ ./configure --host=xtensa-elf --enable-dsp --enable-example --enable-sim \ DSP_CORE=visionp6cnn_ao_exls \ DSP_LSP=MW-MP/P6_0/xtensa-elf/lib/sim-stacklocal \ DSP_COMM_BASE=0xf0000000 \ CC=xt-xcc $ make The result is xrp-example/xrp-dsp-sim. In order to be loaded as a firmware a path to it needs to be passed to the XTSC. - adding switch '--enable-example' to the linux build configuration will produce example linux application for native mode. $ ./configure --enable-example $ make The result is xrp-example/xrp-linux-nat. - adding switches '--enable-example' '--enable-sim' to the linux build configuration will produce example linux application for fast simulation mode. $ ./configure --enable-example --enable-sim $ make The result is xrp-example/xrp-linux-sim.
About
Xtensa Remote Processing
Resources
Stars
Watchers
Forks
Packages 0
No packages published
Languages
- C 45.0%
- Makefile 23.4%
- PHP 14.4%
- Shell 9.8%
- Logos 6.6%
- M4 0.7%
- Lua 0.1%