1. Implement runtime library as a kernel module, instead of as user-space code,
so that multiple applications can use the ARTICo3 infrastructure simultaneously.
This seems "relatively easy" to achieve; the only big issue appears to be
memory management/copy between user-space variables and DMA memory
(in the current version, this is achieved by memcpy(malloc(), mmap())
or memcpy(mmap(), malloc())).
IDEAS:
- Use kthreads instead of pthreads
- Use ioctl() to implement each of the functions in artico3.c
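The user-space copy pattern mentioned in item 1 can be sketched as follows.
This is only an illustration, not the ARTICo3 code: an anonymous mmap()
region stands in for the DMA memory that would normally be mapped from the
DMA proxy driver, and the sketch_* function names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Stand-in for the DMA buffer (assumption): the real region would come from
 * mmap() on the DMA proxy device; an anonymous mapping is used here only so
 * that the sketch runs without the driver. */
static void *sketch_dma_map(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return (p == MAP_FAILED) ? NULL : p;
}

/* Round trip: user buffer -> "DMA" region -> user buffer.
 * Returns 0 if the data survives both copies, -1 on error or mismatch. */
static int sketch_roundtrip(size_t len)
{
    int ret = -1;
    unsigned char *src = malloc(len);
    unsigned char *dst = malloc(len);
    void *dma = sketch_dma_map(len);

    if (src && dst && dma) {
        for (size_t i = 0; i < len; i++)
            src[i] = (unsigned char)(i & 0xff);
        memcpy(dma, src, len);   /* memcpy(mmap(), malloc()): send path */
        /* the DMA transfer itself would be triggered here */
        memcpy(dst, dma, len);   /* memcpy(malloc(), mmap()): receive path */
        ret = (memcmp(src, dst, len) == 0) ? 0 : -1;
    }
    if (dma)
        munmap(dma, len);
    free(dst);
    free(src);
    return ret;
}
```

Both copies are pure CPU overhead on top of the DMA transfer, which is why
the memory management question matters when moving the library into the
kernel.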
2. Fix register interfaces in HLS-based kernels to force them to use 3
ports (_i, _o, _o_ap_vld). A first approach is already implemented: it uses
a3reg_init() to perform a dummy read/write transaction.
3. Check whether memory barriers are needed in the DMA Proxy Linux driver. If
they are, the correct place to put them would be just before starting the DMA
transfers, i.e. as part of artico3_ioctl().
#include <asm/barrier.h>
(...)
// Memory barrier before DMA transfer to ensure memory consistency
// between CPU(s) and hardware device. The memory used is allocated
// with dma_zalloc_coherent()
wmb();
/* Start DMA transfer here */
(...)
4. Check whether a different implementation of the DMA Proxy driver using cached
buffers yields better performance than the current one.
[In artico3_mmap_dma()]
Buffer allocation : dma_zalloc_coherent() -> kzalloc()
Buffer remapping : dma_mmap_coherent() -> remap_pfn_range()
[In artico3_ioctl(), before artico3_dma_transfer()]
DMA send : dma_map_single(..., DMA_TO_DEVICE)
DMA receive : dma_map_single(..., DMA_FROM_DEVICE)
[In artico3_ioctl(), after artico3_dma_transfer()]
DMA send : dma_unmap_single(..., DMA_TO_DEVICE)
DMA receive : dma_unmap_single(..., DMA_FROM_DEVICE)
NOTE: Read DMA-API-HOWTO.txt in kernel documentation to see how to
properly use these functions and which ones are needed.
* dma_mapping_error()
* dma_sync_single_for_cpu()
* dma_sync_single_for_device()
* (...)
5. Port reconfiguration mechanisms to the standard FPGA manager framework,
where reconfiguration is driven from Device Tree Overlays.
Xilinx kernel:
The current version provides direct access to a file in sysfs to
which the name of the FPGA bitstream to be loaded needs to be written.
echo "system.bin" > /sys/class/fpga_manager/fpga0/firmware
In addition, Partial Reconfiguration is not supported (I guess
that the _firmware_ file in sysfs was intended only for full FPGA
reconfigurations).
drivers/fpga/fpga-mgr.c -> firmware_store() -> info.flags = 0;
Hence, kernel hacking is required to enable it. Changes can be made
in drivers/fpga/fpga-mgr.c (common FPGA Manager framework) or in
drivers/fpga/zynq-fpga.c (for Zynq-7000, which is a device-specific
low-level driver). Current implementation relies on the latter.
[Option A] drivers/fpga/fpga-mgr.c
firmware_store()
[orig] info.flags = 0;
[mod] info.flags = mgr->flags;
[Option B] drivers/fpga/zynq-fpga.c :
zynq_fpga_ops_write_init()
[orig] info->flags
[mod] mgr->flags
zynq_fpga_ops_write_complete()
[orig] info->flags
[mod] mgr->flags
EDIT: the _firmware_ file in sysfs was added as a debug entry point,
not as the main reconfiguration trigger (see commit 6f6a4b0).
Hence, it may make sense to modify the firmware_store() function.
EDIT: the Xilinx implementation of the FPGA Manager infrastructure
differs in more than just the _firmware_ entry in sysfs.
The fpga_manager struct contains three extra fields, required
for Xilinx devices:
long int flags;
char key[ENCRYPTED_KEY_LEN];
char iv[ENCRYPTED_IV_LEN];
The _flags_ field is used in the current ARTICo3 runtime
library implementation to specify whether a bitstream file
is Full or Partial. Reconfiguration is then triggered by
writing the name of the bitstream file to the _firmware_ entry
of the fpga_manager structure in sysfs.
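The sysfs write described above (the equivalent of the echo command) could
be done from C roughly as follows. This is a hedged sketch, not the runtime
library code: the sketch_* helper names are hypothetical, and how the _flags_
field is set beforehand is not covered here.

```c
#include <stdio.h>

/* Write a string to a sysfs-style attribute file, like `echo value > path`
 * (echo appends a newline, so this does too).
 * Returns 0 on success, -1 on failure. */
static int sketch_write_attr(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = (fprintf(f, "%s\n", value) >= 0);
    /* stdio buffers the data, so a failing write may only show up when
     * fclose() flushes it */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Trigger reconfiguration through the Xilinx-specific _firmware_ entry.
 * The path is the one used in the echo example above. */
static int sketch_load_bitstream(const char *name)
{
    return sketch_write_attr("/sys/class/fpga_manager/fpga0/firmware", name);
}
```

Calling sketch_load_bitstream("system.bin") would then mirror the shell
command, provided the process has write permission on the sysfs entry.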
Mainline Linux kernel (kernel.org):
The _firmware_ file in sysfs is not available, and the whole
reconfiguration process is specified through Device Tree Overlays.
[1] Documentation/fpga/overview.txt
[2] Documentation/fpga/fpga-mgr.txt
[3] Documentation/fpga/fpga-region.txt
[4] Documentation/devicetree/bindings/fpga/fpga-region.txt
6. Add documentation on Linux requirements for ARTICo3 to work properly.
Does it make sense to add requirements for both the host PC (Vivado and
Linux distro details) and the embedded target (drivers and hacks)? The
former can be edited in a3dk.txt.
7. Remove memtest demo and replace DMA proxy driver test applications with
a similar functionality (this is required to test memcpy and DMA transfer
overheads).
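A test application for the memcpy overhead in item 7 might start from a
timing helper like the one below. This is a minimal sketch under stated
assumptions: it times a copy between two heap buffers only (a real test
would use the mmap()-ed DMA buffer as one side), and sketch_time_memcpy is a
hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Time one memcpy() of len bytes between two heap buffers.
 * Returns elapsed seconds, or -1.0 on allocation, clock or copy failure. */
static double sketch_time_memcpy(size_t len)
{
    struct timespec t0, t1;
    double elapsed = -1.0;
    unsigned char *src = malloc(len);
    unsigned char *dst = malloc(len);

    if (src && dst) {
        memset(src, 0xA5, len);  /* touch pages so page faults are not timed */
        memset(dst, 0x00, len);
        if (clock_gettime(CLOCK_MONOTONIC, &t0) == 0) {
            memcpy(dst, src, len);
            if (clock_gettime(CLOCK_MONOTONIC, &t1) == 0 &&
                memcmp(dst, src, len) == 0) {
                elapsed = (double)(t1.tv_sec - t0.tv_sec) +
                          (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
            }
        }
    }
    free(dst);
    free(src);
    return elapsed;
}
```

A single measurement is noisy; repeating the call and averaging, and timing
the DMA transfer the same way, would give comparable overhead figures.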
8. Check possible bugs:
- Verify that a kernel exists before reserving ID in artico3_kernel_create()
- Verify that there are no access errors (shared resources) in the kernels
variable (artico3.c)
- Verify the sorting algorithm for kernel ports (port10 before port2?)
- Verify the sca_mode signal in shuffler.vhd (never assigned, but used)
- Verify weird clock infrastructure when using NO_BUFFER in the shuffler
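For the port ordering check in item 8, the suspected problem (port10 sorted
before port2) is what a plain lexicographic comparison produces. A natural
order comparison avoids it. The sketch below is only an illustration: how
the ports are actually sorted in the toolchain is not stated here, and the
sketch_* names are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Compare two names so that embedded decimal numbers order numerically
 * ("port2" < "port10"). Plain strcmp() would put "port10" first. */
static int sketch_natural_cmp(const char *a, const char *b)
{
    while (*a && *b) {
        if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
            char *ea, *eb;
            unsigned long na = strtoul(a, &ea, 10);
            unsigned long nb = strtoul(b, &eb, 10);
            if (na != nb)
                return (na < nb) ? -1 : 1;
            a = ea;   /* equal numbers: continue after the digit runs */
            b = eb;
        } else {
            if (*a != *b)
                return ((unsigned char)*a < (unsigned char)*b) ? -1 : 1;
            a++;
            b++;
        }
    }
    if (*a == *b)
        return 0;         /* both strings ended */
    return *a ? 1 : -1;   /* the shorter string sorts first */
}

/* qsort() adapter for an array of const char * */
static int sketch_port_cmp(const void *pa, const void *pb)
{
    return sketch_natural_cmp(*(const char *const *)pa,
                              *(const char *const *)pb);
}
```

Usage: qsort(ports, n, sizeof ports[0], sketch_port_cmp) on an array of port
name strings orders port2 before port10.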