Integrate FPGA-accelerated PoW #50
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -30,15 +30,23 @@ SSE_S := $(shell grep -o sse /proc/cpuinfo | head -n 1) | |
ifeq ("$(BUILD_AVX)","1") | ||
CFLAGS += -mavx -mavx2 -DENABLE_AVX | ||
else | ||
ifeq ("$(BUILD_FPGA_ACCEL)","1") | ||
CFLAGS += -DENABLE_FPGA_ACCEL | ||
else | ||
ifeq ($(SSE_S),sse) | ||
Review comment: I proposed #55 to improve SSE detection for both Linux and macOS. You should be aware of it. |
||
CFLAGS += -msse2 -DENABLE_SSE | ||
endif | ||
endif | ||
endif | ||
|
||
ifeq ("$(BUILD_GPU)","1") | ||
include mk/opencl.mk | ||
endif | ||
|
||
ifeq ("$(BUILD_FPGA_ACCEL)","1") | ||
Review comment: Since file
Reply: Can you fix? |
||
include mk/fpga-accel.mk | ||
endif | ||
|
||
ifeq ("$(BUILD_JNI)","1") | ||
include mk/java.mk | ||
endif | ||
|
@@ -73,6 +81,10 @@ ifeq ("$(BUILD_COMPAT)", "1") | |
TESTS += ccurl-multi_pow | ||
endif | ||
|
||
ifeq ("$(BUILD_FPGA_ACCEL)","1") | ||
TESTS += pow_fpga_accel | ||
endif | ||
|
||
TESTS := $(addprefix $(OUT)/test-, $(TESTS)) | ||
|
||
LIBS = libdcurl.so | ||
|
@@ -113,6 +125,11 @@ OBJS += \ | |
compat-ccurl.o | ||
endif | ||
|
||
ifeq ("$(BUILD_FPGA_ACCEL)","1") | ||
OBJS += \ | ||
pow_fpga_accel.o | ||
endif | ||
|
||
OBJS := $(addprefix $(OUT)/, $(OBJS)) | ||
|
||
$(OUT)/test-%.o: tests/test-%.c | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
# dcurl - Multi-threaded Curl implementation | ||
Hardware-accelerated implementation for IOTA PearlDiver, which utilizes multi-threaded SIMD and GPU. | ||
Hardware-accelerated implementation for IOTA PearlDiver, which utilizes multi-threaded SIMD, FPGA and GPU. | ||
|
||
# Introduction | ||
dcurl exploits SIMD instructions on CPU and OpenCL on GPU. Both CPU and GPU accelerations can be | ||
|
@@ -12,6 +12,7 @@ Reference Implementation (IRI). | |
* Check JDK installation and set JAVA_HOME if you wish to specify. | ||
* Only one GPU can be facilitated with dcurl at the moment. | ||
* If your platform doesn't support Intel SSE, dcurl would be compiled with naive implementation. | ||
* For the IOTA hardware accelerator, we integrate [Lampa Lab's Cyclone V FPGA PoW](https://github.com/LampaLab/iota_fpga) into dcurl. Lampa Lab supports soc_system.rbf only for DE10-nano board. You need to synthesize to get soc_system.rbf for using Arrow SoCKit board. | ||
Review comment: The reason why you adopted the FPGA implementation of Lampa Lab should be addressed as well.
Reply: Do you think it is necessary to maintain our own fork for the FPGA-based implementation?
Reply: Yes, it is necessary. RocketBoards.org provides the Golden System Reference Design (GSRD) [0], which includes Linux drivers, an OS, a boot loader and the GHRD for the Cyclone V SoC. In the future, we need to integrate the OPTEE-related and Mender-related solutions into our own modified GSRD and rebuild the SD card image. For the GHRD, we may provide a new HDL-implemented PoW for a new PoW algorithm and rebuild the RBF. [0] Arria V & Cyclone V Golden System Reference Design (GSRD)
Reply: @ajblane, got it. Let's fork the repository from Lampa Lab. |
||
|
||
# Build Instructions | ||
* dcurl allows various combinations of build configurations to fit final use scenarios. | ||
|
@@ -22,6 +23,7 @@ Reference Implementation (IRI). | |
from downloading from | ||
[latest JAVA source](https://github.com/chenwei-tw/iri/tree/feat/new_pow_interface). | ||
- ``BUILD_COMPAT``: build extra cCurl compatible interface. | ||
- ``BUILD_FPGA_ACCEL``: build the interface interacting with the Cyclone V FPGA based accelerator. Verified on DE10-nano board and Arrow SoCKit board. | ||
* Alternatively, you can specify conditional build as following: | ||
```shell | ||
$ make BUILD_GPU=0 BUILD_JNI=1 BUILD_AVX=1 | ||
|
@@ -66,6 +68,27 @@ $ make BUILD_AVX=1 check | |
[ Verified ] | ||
``` | ||
|
||
* Test with Arrow SoCKit board with [Download](https://github.com/LampaLab/iota_fpga/releases/tag/v0.1) Linux sd-card image, root password is 123456 and you need to download dcurl into root directory. | ||
Review comment: Should we update RBF? => https://github.com/LampaLab/iota_fpga/releases/tag/v0.3
Reply: Yes, we need to. Where and how can we upload this file?
Reply: I have used the performance tool [0] provided by Lampa Lab on the SoCKit board with the synthesized RBF (v0.3) and reproduced the experimental data, plotted as a figure, e.g. [1].
Reply: You'd better create a new Markdown file (named after
Reply: I need to write what content is written in
Reply: @ajblane, You can summarize the IOTA paper composed by Lampa Lab and fill into
Reply: Ditto.
Reply: ping? Any further changes reflecting the above? |
||
```shell | ||
root@lampa:~# sh init_curl_pow.sh | ||
root@lampa:~# cd dcurl | ||
root@lampa:~/dcurl# make BUILD_FPGA_ACCEL=1 check | ||
``` | ||
|
||
* Expected Results | ||
``` | ||
*** Validating build/test-trinary *** | ||
[ Verified ] | ||
*** Validating build/test-curl *** | ||
[ Verified ] | ||
*** Validating build/test-pow_c *** | ||
[ Verified ] | ||
*** Validating build/test-multi_pow_cpu *** | ||
[ Verified ] | ||
*** Validating build/test-pow_fpga_accel *** | ||
[ Verified ] | ||
``` | ||
|
||
# Tweaks | ||
* ```dcurl_init(2, 1)``` in ```jni/iri-pearldiver-exlib.c``` | ||
* ```2``` means 2 pow tasks executed in CPU, | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,163 @@ | ||
/* | ||
* Copyright (C) 2018 dcurl Developers. | ||
* Copyright (c) 2018 Ievgen Korokyi. | ||
* Use of this source code is governed by MIT license that can be | ||
* found in the LICENSE file. | ||
*/ | ||
|
||
#include <fcntl.h> | ||
#include <sys/mman.h> | ||
#include <unistd.h> | ||
#include "trinary.h" | ||
#include "constants.h" | ||
#include "pow_fpga_accel.h" | ||
|
||
#define HPS_TO_FPGA_BASE 0xC0000000 | ||
#define HPS_TO_FPGA_SPAN 0x0020000 | ||
#define HASH_CNT_REG_OFFSET 4 | ||
#define TICK_CNT_LOW_REG_OFFSET 5 | ||
#define TICK_CNT_HI_REG_OFFSET 6 | ||
#define MWM_MASK_REG_OFFSET 3 | ||
#define CPOW_BASE 0 | ||
|
||
/* Set FPGA configuration for device files */ | ||
#define DEV_CTRL_FPGA "/dev/cpow-ctrl" | ||
#define DEV_IDATA_FPGA "/dev/cpow-idata" | ||
#define DEV_ODATA_FPGA "/dev/cpow-odata" | ||
|
||
static FILE *ctrl_fd; | ||
static FILE *in_fd; | ||
static FILE *out_fd; | ||
static int devmem_fd; | ||
static void *fpga_regs_map; | ||
static uint32_t *cpow_map; | ||
|
||
int pow_fpga_accel_init() | ||
{ | ||
ctrl_fd = 0; | ||
in_fd = 0; | ||
out_fd = 0; | ||
devmem_fd = 0; | ||
fpga_regs_map = 0; | ||
cpow_map = 0; | ||
|
||
ctrl_fd = fopen(DEV_CTRL_FPGA, "r+"); | ||
|
||
if (ctrl_fd == NULL) { | ||
perror("cpow-ctrl open fail"); | ||
goto fail_dev_open_ctrl; | ||
} | ||
|
||
in_fd = fopen(DEV_IDATA_FPGA, "wb"); | ||
|
||
if (in_fd == NULL) { | ||
perror("cpow-idata open fail"); | ||
goto fail_dev_open_idata; | ||
} | ||
|
||
out_fd = fopen(DEV_ODATA_FPGA, "rb"); | ||
|
||
if (out_fd == NULL) { | ||
perror("cpow-odata open fail"); | ||
goto fail_dev_open_odata; | ||
} | ||
|
||
devmem_fd = open("/dev/mem", O_RDWR | O_SYNC); | ||
Review comment: The "/dev/mem" path should also be defined as a macro.
Reply: I don't think developers will change this device file, which is used to access the system's physical memory. Therefore, I prefer not to use a macro.
Reply: okay. |
||
|
||
if (devmem_fd < 0) { | ||
perror("devmem open"); | ||
Review comment: Properly use |
||
goto fail_dev_open_mem_open; | ||
} | ||
|
||
fpga_regs_map = | ||
(uint32_t *) mmap(NULL, HPS_TO_FPGA_SPAN, PROT_READ | PROT_WRITE, | ||
MAP_SHARED, devmem_fd, HPS_TO_FPGA_BASE); | ||
cpow_map = (uint32_t *) (fpga_regs_map + CPOW_BASE); | ||
|
||
if (fpga_regs_map == MAP_FAILED) { | ||
perror("devmem mmap"); | ||
goto fail_dev_open_mem_map; | ||
} | ||
|
||
return 1; | ||
|
||
fail_dev_open_mem_map: | ||
Review comment: Changing to |
||
close(devmem_fd); | ||
fail_dev_open_mem_open: | ||
Review comment: Ditto. |
||
fclose(out_fd); | ||
fail_dev_open_odata: | ||
fclose(in_fd); | ||
fail_dev_open_idata: | ||
fclose(ctrl_fd); | ||
fail_dev_open_ctrl: | ||
return 0; | ||
} | ||
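
In the diff above, `cpow_map` is derived from `fpga_regs_map` before the `MAP_FAILED` check, and the addition is done on a `void *`, which is a GNU extension rather than standard C. One possible arrangement, shown only as a sketch, checks the mapping first and hands back a typed pointer afterwards. The helper name `map_fpga_regs` is hypothetical and not part of this PR; the caller is assumed to own the descriptor and to close it on failure.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Hypothetical helper (not in the PR): map `span` bytes starting at offset
 * `base` of an already-open descriptor. Returns NULL when mmap() fails. */
static uint32_t *map_fpga_regs(int fd, size_t span, off_t base)
{
    void *map = mmap(NULL, span, PROT_READ | PROT_WRITE, MAP_SHARED, fd, base);

    if (map == MAP_FAILED) {
        perror("devmem mmap");
        return NULL;
    }

    /* Typed pointers are derived only after the mapping is known to be valid. */
    return (uint32_t *) map;
}
```

With a helper of this shape, the init routine would test the return value against `NULL` and jump to its existing cleanup label on failure.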
|
||
void pow_fpga_accel_destroy() | ||
{ | ||
int result; | ||
|
||
fclose(in_fd); | ||
fclose(out_fd); | ||
fclose(ctrl_fd); | ||
|
||
result = munmap(fpga_regs_map, HPS_TO_FPGA_SPAN); | ||
|
||
close(devmem_fd); | ||
|
||
if (result < 0) { | ||
perror("devmem munmap"); | ||
} | ||
} | ||
|
||
int8_t *PowFPGAAccel(int8_t *itrytes, int mwm, int index) | ||
Review comment: I think the parameter `index` can be removed if the FPGA solution can only calculate one PoW at a time? |
||
{ | ||
int8_t fpga_out_nonce_trits[NonceTrinarySize]; | ||
int8_t *otrytes = (int8_t *) malloc(sizeof(int8_t) * (transactionTrinarySize) / 3); | ||
|
||
size_t itrytelen = 0; | ||
size_t itritlen = 0; | ||
|
||
int result; | ||
|
||
itrytelen = strnlen((char *) itrytes, (transactionTrinarySize) / 3); | ||
itritlen = 3 * itrytelen; | ||
|
||
Trytes_t *object_trytes = initTrytes(itrytes, itrytelen); | ||
||
if (!object_trytes) | ||
return NULL; | ||
|
||
Trits_t *object_trits = trits_from_trytes(object_trytes); | ||
if (!object_trits) | ||
return NULL; | ||
|
||
fwrite((char *) object_trits->data, 1, itritlen, in_fd); | ||
fflush(in_fd); | ||
Review comment: Take two |
||
|
||
fwrite(&mwm, 1, 1, ctrl_fd); | ||
fread(&result, sizeof(result), 1, ctrl_fd); | ||
fflush(ctrl_fd); | ||
|
||
fread((char *) fpga_out_nonce_trits, 1, NonceTrinarySize, out_fd); | ||
|
||
Trits_t *object_nonce_trits = initTrits(fpga_out_nonce_trits, NonceTrinarySize); | ||
if (!object_nonce_trits) | ||
return NULL; | ||
|
||
Trytes_t *nonce_trytes = trytes_from_trits(object_nonce_trits); | ||
if (!nonce_trytes) | ||
return NULL; | ||
|
||
for (int i = 0; i < (transactionTrinarySize) / 3; i++) | ||
Review comment: This for loop can be simplified to two |
||
if (i < (NonceTrinaryOffset) / 3) | ||
otrytes[i] = itrytes[i]; | ||
else | ||
otrytes[i] = nonce_trytes->data[i - (NonceTrinaryOffset) / 3]; | ||
|
||
freeTrobject(object_trytes); | ||
freeTrobject(object_trits); | ||
freeTrobject(object_nonce_trits); | ||
freeTrobject(nonce_trytes); | ||
|
||
return otrytes; | ||
} |
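
The copy loop at the end of `PowFPGAAccel` keeps the input trytes up to the nonce offset and fills the remainder from the nonce. As a sketch of one way to express that without a per-element branch, the same result can be written as two block copies. The helper name `splice_nonce` and the macro names are hypothetical. The 2673-tryte length comes from the test program in this PR; the 27-tryte nonce at the tail is an assumption about the values in `constants.h`.

```c
#include <stdint.h>
#include <string.h>

/* Assumed sizes in trytes. 2673 is the length used by the PR's test;
 * a 27-tryte nonce at the end of the transaction is an assumption. */
#define TX_TRYTES 2673
#define NONCE_TRYTES 27
#define NONCE_OFFSET_TRYTES (TX_TRYTES - NONCE_TRYTES)

/* Hypothetical helper: copy the transaction prefix, then append the nonce.
 * `out` and `in` must hold TX_TRYTES bytes, `nonce` NONCE_TRYTES bytes. */
static void splice_nonce(int8_t *out, const int8_t *in, const int8_t *nonce)
{
    memcpy(out, in, NONCE_OFFSET_TRYTES);
    memcpy(out + NONCE_OFFSET_TRYTES, nonce, NONCE_TRYTES);
}
```

The buffers must not overlap, since `memcpy` does not support overlapping ranges.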
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
#ifndef POW_FPGA_ACCEL_H_ | ||
#define POW_FPGA_ACCEL_H_ | ||
|
||
#include <stdint.h> | ||
|
||
Review comment: Avoid extra blank lines. |
||
int8_t *PowFPGAAccel(int8_t *itrytes, int mwm, int index); | ||
int pow_fpga_accel_init(); | ||
void pow_fpga_accel_destroy(); | ||
|
||
#endif |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
/* Test program for pow_fpga_accel */ | ||
#include "common.h" | ||
|
||
int main() | ||
{ | ||
char *trytes = | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"99999999999999999A9RGRKVGWMWMKOLVMDFWJUHNUNYWZTJADGGPZGXNLERLXYWJE9WQH" | ||
"WWBMCPZMVVMJUMWWBLZLNMLDCGDJ999999999999999999999999999999999999999999" | ||
"999999999999YGYQIVD99999999999999999999TXEFLKNPJRBYZPORHZU9CEMFIFVVQBU" | ||
"STDGSJCZMBTZCDTTJVUFPTCCVHHORPMGCURKTH9VGJIXUQJVHK99999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999999999999999999999999999999999999999999999999999999999999" | ||
"9999999999999"; | ||
|
||
int mwm = 14; | ||
|
||
/* test implementation of LampaLab's IOTA PoW FPGA with mwm = 14 */ | ||
pow_fpga_accel_init(); | ||
int8_t *ret_trytes = PowFPGAAccel((int8_t *)trytes, mwm, 0); | ||
pow_fpga_accel_destroy(); | ||
|
||
Trytes_t *trytes_t = initTrytes(ret_trytes, 2673); | ||
Trytes_t *hash_trytes = hashTrytes(trytes_t); | ||
|
||
/* Validation */ | ||
Trits_t *ret_trits = trits_from_trytes(hash_trytes); | ||
for (int i = 243 - 1; i >= 243 - mwm; i--) { | ||
assert(ret_trits->data[i] == 0); | ||
} | ||
|
||
free(ret_trytes); | ||
freeTrobject(trytes_t); | ||
freeTrobject(hash_trytes); | ||
freeTrobject(ret_trits); | ||
|
||
return 0; | ||
} |
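
The validation loop in the test above asserts that the last `mwm` trits of the 243-trit hash are zero. A minimal sketch of the same check as a reusable predicate follows; the name `meets_mwm` is hypothetical and not part of this PR. Returning a status instead of asserting would also let a caller report a failed `pow_fpga_accel_init()` or a `NULL` result from `PowFPGAAccel` before hashing, which the test as written does not check.

```c
#include <stdint.h>

#define HASH_TRITS 243

/* Hypothetical helper: returns 1 when the last `mwm` trits of a
 * HASH_TRITS-long hash are all zero, and 0 otherwise. */
static int meets_mwm(const int8_t *hash_trits, int mwm)
{
    if (mwm < 0 || mwm > HASH_TRITS)
        return 0;

    for (int i = HASH_TRITS - 1; i >= HASH_TRITS - mwm; i--) {
        if (hash_trits[i] != 0)
            return 0;
    }
    return 1;
}
```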
Review comment: Move the `CFLAGS` changes to Line 46 or a similar area.
Reply: I need to use cross compiling. If the `CFLAGS` changes move to Line 46 or a similar area and SSE is detected, `make BUILD_FPGA_ACCEL=1` cannot work correctly: only the `ENABLE_SSE` regions are compiled, not the `ENABLE_FPGA_ACCEL` regions. [0] Cross-compiling issue: shufps as an example
Therefore, we need to write code for the following rule, for example: from this code to this code
Reply: Good point. Let's create a new file `mk/crossbuild.mk` which accepts the `$CROSS_COMPILE` environment variable and performs the necessary sanity checks.