Integrate FPGA-accelerated PoW #50
Changes from 47 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,7 +3,7 @@ | |
[![Build Status](https://travis-ci.org/DLTcollab/dcurl.svg?branch=dev)](https://travis-ci.org/DLTcollab/dcurl) | ||
![Supported IRI version](https://img.shields.io/badge/Supported%20IRI%20Version-1.5.3-brightgreen.svg) | ||
|
||
Hardware-accelerated implementation for IOTA PearlDiver, which utilizes multi-threaded SIMD and GPU. | ||
Hardware-accelerated implementation for IOTA PearlDiver, which utilizes multi-threaded SIMD, FPGA and GPU. | ||
|
||
# Introduction | ||
dcurl exploits SIMD instructions on CPU and OpenCL on GPU. Both CPU and GPU accelerations can be | ||
|
@@ -15,6 +15,7 @@ Reference Implementation (IRI). | |
* dcurl will automatically configure all the GPU devices on your platform. | ||
* Check JDK installation and set JAVA_HOME if you wish to specify. | ||
* If your platform doesn't support Intel SSE, dcurl would be compiled with naive implementation. | ||
* For the IOTA hardware accelerator, we integrate [Lampa Lab's Cyclone V FPGA PoW](https://github.com/LampaLab/iota_fpga) into dcurl. Lampa Lab provides a prebuilt soc_system.rbf only for the DE10-nano board. To use the Arrow SoCKit board, you need to synthesize the design yourself to obtain soc_system.rbf. | ||
The reason why you adopted the FPGA implementation of Lampa Lab should be addressed as well.

Do you think it is necessary to maintain our own fork for FPGA-based implementation?

Yes, it is necessary. RocketBoards.org provides the Golden System Reference Design (GSRD) [0], which includes Linux drivers, an OS, a boot loader and the GHRD for the Cyclone V SoC. In the future, we need to integrate the OPTEE-related solution and the Mender-related solution into our own modified GSRD and rebuild the SD card image. For the GHRD, we may provide a new HDL-implemented PoW for a new PoW algorithm and rebuild the RBF.

[0] Arria V & Cyclone V Golden System Reference Design (GSRD)

@ajblane, got it. Let's fork the repository from Lampa Lab. |
||
|
||
# Build Instructions | ||
* dcurl allows various combinations of build configurations to fit final use scenarios. | ||
|
@@ -24,6 +25,7 @@ Reference Implementation (IRI). | |
- ``BUILD_JNI``: build a shared library for IRI. The build system would generate JNI header file | ||
downloading from [latest JAVA source](https://github.com/chenwei-tw/iri/tree/feat/new_pow_interface). | ||
- ``BUILD_COMPAT``: build extra cCurl compatible interface. | ||
- ``BUILD_FPGA_ACCEL``: build the interface to the Cyclone V FPGA-based accelerator. Verified on the DE10-nano and Arrow SoCKit boards. | ||
* Alternatively, you can specify conditional build as following: | ||
```shell | ||
$ make BUILD_GPU=0 BUILD_JNI=1 BUILD_AVX=1 | ||
|
@@ -68,6 +70,27 @@ $ make BUILD_AVX=1 check | |
[ Verified ] | ||
``` | ||
|
||
* Test on the Arrow SoCKit board: [download](https://github.com/LampaLab/iota_fpga/releases/tag/v0.1) the Linux SD-card image, log in as root (the password is 123456), and download dcurl into root's home directory. | ||
Should we update RBF? => https://github.com/LampaLab/iota_fpga/releases/tag/v0.3

Yes, we need to. Where and how can we upload this file?

I have used the performance tool [0] provided by Lampa Lab on the SoCKit board with the synthesized RBF (v0.3) and reproduced the experimental data, plotted as a figure, e.g. [1].

You'd better create a new Markdown file (named after

I need to write what content is written in

@ajblane, You can summarize the IOTA paper composed by Lampa Lab and fill into

Ditto.

ping? Any further changes reflecting the above? |
||
```shell | ||
root@lampa:~# sh init_curl_pow.sh | ||
root@lampa:~# cd dcurl | ||
root@lampa:~/dcurl# make BUILD_FPGA_ACCEL=1 check | ||
``` | ||
|
||
* Expected Results | ||
``` | ||
*** Validating build/test-trinary *** | ||
[ Verified ] | ||
*** Validating build/test-curl *** | ||
[ Verified ] | ||
*** Validating build/test-pow_c *** | ||
[ Verified ] | ||
*** Validating build/test-multi_pow_cpu *** | ||
[ Verified ] | ||
*** Validating build/test-pow_fpga_accel *** | ||
[ Verified ] | ||
``` | ||
|
||
# Tweaks | ||
* Number of threads to find nonce in CPU | ||
* ```$ export DCURL_NUM_CPU=26``` | ||
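
For readers wondering how a tweak such as `DCURL_NUM_CPU` can be consumed, the following is a minimal sketch of reading a thread count from the environment. It is not dcurl's actual code: the helper name `thread_count_from_env` and the fallback behaviour are assumptions made for illustration.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not part of dcurl): parse a positive thread count
 * from the environment variable `name`. Returns `fallback` when the
 * variable is unset, empty, malformed, or out of range. */
static int thread_count_from_env(const char *name, int fallback)
{
    const char *value = getenv(name);
    if (!value || *value == '\0')
        return fallback;

    char *end = NULL;
    errno = 0;
    long n = strtol(value, &end, 10);
    /* Reject overflow, trailing garbage, and non-positive counts */
    if (errno != 0 || *end != '\0' || n < 1 || n > INT_MAX)
        return fallback;
    return (int) n;
}
```

Calling `thread_count_from_env("DCURL_NUM_CPU", 4)` after the `export` shown above would yield 26, and 4 when the variable is missing or not a number.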
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,232 @@ | ||
/* | ||
* Copyright (C) 2018 dcurl Developers. | ||
* Copyright (c) 2018 Ievgen Korokyi. | ||
* Use of this source code is governed by MIT license that can be | ||
* found in the LICENSE file. | ||
*/ | ||
|
||
#include "pow_fpga_accel.h" | ||
#include <fcntl.h> | ||
#include <stdio.h> | ||
#include <stdlib.h> | ||
#include <string.h> | ||
#include <sys/mman.h> | ||
#include <unistd.h> | ||
#include "implcontext.h" | ||
#include "trinary.h" | ||
|
||
#define HPS_TO_FPGA_BASE 0xC0000000 | ||
#define HPS_TO_FPGA_SPAN 0x0020000 | ||
#define HASH_CNT_REG_OFFSET 4 | ||
#define TICK_CNT_LOW_REG_OFFSET 5 | ||
#define TICK_CNT_HI_REG_OFFSET 6 | ||
#define MWM_MASK_REG_OFFSET 3 | ||
#define CPOW_BASE 0 | ||
|
||
/* Set FPGA configuration for device files */ | ||
#define DEV_CTRL_FPGA "/dev/cpow-ctrl" | ||
#define DEV_IDATA_FPGA "/dev/cpow-idata" | ||
#define DEV_ODATA_FPGA "/dev/cpow-odata" | ||
|
||
#define INT2STRING(I, S) \ | ||
{ \ | ||
S[0] = I & 0xff; \ | ||
S[1] = (I >> 8) & 0xff; \ | ||
S[2] = (I >> 16) & 0xff; \ | ||
S[3] = (I >> 24) & 0xff; \ | ||
} | ||
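
The `INT2STRING` macro above stores a 32-bit value least-significant byte first. A function-based sketch of the same packing, with a matching decoder for checking the round trip, might look like this. The names `pack_u32_le` and `unpack_u32_le` are illustrative and not part of dcurl.

```c
#include <stdint.h>

/* Store `value` into out[0..3], least-significant byte first, which is the
 * byte order INT2STRING produces. */
static void pack_u32_le(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char) (value & 0xff);
    out[1] = (unsigned char) ((value >> 8) & 0xff);
    out[2] = (unsigned char) ((value >> 16) & 0xff);
    out[3] = (unsigned char) ((value >> 24) & 0xff);
}

/* Inverse of pack_u32_le: rebuild the value from four bytes. */
static uint32_t unpack_u32_le(const unsigned char in[4])
{
    return (uint32_t) in[0] | ((uint32_t) in[1] << 8) |
           ((uint32_t) in[2] << 16) | ((uint32_t) in[3] << 24);
}
```

Compared with the macro, a function evaluates its argument exactly once and works on unsigned bytes, which avoids surprises with signed `char` buffers and with arguments that have side effects.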
|
||
static int devmem_fd; | ||
static void *fpga_regs_map; | ||
static uint32_t *cpow_map; | ||
|
||
static bool PoWFPGAAccel(void *pow_ctx) | ||
{ | ||
PoW_FPGA_Accel_Context *ctx = (PoW_FPGA_Accel_Context *) pow_ctx; | ||
bool ok = false; | ||
|
||
int8_t fpga_out_nonce_trits[NonceTrinarySize]; | ||
|
||
char result[4]; | ||
char buf[4]; | ||
|
||
/* Initialized to NULL so the cleanup path knows what was allocated */ | ||
Trytes_t *object_trytes = NULL, *nonce_trytes = NULL; | ||
Trits_t *object_trits = NULL, *object_nonce_trits = NULL; | ||
|
||
object_trytes = | ||
initTrytes(ctx->input_trytes, (transactionTrinarySize) / 3); | ||
if (!object_trytes) | ||
goto cleanup; | ||
|
||
object_trits = trits_from_trytes(object_trytes); | ||
if (!object_trits) | ||
goto cleanup; | ||
|
||
/* Send the transaction trits to the accelerator */ | ||
if (write(ctx->in_fd, (char *) object_trits->data, transactionTrinarySize) < | ||
0) | ||
goto cleanup; | ||
|
||
/* Writing the MWM starts the search; the read returns when it is done */ | ||
INT2STRING(ctx->mwm, buf); | ||
if (write(ctx->ctrl_fd, buf, sizeof(buf)) < 0) | ||
goto cleanup; | ||
if (read(ctx->ctrl_fd, result, sizeof(result)) < 0) | ||
goto cleanup; | ||
|
||
/* Fetch the nonce trits found by the accelerator */ | ||
if (read(ctx->out_fd, (char *) fpga_out_nonce_trits, NonceTrinarySize) < 0) | ||
goto cleanup; | ||
|
||
object_nonce_trits = initTrits(fpga_out_nonce_trits, NonceTrinarySize); | ||
if (!object_nonce_trits) | ||
goto cleanup; | ||
|
||
nonce_trytes = trytes_from_trits(object_nonce_trits); | ||
if (!nonce_trytes) | ||
goto cleanup; | ||
|
||
memcpy(ctx->output_trytes, ctx->input_trytes, (NonceTrinaryOffset) / 3); | ||
memcpy(ctx->output_trytes + ((NonceTrinaryOffset) / 3), nonce_trytes->data, | ||
((transactionTrinarySize) - (NonceTrinaryOffset)) / 3); | ||
|
||
ok = true; | ||
|
||
cleanup: | ||
/* Release whatever was allocated, on success and on every failure path */ | ||
if (object_trytes) | ||
freeTrobject(object_trytes); | ||
if (object_trits) | ||
freeTrobject(object_trits); | ||
if (object_nonce_trits) | ||
freeTrobject(object_nonce_trits); | ||
if (nonce_trytes) | ||
freeTrobject(nonce_trytes); | ||
|
||
return ok; | ||
} | ||
|
||
static bool PoWFPGAAccel_Context_Initialize(ImplContext *impl_ctx) | ||
{ | ||
int i = 0; | ||
devmem_fd = 0; | ||
fpga_regs_map = 0; | ||
cpow_map = 0; | ||
|
||
PoW_FPGA_Accel_Context *ctx = (PoW_FPGA_Accel_Context *) malloc( | ||
sizeof(PoW_FPGA_Accel_Context) * impl_ctx->num_max_thread); | ||
if (!ctx) | ||
goto fail_to_malloc; | ||
|
||
for (i = 0; i < impl_ctx->num_max_thread; i++) { | ||
Move

My fault. Skip the previous comment. |
||
ctx[i].ctrl_fd = open(DEV_CTRL_FPGA, O_RDWR); | ||
if (ctx[i].ctrl_fd < 0) { | ||
perror("cpow-ctrl open fail"); | ||
goto fail_to_open_ctrl; | ||
} | ||
ctx[i].in_fd = open(DEV_IDATA_FPGA, O_RDWR); | ||
if (ctx[i].in_fd < 0) { | ||
perror("cpow-idata open fail"); | ||
goto fail_to_open_idata; | ||
} | ||
ctx[i].out_fd = open(DEV_ODATA_FPGA, O_RDWR); | ||
if (ctx[i].out_fd < 0) { | ||
perror("cpow-odata open fail"); | ||
goto fail_to_open_odata; | ||
} | ||
impl_ctx->bitmap = impl_ctx->bitmap << 1 | 0x1; | ||
} | ||
impl_ctx->context = ctx; | ||
pthread_mutex_init(&impl_ctx->lock, NULL); | ||
|
||
devmem_fd = open("/dev/mem", O_RDWR | O_SYNC); | ||
The "/dev/mem" should also be used as macro.

I don't think developers will change this device file, since it is the standard way to access the system's physical memory. Therefore, I prefer not to use a macro.

okay. |
||
if (devmem_fd < 0) { | ||
perror("devmem open"); | ||
Properly use |
||
goto fail_to_open_memopen; | ||
} | ||
|
||
fpga_regs_map = | ||
(uint32_t *) mmap(NULL, HPS_TO_FPGA_SPAN, PROT_READ | PROT_WRITE, | ||
MAP_SHARED, devmem_fd, HPS_TO_FPGA_BASE); | ||
if (fpga_regs_map == MAP_FAILED) { | ||
perror("devmem mmap"); | ||
goto fail_to_open_memmap; | ||
} | ||
|
||
cpow_map = (uint32_t *) ((uint8_t *) fpga_regs_map + CPOW_BASE); | ||
|
||
return true; | ||
|
||
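fail_to_open_memmap: | ||
close(devmem_fd); | ||
fail_to_open_memopen: | ||
/* The open loop completed, so i == num_max_thread and ctx[i] is out of | ||
* bounds; every context is closed by the loop below instead. */ | ||
pthread_mutex_destroy(&impl_ctx->lock); | ||
goto fail_to_open_ctrl; | ||
fail_to_open_odata: | ||
close(ctx[i].in_fd); | ||
fail_to_open_idata: | ||
close(ctx[i].ctrl_fd); | ||
fail_to_open_ctrl: | ||
/* Close the contexts that were fully opened, including index 0 */ | ||
for (int j = i - 1; j >= 0; j--) { | ||
close(ctx[j].in_fd); | ||
close(ctx[j].out_fd); | ||
close(ctx[j].ctrl_fd); | ||
} | ||
free(ctx); | ||
impl_ctx->context = NULL; | ||
impl_ctx->bitmap = 0; | ||
fail_to_malloc: | ||
return false; | ||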
} | ||
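
The `goto`-based unwinding in `PoWFPGAAccel_Context_Initialize` follows a common C pattern: each failure label undoes only what was acquired before the failing step, and a final loop releases the entries that were completely set up. The sketch below isolates that pattern with stand-ins for `open()` and `close()`, so it can be exercised without the FPGA device files. All names in it (`fake_open`, `fake_close`, `slot_t`, `slots_open`, `fail_at`) are hypothetical and exist only for illustration.

```c
#include <stdlib.h>

static int live_handles;  /* handles currently open */
static int acquire_calls; /* number of fake_open() calls so far */
static int fail_at = -1;  /* fake_open() call index that fails; -1 = never */

/* Stand-in for open(): returns a non-negative handle, or -1 on failure. */
static int fake_open(void)
{
    if (acquire_calls++ == fail_at)
        return -1;
    live_handles++;
    return acquire_calls;
}

/* Stand-in for close(). */
static void fake_close(int handle)
{
    (void) handle;
    live_handles--;
}

typedef struct {
    int ctrl, in, out;
} slot_t;

/* Open three handles per slot. On any failure, release everything that was
 * opened so far and return NULL. */
static slot_t *slots_open(int n)
{
    int i = 0;
    slot_t *s = malloc(sizeof(*s) * (size_t) n);
    if (!s)
        return NULL;

    for (i = 0; i < n; i++) {
        if ((s[i].ctrl = fake_open()) < 0)
            goto fail_ctrl;
        if ((s[i].in = fake_open()) < 0)
            goto fail_in;
        if ((s[i].out = fake_open()) < 0)
            goto fail_out;
    }
    return s;

fail_out:
    fake_close(s[i].in);
fail_in:
    fake_close(s[i].ctrl);
fail_ctrl:
    /* Slots 0..i-1 are fully open; the loop must include index 0 */
    for (int j = i - 1; j >= 0; j--) {
        fake_close(s[j].out);
        fake_close(s[j].in);
        fake_close(s[j].ctrl);
    }
    free(s);
    return NULL;
}
```

Setting `fail_at` to different call indices and checking that `live_handles` returns to zero is a cheap way to test every unwinding path, including the boundary case where only slot 0 has been opened.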
|
||
static void PoWFPGAAccel_Context_Destroy(ImplContext *impl_ctx) | ||
{ | ||
PoW_FPGA_Accel_Context *ctx = (PoW_FPGA_Accel_Context *) impl_ctx->context; | ||
for (int i = 0; i < impl_ctx->num_max_thread; i++) { | ||
close(ctx[i].in_fd); | ||
close(ctx[i].out_fd); | ||
close(ctx[i].ctrl_fd); | ||
} | ||
free(ctx); | ||
|
||
int result = munmap(fpga_regs_map, HPS_TO_FPGA_SPAN); | ||
if (result < 0) { | ||
perror("devmem munmap"); | ||
} | ||
|
||
close(devmem_fd); | ||
} | ||
|
||
static void *PoWFPGAAccel_getPoWContext(ImplContext *impl_ctx, | ||
int8_t *trytes, | ||
int mwm) | ||
{ | ||
pthread_mutex_lock(&impl_ctx->lock); | ||
for (int i = 0; i < impl_ctx->num_max_thread; i++) { | ||
if (impl_ctx->bitmap & (0x1 << i)) { | ||
impl_ctx->bitmap &= ~(0x1 << i); | ||
pthread_mutex_unlock(&impl_ctx->lock); | ||
PoW_FPGA_Accel_Context *ctx = | ||
(PoW_FPGA_Accel_Context *) impl_ctx->context + i; | ||
memcpy(ctx->input_trytes, trytes, (transactionTrinarySize) / 3); | ||
ctx->mwm = mwm; | ||
ctx->indexOfContext = i; | ||
return ctx; | ||
} | ||
} | ||
|
||
pthread_mutex_unlock(&impl_ctx->lock); | ||
return NULL; /* It should not happen */ | ||
} | ||
|
||
static bool PoWFPGAAccel_freePoWContext(ImplContext *impl_ctx, void *pow_ctx) | ||
{ | ||
pthread_mutex_lock(&impl_ctx->lock); | ||
impl_ctx->bitmap |= 0x1 | ||
<< ((PoW_FPGA_Accel_Context *) pow_ctx)->indexOfContext; | ||
pthread_mutex_unlock(&impl_ctx->lock); | ||
return true; | ||
} | ||
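
`PoWFPGAAccel_getPoWContext` and `PoWFPGAAccel_freePoWContext` treat `impl_ctx->bitmap` as a free list: bit `i` set means context `i` is available. A minimal single-threaded sketch of that bookkeeping is shown below. The function names and the 32-bit bitmap width are assumptions for illustration, and the locking that the real code performs with `impl_ctx->lock` is deliberately left out.

```c
#include <stdint.h>

/* Mark the first `n` slots as free (bit set = free). Assumes 0 <= n <= 32. */
static uint32_t pool_init(int n)
{
    return n >= 32 ? UINT32_MAX : ((UINT32_C(1) << n) - 1);
}

/* Claim the lowest free slot among the first `n` and return its index,
 * or -1 when every slot is in use. */
static int pool_acquire(uint32_t *bitmap, int n)
{
    for (int i = 0; i < n; i++) {
        if (*bitmap & (UINT32_C(1) << i)) {
            *bitmap &= ~(UINT32_C(1) << i); /* clear the bit: slot is taken */
            return i;
        }
    }
    return -1;
}

/* Return slot `i` to the pool. Assumes 0 <= i < 32. */
static void pool_release(uint32_t *bitmap, int i)
{
    *bitmap |= UINT32_C(1) << i;
}
```

With `num_max_thread` set to 1, as in this PR, the pool holds a single slot, so a second `pool_acquire` before a release would return -1, which corresponds to the `return NULL; /* It should not happen */` branch.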
|
||
static int8_t *PoWFPGAAccel_getPoWResult(void *pow_ctx) | ||
{ | ||
int8_t *ret = | ||
(int8_t *) malloc(sizeof(int8_t) * ((transactionTrinarySize) / 3)); | ||
if (!ret) | ||
return NULL; | ||
memcpy(ret, ((PoW_FPGA_Accel_Context *) pow_ctx)->output_trytes, | ||
(transactionTrinarySize) / 3); | ||
return ret; | ||
} | ||
|
||
ImplContext PoWFPGAAccel_Context = { | ||
.context = NULL, | ||
.bitmap = 0, | ||
.num_max_thread = 1, // num_max_thread >= 1 | ||
.num_working_thread = 0, | ||
.initialize = PoWFPGAAccel_Context_Initialize, | ||
.destroy = PoWFPGAAccel_Context_Destroy, | ||
.getPoWContext = PoWFPGAAccel_getPoWContext, | ||
.freePoWContext = PoWFPGAAccel_freePoWContext, | ||
.doThePoW = PoWFPGAAccel, | ||
.getPoWResult = PoWFPGAAccel_getPoWResult, | ||
}; |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
#ifndef POW_FPGA_ACCEL_H_ | ||
#define POW_FPGA_ACCEL_H_ | ||
|
||
#include <stdint.h> | ||
#include "constants.h" | ||
|
||
typedef struct _pow_fpga_accel_context PoW_FPGA_Accel_Context; | ||
|
||
struct _pow_fpga_accel_context { | ||
/* Management of Multi-thread */ | ||
int indexOfContext; | ||
/* Arguments of PoW */ | ||
int8_t input_trytes[(transactionTrinarySize) / 3]; /* 2673 */ | ||
int8_t output_trytes[(transactionTrinarySize) / 3]; /* 2673 */ | ||
int mwm; | ||
/* Device files for the FPGA accelerator */ | ||
int ctrl_fd; | ||
int in_fd; | ||
int out_fd; | ||
}; | ||
|
||
#endif |
Since the file fpga-accel.mk is empty, you don't have to include it at the moment.

Can you fix?