
Facilitate thread pool to eliminate overhead while creating threads frequently #58

Closed
furuame opened this issue Aug 12, 2018 · 7 comments
Assignees
Labels
enhancement (New feature or request), feature (Outstanding features we should implement)
Milestone

Comments

@furuame
Member

furuame commented Aug 12, 2018

Since every CPU implementation, such as the SSE and AVX ones, creates many threads to find the nonce each time, the overhead cannot be ignored. libtuv offers a lightweight thread pool implementation for embedded systems.

This feature is expected to be implemented after issue #57 is resolved. Once the new interface is applied in dcurl, the thread pool can be hidden inside the CPU PoW algorithm context.
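
A minimal sketch of what that could look like, assuming libtuv keeps libuv's uv_queue_work()/uv_run() interface; pow_ctx_t, nonce_search_worker(), and NUM_WORKERS are illustrative names, not existing dcurl code:

```c
/* Hypothetical sketch: hiding a libtuv thread pool inside the CPU PoW
 * context instead of calling pthread_create() for every nonce search. */
#include <uv.h>

#define NUM_WORKERS 7

typedef struct {
    uv_loop_t *loop;              /* event loop owning the worker queue, e.g. uv_default_loop() */
    uv_work_t reqs[NUM_WORKERS];  /* reusable work requests, one per worker */
} pow_ctx_t;

/* Runs on a pooled thread; does one slice of the nonce search. */
static void nonce_search_worker(uv_work_t *req)
{
    /* ... SSE/AVX nonce search over this worker's range ... */
    (void) req;
}

static void nonce_search_done(uv_work_t *req, int status)
{
    (void) req;
    (void) status;
}

int pow_search(pow_ctx_t *ctx)
{
    /* Reuse pooled threads instead of spawning new ones on every call. */
    for (int i = 0; i < NUM_WORKERS; i++)
        uv_queue_work(ctx->loop, &ctx->reqs[i],
                      nonce_search_worker, nonce_search_done);
    /* Block until all queued work requests have completed. */
    return uv_run(ctx->loop, UV_RUN_DEFAULT);
}
```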

@furuame furuame added the enhancement New feature or request label Aug 12, 2018
@jserv jserv changed the title A threadpool for CPU implementations to avoid creating threads overhead Facilitate thread pool to eliminate overhead while creating threads frequently Aug 12, 2018
@marktwtn
Collaborator

marktwtn commented Nov 13, 2018

IOTA IRI has added a new commit for the --pow-threads command-line option.

It affects the number of threads to use for the proof-of-work calculation.
The thread pool would be affected, too.
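
A hedged sketch of how such an option could be plumbed through on the dcurl side; dcurl_set_pow_threads() and pow_task_count() are hypothetical names, not part of the actual dcurl or IRI interfaces:

```c
/* Illustrative only: how a --pow-threads value might bound the number of
 * nonce-search tasks dcurl queues on the thread pool. */
#include <unistd.h>

static int g_pow_threads;  /* 0 means "pick a default" */

void dcurl_set_pow_threads(int n)
{
    g_pow_threads = n > 0 ? n : 0;
}

/* Number of nonce-search tasks to queue for one PoW call. */
int pow_task_count(void)
{
    long online = sysconf(_SC_NPROCESSORS_ONLN);
    if (g_pow_threads > 0)
        return g_pow_threads;
    return online > 1 ? (int) online - 1 : 1;  /* leave one core free by default */
}
```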

@jserv
Member

jserv commented Nov 30, 2018

IOTA IRI has added a new commit for the --pow-threads command-line option.
It affects the number of threads to use for the proof-of-work calculation.
The thread pool would be affected, too.

Resolved in #87

@marktwtn
Collaborator

marktwtn commented Jan 21, 2019

I referenced the branch integrate-libtuv-threadpool created by @2henwei.

The current libtuv is compiled into a static library (.a) without the -fPIC flag, so it cannot be used to generate the libdcurl.so file.
We have to modify libtuv and add the flag so that it builds a library suitable for linking into the shared object.

@marktwtn
Collaborator

A rough time comparison of creating threads directly versus reusing threads from the libtuv thread pool in dcurl:

without threadpool (unit: second)

time: 0.000371
time: 0.000216
time: 0.000256
time: 0.000223
time: 0.000246
time: 0.000232
time: 0.000228
time: 0.000229
time: 0.000222
time: 0.000221

with threadpool (unit: second)

time: 0.000411
time: 0.000015
time: 0.000033
time: 0.000016
time: 0.000028              
time: 0.000013
time: 0.000078
time: 0.000013
time: 0.000034
time: 0.000018

These are the first 10 results of running PoW 100 times.


However, some modifications to libtuv are needed to integrate it with dcurl.
I will open a pull request to libtuv.
After it is accepted, I will send the pull request to dcurl.

If it is not accepted, we might need to figure out another way to integrate it.
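
For reference, a minimal sketch of how such a comparison could be taken, wrapping the thread hand-off calls with clock_gettime(CLOCK_MONOTONIC, ...); the worker bodies are stubs, and the real measurement wraps the PoW thread launch inside dcurl:

```c
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <uv.h>

static double elapsed_sec(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

static void *pthread_stub(void *arg) { return arg; }
static void uv_stub(uv_work_t *req) { (void) req; }
static void uv_done(uv_work_t *req, int status) { (void) req; (void) status; }

int main(void)
{
    struct timespec t0, t1;
    pthread_t tid;
    uv_work_t req;

    /* Cost of spawning a fresh thread. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_create(&tid, NULL, pthread_stub, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("pthread_create: %f\n", elapsed_sec(t0, t1));
    pthread_join(tid, NULL);

    /* Cost of handing work to the pool (the first call also creates the pool,
     * which matches the larger first sample in the numbers above). */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    uv_queue_work(uv_default_loop(), &req, uv_stub, uv_done);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("uv_queue_work:  %f\n", elapsed_sec(t0, t1));

    uv_run(uv_default_loop(), UV_RUN_DEFAULT);
    return 0;
}
```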

@furuame
Member Author

furuame commented Jan 23, 2019

A rough time comparison of creating threads directly versus reusing threads from the libtuv thread pool in dcurl:

without threadpool (unit: second)

time: 0.000371
time: 0.000216
time: 0.000256
time: 0.000223
time: 0.000246
time: 0.000232
time: 0.000228
time: 0.000229
time: 0.000222
time: 0.000221

with threadpool (unit: second)

time: 0.000411
time: 0.000015
time: 0.000033
time: 0.000016
time: 0.000028              
time: 0.000013
time: 0.000078
time: 0.000013
time: 0.000034
time: 0.000018

These are the first 10 results of running PoW 100 times.

Wow! It seems there is an obvious improvement with libtuv.

However, some modifications to libtuv are needed to integrate it with dcurl.

If you are referring to the compiler-flag modification, I have already reported a similar issue to them:
Samsung/libtuv#128

You can reopen it.

I will open a pull request to libtuv.
After it is accepted, I will send the pull request to dcurl.

If it is not accepted, we might need to figure out another way to integrate it.

@jserv
Member

jserv commented Jan 23, 2019

However, some modifications to libtuv are needed to integrate it with dcurl.
I will open a pull request to libtuv.
After it is accepted, I will send the pull request to dcurl.

As @2henwei suggested, we can reopen the existing libtuv issue. Meanwhile, we can rely on our fork, which includes the necessary build fixes, as another Git submodule.

@jserv
Member

jserv commented Jan 23, 2019

A rough time comparison of creating threads directly versus reusing threads from the libtuv thread pool in dcurl:
without threadpool (unit: second)

time: 0.000371
time: 0.000216
time: 0.000256

with threadpool (unit: second)

time: 0.000411
time: 0.000015
time: 0.000033

We should eliminate the impact of cache misses and page faults.
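
One hedged way to do that is to run a few untimed warm-up rounds so the code, data pages, and thread pool are already faulted in before timing starts; run_pow_once() below is a stub standing in for the measured dcurl PoW call:

```c
#include <stdio.h>
#include <time.h>

#define WARMUP_ROUNDS 5
#define TIMED_ROUNDS  100

/* Placeholder for the dcurl PoW call being measured. */
static void run_pow_once(void) {}

int main(void)
{
    struct timespec t0, t1;

    /* Warm-up: populate caches, fault in pages, spin up the thread pool. */
    for (int i = 0; i < WARMUP_ROUNDS; i++)
        run_pow_once();

    /* Only the later iterations are timed and reported. */
    for (int i = 0; i < TIMED_ROUNDS; i++) {
        clock_gettime(CLOCK_MONOTONIC, &t0);
        run_pow_once();
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("time: %f\n", (t1.tv_sec - t0.tv_sec)
                             + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    }
    return 0;
}
```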

marktwtn added a commit to marktwtn/dcurl that referenced this issue Jan 30, 2019
To reduce the overhead of creating and destroying threads repeatedly,
we integrate the thread pool of libtuv as a git submodule.
The pthread-related functions and data types are replaced with the corresponding
ones from libtuv.
The compilation of the libtuv library is handled in the file mk/submodule.mk.
The README.md asks the user to initialize and update the git submodule
right after downloading the repository.

Close DLTcollab#58.
marktwtn added a commit to marktwtn/dcurl that referenced this issue Feb 11, 2019
To reduce the overhead of creating and destroying threads repeatedly,
we integrate the thread pool of libtuv as a git submodule.
The pthread-related functions and data types are replaced with the corresponding
ones from libtuv.
The compilation of the libtuv library is handled in the file mk/submodule.mk.

Experiment:
Call clock_gettime() right before and after the functions for getting a thread.
The functions are pthread_create() (without thread pool)
and uv_queue_work() (with thread pool).
Use test-multi-pow.py as the test case since it initializes and destroys dcurl only once and
does the PoW multiple times, like what IRI does.
The experiment result shows the time of getting each thread;
the number of threads in one PoW execution is 7.

Experiment result (unit: second):
Without thread pool
thread0: 0.000028384
thread1: 0.000025127
thread2: 0.000024748
thread3: 0.000023925
thread4: 0.000024126
thread5: 0.000025328
thread6: 0.000052900
thread0: 0.000049344
thread1: 0.000039575
thread2: 0.000036720
thread3: 0.000036249
thread4: 0.000034606
thread5: 0.000034676
thread6: 0.000033444

With thread pool
thread0: 0.000124327
thread1: 0.000002084
thread2: 0.000001052
thread3: 0.000000150
thread4: 0.000000121
thread5: 0.000000080
thread6: 0.000000090
thread0: 0.000000291
thread1: 0.000000080
thread2: 0.000000050
thread3: 0.000000050
thread4: 0.000000050
thread5: 0.000000060
thread6: 0.000000050

The first acquisition of a thread from the thread pool takes longer
since it also preallocates and initializes the threads.

Close DLTcollab#58.
@wusyong wusyong added this to the sprint-201902 milestone Feb 11, 2019
@jserv jserv added the feature Outstanding features we should implement label Feb 11, 2019
marktwtn added a commit to marktwtn/dcurl that referenced this issue Feb 12, 2019
To reduce the overhead of creating and destroying threads repeatedly,
we integrate the thread pool of libtuv as a git submodule.
The pthread-related functions and data types are replaced with the corresponding
ones from libtuv.
The compilation of the libtuv library is handled in the file mk/submodule.mk.

Experiment:
Call clock_gettime() right before and after the functions for getting a thread.
The functions are pthread_create() (without thread pool)
and uv_queue_work() (with thread pool).
Use test-multi-pow.py as the test case since it initializes and destroys dcurl only once and
does the PoW multiple times, like what IRI does.
The experiment result shows the time of getting each thread;
the number of threads in one PoW execution is 7.

Hardware information:
architecture - x86_64
CPU         - AMD Ryzen 5 2400G (4 cores/8 threads)

Experiment result (unit: second):
Without thread pool
thread0: 0.000028384
thread1: 0.000025127
thread2: 0.000024748
thread3: 0.000023925
thread4: 0.000024126
thread5: 0.000025328
thread6: 0.000052900
thread0: 0.000049344
thread1: 0.000039575
thread2: 0.000036720
thread3: 0.000036249
thread4: 0.000034606
thread5: 0.000034676
thread6: 0.000033444

With thread pool
thread0: 0.000124327
thread1: 0.000002084
thread2: 0.000001052
thread3: 0.000000150
thread4: 0.000000121
thread5: 0.000000080
thread6: 0.000000090
thread0: 0.000000291
thread1: 0.000000080
thread2: 0.000000050
thread3: 0.000000050
thread4: 0.000000050
thread5: 0.000000060
thread6: 0.000000050

The first acquisition of a thread from the thread pool takes longer
since it also preallocates and initializes the threads.

Close DLTcollab#58.
@jserv jserv closed this as completed in #93 Feb 12, 2019