Facilitate thread pool to eliminate overhead while creating threads frequently #58
IOTA IRI has added a new commit. This affects the number of threads to use for proof-of-work calculation.
I reference the branch. The current …
A rough time comparison of creating/using threads:
without thread pool (unit: second)
with thread pool (unit: second)
These are the first 10 results of running … However, it needs some modification in … If it is not accepted, we might need to figure out other ways for integration.
Wow! It seems there is an obvious improvement with …
If you are referring to the compiler-related flag modification, I have already sent a similar issue to them. You can reopen it.
As @2henwei suggested, we can reopen the existing …
We shall eliminate the impact of cache misses and page faults.
To reduce the overhead of creating and destroying threads repeatedly, we integrate the thread pool of libtuv as a git submodule. The pthread-related functions and data types are replaced with the corresponding ones from libtuv. The compilation of the libtuv library is written in the file mk/submodule.mk.

Experiment: call clock_gettime() right before and after the functions for getting a thread. The functions are pthread_create() (without thread pool) and uv_queue_work() (with thread pool). test-multi-pow.py is used as the test case since it initializes and destroys dcurl only once and does the PoW multiple times, like what IRI does. The experiment shows the time of getting each thread; the thread number of a PoW execution is 7.

Hardware information:
architecture: x86_64
CPU: AMD Ryzen 5 2400G (4 cores/8 threads)

Experiment result (unit: second):

Without thread pool
thread0: 0.000028384
thread1: 0.000025127
thread2: 0.000024748
thread3: 0.000023925
thread4: 0.000024126
thread5: 0.000025328
thread6: 0.000052900
thread0: 0.000049344
thread1: 0.000039575
thread2: 0.000036720
thread3: 0.000036249
thread4: 0.000034606
thread5: 0.000034676
thread6: 0.000033444

With thread pool
thread0: 0.000124327
thread1: 0.000002084
thread2: 0.000001052
thread3: 0.000000150
thread4: 0.000000121
thread5: 0.000000080
thread6: 0.000000090
thread0: 0.000000291
thread1: 0.000000080
thread2: 0.000000050
thread3: 0.000000050
thread4: 0.000000050
thread5: 0.000000060
thread6: 0.000000050

The first acquisition of a thread from the thread pool takes longer since it is in charge of preallocating and initializing the threads.

Close DLTcollab#58.
Since every CPU implementation, such as SSE and AVX, creates many threads to find the nonce every time, the overhead cannot be ignored.
libtuv offers a lightweight thread pool implementation for embedded systems. This feature is expected to be implemented after issue #57 is achieved. When the new interface is applied in dcurl, the thread pool can be hidden in the CPU PoW algorithm context.